RAID and Scaling Out vs Scaling Up

Brian Moon’s response to my “death of RAID” post:

“At this point, I think this is a philosophy argument and not a real world
application argument at this point.”

It’s both. It’s the philosophy and the REAL WORLD argument that Google uses. We use it too, and I know of a lot of large companies (MySpace, LiveJournal, Facebook, etc.) that firmly believe in it.

“For real people, having a server go down is a pain in the ass. Why should I want
to spend a full day of labor rebuilding a server because a $200 part broke or
just got corrupted. It takes 10 minutes to start a rebuild and maybe another 10
minutes to install a new drive if the rebuild fails.”

The machine is going to fail anyway. Disk isn’t the only reason a machine
fails. It can also fail because of a bad power supply or bad memory. Either
way, someday you’re GOING to have to fix it.
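Here’s a rough back-of-the-envelope sketch of that point. It treats a machine as down if any one component fails, and assumes failures are independent. The per-component annual failure probabilities are made-up placeholders, not measurements, and modeling a mirrored pair as `p ** 2` is a crude assumption that ignores rebuild windows and correlated failures. The point is only that RAID shrinks the disk term and leaves every other term alone.

```python
from math import prod


def machine_failure_probability(component_probs):
    """Probability that at least one component fails, assuming independent failures."""
    return 1 - prod(1 - p for p in component_probs.values())


# Hypothetical annual failure probabilities (placeholders, NOT real data).
without_raid = {"disk": 0.05, "power_supply": 0.03, "memory": 0.02, "motherboard": 0.02}

# Crude RAID-1 model: the mirrored pair is lost only if both disks fail.
# Every other component is unchanged.
with_raid = dict(without_raid, disk=without_raid["disk"] ** 2)

print(f"without RAID: {machine_failure_probability(without_raid):.3f}")
print(f"with RAID:    {machine_failure_probability(with_raid):.3f}")
```

Even with the disk term nearly eliminated, the machine still has a substantial chance of going down from everything else, so you still need a plan for losing it.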

“… would you rather admin 30 servers or 1000? I think 30.”

This is a false dichotomy. You can’t replace 1000 machines with 30 and expect
the pricing to scale down with them. Scaling up doesn’t work; only scaling out
does. If you were to buy 30 machines capable of replacing 1k machines, the
pricing would be something like 30x.

“I should add that we only use RAID on servers that are used for data storage.
Losing data sucks. For web servers we don’t use RAID. They do fit the model
that Kevin describes. We have a lot of them. If one goes down, it’s ok.”

Yes. Your application framework supports this use case. My point is that as
database clustering technology improves, it becomes easier to go with
commodity hardware without RAID.
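A minimal sketch of why clustering makes RAID less necessary, assuming a hypothetical in-memory `Node` class standing in for a commodity database server with a single non-RAID disk. `ReplicatedStore` and its placement scheme are also hypothetical and not any particular clustering product’s API: each write goes to several distinct nodes, so losing one node’s disk doesn’t lose the data.

```python
class Node:
    """Hypothetical stand-in for a commodity server with one non-RAID disk."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def fail(self):
        # Simulate a dead disk (or power supply): the node and its data are gone.
        self.alive = False
        self.data.clear()


class ReplicatedStore:
    """Writes each key to `replicas` distinct nodes and reads from any live copy."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("not enough nodes for the requested replica count")
        self.nodes = nodes
        self.replicas = replicas

    def owners(self, key):
        # Simple placement: hash the key to a starting node, then take the next
        # `replicas` nodes in order, wrapping around the list.
        start = hash(key) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every live owner. A dead owner is skipped, so the key ends up
        # with fewer copies; a real system would re-replicate it elsewhere.
        for node in self.owners(key):
            if node.alive:
                node.data[key] = value

    def get(self, key):
        for node in self.owners(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)
```

For example, with four nodes and two replicas, calling `store.owners("user:42")[0].fail()` after a `put` still lets `store.get("user:42")` return the value from the surviving copy. The redundancy lives across machines instead of inside each one.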

  1. Not sure you can really cite LiveJournal there. LiveJournal uses single non-RAID disks for some applications (MogileFS), but all of its database servers use hardware RAID.

  2. I think my point was that LiveJournal is a scale-out shop, not a scale-up shop. Sorry for not making that clear.
