Yahoo + Hadoop on 10k Cores

Yahoo blogged today about their use of Hadoop on 10k cores:

This process is not new (see the AltaVista connectivity server). What is new is the use of Hadoop. Hadoop has allowed us to run the identical processing we ran pre-Hadoop on the same cluster in 66% of the time our previous system took. It does that while simplifying administration. Further we believe that as we continue to scale up Hadoop, we will be able to scale up our production jobs as needed to larger cluster sizes.

What’s interesting here is that Hadoop has some clear inefficiencies. I’m sure they could see additional performance gains if Hadoop were tuned on a per-node basis.

  1. chadwa

    Make it work right, then make it work fast.

    When you need a truly scalable and fault-tolerant distributed system, slow and steady wins the race. Once you have it working right and at provable scale, then you can increase the performance.

    The system that they replaced was faster on a per-operation basis than Hadoop, but it wasn’t fault-tolerant enough, which meant that, at web scale, failures inevitably caused the job to take much longer.
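
    The scale argument above can be made concrete with a quick back-of-envelope calculation (the failure rate and cluster sizes here are my own illustrative numbers, not Yahoo’s): even a small per-node failure rate compounds quickly as the cluster grows.

    ```python
    def p_any_failure(n, p=0.001):
        """P(at least one of n nodes fails during a job) = 1 - (1 - p)^n,
        assuming independent failures with per-node probability p per run."""
        return 1 - (1 - p) ** n

    # On a small cluster, a restart-on-failure design is usually tolerable...
    print(p_any_failure(20))     # roughly 2% of runs see a failure
    # ...but at web scale some node almost always fails mid-job,
    # so the job itself has to tolerate failures rather than restart.
    print(p_any_failure(5000))   # well over 99% of runs see a failure
    ```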

  2. Hey Chad.

    I was going to blog a follow up post.

    It’s interesting. On the high end (with a high number of nodes) it’s better to have them slightly less efficient on a per-node basis as long as you have fault tolerance and somewhat linear scalability.

    At the low to medium end of the spectrum, the opposite is almost true.

    You don’t really need the system to be as fault tolerant, because you don’t have as many nodes and failures won’t happen often. Since you want to maximize the amount of cash you’re spending, you want per-node efficiency to be very high.

    Yahoo in this case is clearly at the high end.

    Spinn3r is in the low-mid range. We only have 80 cores and about 1-5TB of storage.

    I’d rather have per node performance optimizations because I can save a lot of money on hosting.

    Hopefully this will be resolved though and you’ll have the best of both worlds.
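
    To put numbers on this tradeoff (again, purely illustrative figures of my own): compare a fast system that must rerun the whole job after any failure against a slower fault-tolerant one that only reschedules the failed piece of work.

    ```python
    def expected_time_restart(t, n, p=0.001):
        """Expected wall time for a job of nominal length t on n nodes when
        any node failure forces a full rerun: attempts are geometric, so
        E[time] = t / P(no failure) = t / (1 - p)^n (ignoring partial progress)."""
        return t / (1 - p) ** n

    # Small cluster: the fast-but-fragile system barely pays a penalty,
    # so per-node efficiency dominates the economics.
    print(expected_time_restart(t=60, n=20))    # ~61 minutes
    # Large cluster: reruns dominate, and a fault-tolerant system wins
    # even if its single-pass time is much worse.
    print(expected_time_restart(t=60, n=5000))  # thousands of minutes
    ```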

  3. chadwa

    I think you want to minimize the amount of cash you are spending. ;)

    You’re definitely right to point out that with fewer nodes you’re less likely to have faults on individual jobs, and so the fault tolerance may be less critical.

    It also depends a lot on the cost of a fault, of course.

    My experience tells me that it is much much easier to improve the performance of a slow system that is highly fault-tolerant than it is to make a performant system that isn’t fault-tolerant truly reliable. Baking fault-tolerance into the bones of the system from the start is the way to go — not retrofitting it later.

    So if I can afford the higher costs in the short term, I’d rather hitch my wagon to a trusty tortoise than to an erratic hare — in the long term, I’d likely have to ditch the hare anyway and the switching costs can be very high. Plus the tortoise is likely to pick up steam over time — hopefully enough to outpace my scaling needs.

    Just my $.02 of course.

  4. Hey Chad.

    Just to clarify, I *am* trying to save money :)

    It’s a pretty significant amount, though… the SSDs we’re looking at represent about a 10-15x cost savings. InnoDB and miscellaneous performance tuning options can yield radical cost savings as well.

    I’d rather be able to hire 1-2 additional engineers and have less hardware :)

    That said, I still suspect we’ll need quite a bit of hardware as we expand our crawler resources to cover additional content.

    I hear you about fundamental architecture issues. The federated MySQL install we’re working on should be able to scale to a few hundred nodes. It’s not designed to scale to 10k nodes but that’s fine in our situation.

    I’ve been thinking about writing a blog post about this topic.

    It’s turning out that I tend to end up writing the whole stack, or at least touching and tuning significant parts of it at some point.

    I’ve written all the crawler components of our system over the years. I’d say I’ve written (or managed engineers working on) 90% of our crawler stack.

    The database is just very important to what we do … I suspect that I’ll have to write something in C at some point ….

    I’d rather NOT do it, but every time I look into these tools, they don’t do what I want them to do… this includes MySQL and Linux, which I seem to have problems with on an almost daily basis.

    Linux is a good example. The VM and paging algorithms are totally broken :-/

