Yahoo + Hadoop on 10k Cores
Yahoo blogged today about their use of Hadoop on 10k cores:
This process is not new (see the AltaVista connectivity server). What is new is the use of Hadoop. Hadoop has allowed us to run the identical processing we ran pre-Hadoop on the same cluster in 66% of the time our previous system took. It does that while simplifying administration. Further we believe that as we continue to scale up Hadoop, we will be able to scale up our production jobs as needed to larger cluster sizes.
What’s interesting here is that Hadoop has some clear inefficiencies. I’m sure they could see additional performance gains if Hadoop were tuned on a per node basis.