Great read on Hadoop’s 4,000-machine scalability wall.
Most of us aren’t running into this wall, but it’s interesting to see why it happens.
The elasticity of clouds makes this more of a challenge. Most people don’t normally need 4,000 nodes, but say you want to do something REALLY fast. I could see wanting to spool up 8,000–10,000 nodes to compute PageRank or run a clustering algorithm and wanting it to finish FAST.
Having a hard limit of 4k nodes is a challenge.
Given observed trends in cluster sizes and workloads, the MapReduce JobTracker needs a drastic overhaul to address several deficiencies in its scalability, memory consumption, threading model, reliability and performance. Over the last 5 years, we’ve done spot fixes, however lately these have come at an ever-growing cost as evinced by the increasing difficulty of making changes to the framework. The architectural deficiencies, and corrective measures, are both old and well understood – even as far back as late 2007.