Paxos, Politics, Democracy, and Distrust of Authority.

Designing a scalable system means nothing until you put it into production. A number of real-world problems only show up once a system has been live and under load for months, and you’d never anticipate them otherwise.

Today we hit one. There’s a bug in MySQL where it reports healthy replication status when in fact the replica is hours behind. The reported lag keeps cycling between zero and 6000 seconds behind the master.

The problem was complicated by the fact that our clients (robots in this case) were trusting a central authority about the health of the system.

Bad idea. Democracy is a good thing. In a perfect world citizens would be able to detect corruption and start a revolution to remove the current authority from power.

Enter Paxos. We’re planning on adding Paxos support via lbpool to the distributed database we’ve designed (think real-world Bigtable) to solve just this problem.

Instead of naively trusting that our servers were reporting correct values, the clients could analyze the results from multiple sources and detect skew. They could then initiate a revolution based on a simple quorum and remove the existing corrupt entity.

In this case they would have detected that this MySQL server was confused and taken it out of production.
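To make the idea concrete, here is a rough sketch of the kind of check a client could run. It is only an illustration: it uses plain majority voting rather than full Paxos, it is not lbpool's API, and the function names, thresholds, and client names are all made up for the example. It assumes each client can measure a replica's lag on its own (say, by reading a heartbeat timestamp) instead of trusting what the server says about itself.

```python
from statistics import median


def quorum_unhealthy(observed_lag, max_lag_seconds=60):
    """Return True if a strict majority of observers see too much lag.

    observed_lag maps a (hypothetical) observer name to the replication
    lag, in seconds, that the observer measured independently.
    """
    votes = sum(1 for lag in observed_lag.values() if lag > max_lag_seconds)
    return votes > len(observed_lag) // 2


def detect_skew(self_reported_lag, observed_lag, tolerance_seconds=30):
    """Return True if the server's own claim disagrees with the observers.

    The median is used so that a single confused observer cannot swing
    the result on its own.
    """
    if not observed_lag:
        return False  # nobody to compare against, so no evidence of skew
    typical = median(observed_lag.values())
    return abs(typical - self_reported_lag) > tolerance_seconds


# Hypothetical readings: the server claims it is caught up, but two of
# the three clients measure it well over an hour behind.
observed = {"client-a": 5400, "client-b": 6000, "client-c": 0}
self_reported = 0

if detect_skew(self_reported, observed) and quorum_unhealthy(observed):
    print("quorum reached: remove this replica from the pool")
```

A real deployment would still need the clients to agree on the decision itself, which is where a consensus protocol like Paxos comes in. The vote above only shows the detection half.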

Now if only Americans could use Paxos for consensus and impeach the Bush administration! Talk about corruption!
