Posts Tagged ‘bigtable’

This is pretty nice. Google has released Zippy as open source, under the name Snappy:

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

Together with open-vcdiff, this means the full Google compression tool chain is now available to use.
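The announcement frames Snappy's trade-off against zlib's fastest mode. Snappy isn't in Python's standard library, so as a minimal sketch this snippet measures only that zlib baseline (level 1, with level 9 for contrast). The `measure` helper and the sample input are hypothetical, and the ratios and MB/sec figures it prints depend entirely on your machine and data.

```python
import time
import zlib


def measure(data: bytes, level: int) -> dict:
    """Round-trip `data` through zlib at `level`; report size ratio and rough throughput."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    mid = time.perf_counter()
    restored = zlib.decompress(compressed)
    end = time.perf_counter()
    assert restored == data, "round trip failed"
    mb = len(data) / 1e6
    return {
        "level": level,
        # Compressed size as a fraction of the input (smaller is better).
        "ratio": len(compressed) / len(data),
        # Guard against a zero interval on very small inputs.
        "compress_mb_s": mb / max(mid - start, 1e-9),
        "decompress_mb_s": mb / max(end - mid, 1e-9),
    }


if __name__ == "__main__":
    # Hypothetical sample input: repetitive, log-like text.
    sample = b"GET /index.html 200 OK\n" * 200_000
    for level in (1, 9):  # 1 = zlib's fastest mode, 9 = best compression
        r = measure(sample, level)
        print(f"level {r['level']}: ratio {r['ratio']:.3f}, "
              f"compress {r['compress_mb_s']:.0f} MB/s, "
              f"decompress {r['decompress_mb_s']:.0f} MB/s")
```

If the third-party python-snappy binding is installed, its `snappy.compress` and `snappy.decompress` functions could be timed the same way to check the "order of magnitude faster" claim against your own data.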

Apparently Digg performed a big migration from MySQL to Cassandra as part of its move to the new Digg v4 architecture, and now its VP of Engineering has been shown the door:

Ever since Digg launched its new site design, it’s been plagued with all kinds of trouble, not least of which is that it keeps going down. The problems with the new architecture are so bad that VP of Engineering John Quinn is now gone, we’ve confirmed with sources close to Digg.

In a Diggnation video today, CEO Kevin Rose explained some of the technical issues the site is dealing with and why it can’t simply roll back to the previous architecture. The new version of Digg, v4, is based on a distributed database called Cassandra, which replaced the MySQL database the site ran on before. Cassandra is very advanced—it is supposed to be faster and scale better—but perhaps it is still too experimental. Or maybe it’s just the way Digg implemented it (Twitter uses Cassandra, although not for its main data store, as does Facebook in places, but it obviously is not as battle-tested as it needs to be). Every engineer at Digg is currently just trying to keep the site up and running.

Some of this is probably political. Perhaps Mr. Quinn was let go for reasons above and beyond this switch.

Perhaps he should have gotten buy-in from other members of the team. Had Rose personally signed off on this migration, it would have been tough to fire the VP of Engineering over it.

The technical aspects of this type of migration are VERY difficult. Not just because you're moving from one DB to another, but because a lot of the polish, fit, and finish of your existing system tends to be taken for granted over time.

Newer databases don't have this type of polish, and you end up having to duplicate a lot of infrastructure that was already present in the previous generation.

MySQL is definitely no panacea. You’re going to have pain either way. At least with some of the modern DBs you’re partially headed in the right direction.

One trend I’ve seen is for people to use the LAMP stack to serve websites but then to use Hadoop + Hive as part of their ETL setup so they can run reports and transform production data.

There is no solid Bigtable implementation outside of Google just yet. I wish there were, but it doesn't seem like we have one.

Cassandra isn't that bad, of course. Reddit, Digg's main competitor, is running Cassandra.

It seems like a strange thing to fire someone over. If your main competitor is running the same database, the decision to switch certainly couldn't have been too bad.