Kosmos – Google Filesystem Clone in C++

Today’s Open Source hero award goes to Kosmix for OSSing their GFS implementation:

Applications that process large volumes of data (such as, search engines, grid computing applications, data mining applications, etc.) require a backend infrastructure for storing data. Such infrastructure is required to support applications whose workload could be characterized as:

  • Primarily write-once/read-many workloads
  • Few millions of large files, where each file is on the order of a few tens of MB to a few tens of GB in size
  • Mostly sequential access

    Ethan and Rich also have thoughts on the subject.

    I designed a similar GFS-style distributed filesystem while at Rojo. It had no metadata server: a write returned a pointer that you'd then store in your application, which made a metadata server unnecessary.

    By the time I left to start Tailrank, it was storing 0.5 TB.
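
    To make the pointer idea concrete, here is a minimal C++ sketch. This is not the Rojo code: the names PointerStore and BlobPointer, the round-robin placement, and the in-memory logs standing in for remote storage nodes are all assumptions made for illustration. The point is only that a write hands back everything needed to find the bytes again, so nothing has to be looked up by name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pointer handed back to the application after a write.
// The application stores this itself (e.g. in its own database row).
struct BlobPointer {
    std::uint32_t node;    // index of the storage node holding the bytes
    std::uint64_t offset;  // byte offset inside that node's append-only log
    std::uint64_t length;  // number of bytes stored
};

// In-memory stand-in for a set of storage nodes. Assumption: in a real
// system each log would live on a separate machine's disk.
class PointerStore {
public:
    explicit PointerStore(std::size_t node_count) : logs_(node_count) {
        if (node_count == 0) {
            throw std::invalid_argument("need at least one node");
        }
    }

    // Append data to the next node (simple round-robin) and return
    // a pointer describing where it landed.
    BlobPointer put(const std::string& data) {
        const std::size_t node = next_;
        next_ = (next_ + 1) % logs_.size();
        std::string& log = logs_[node];
        BlobPointer p{static_cast<std::uint32_t>(node),
                      static_cast<std::uint64_t>(log.size()),
                      static_cast<std::uint64_t>(data.size())};
        log += data;
        return p;
    }

    // Read the bytes back using only the pointer; no name lookup and
    // no central metadata table are involved.
    std::string get(const BlobPointer& p) const {
        if (p.node >= logs_.size()) {
            throw std::out_of_range("unknown node");
        }
        const std::string& log = logs_[p.node];
        if (p.offset > log.size() || p.length > log.size() - p.offset) {
            throw std::out_of_range("pointer outside log");
        }
        return log.substr(static_cast<std::size_t>(p.offset),
                          static_cast<std::size_t>(p.length));
    }

private:
    std::vector<std::string> logs_;  // one append-only log per node
    std::size_t next_ = 0;           // next node to write to
};
```

    The trade-off in a design like this is that the application becomes responsible for keeping the pointers safe: lose the pointer and the data is effectively unreachable.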

    One of the interesting properties of the current work in distributed filesystems is that they work REALLY well for large chunks of content. They're less well suited to smaller metadata structures like link graphs and so forth.

    Bigtable builds on top of GFS, of course, but its random read performance leaves a bit to be desired because of the GFS chunk size.

    It’s super awesome to have more choices in this space!
