Bigtable and C

There’s been a lot of activity in the distributed database space in the last few weeks.

First was KFS (Kosmos FS), and now Powerset brings us HBase, a Bigtable clone built on Hadoop.

I’ve been thinking about this a lot recently, and I think Java is the wrong language in which to implement distributed databases (or any database, for that matter).

I’m specifically talking about the on-disk persistence engine.

The main problems are support for sendfile, async and event-driven IO, memory management, and low-level details such as access to mlock.
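To make the first item concrete, here is a minimal C sketch of zero-copy file transfer with sendfile(2). It assumes Linux, where sendfile is declared in <sys/sendfile.h> and, since kernel 2.6.33, can write to a regular file as well as a socket. The helper name send_whole_file is hypothetical; it loops because sendfile may move fewer bytes than requested.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sendfile.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: copy all of in_fd to out_fd with sendfile(2).
 * The kernel moves the bytes directly, without copying them through a
 * user-space buffer. Returns the number of bytes sent, or -1 on error. */
ssize_t send_whole_file(int out_fd, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    off_t offset = 0;   /* sendfile reads from here and advances it */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* file shrank underneath us */
    }
    return (ssize_t)offset;
}
```

Because an explicit offset pointer is passed, sendfile leaves in_fd’s own file position untouched, so several readers can stream from the same descriptor.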

The Java VM is one problem area. Once the VM has allocated memory, it doesn’t want to give it back. Then there’s the fact that Java has no binding for mlockall. One could write one in JNI, but then you run into other problems, such as the lack of access to other JNI libraries.
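For comparison, this is roughly what the missing call looks like from C. A minimal sketch, assuming Linux/POSIX; pin_process_memory is a hypothetical helper name, not part of any library.

```c
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: lock every page mapped now (MCL_CURRENT) and
 * every page mapped later (MCL_FUTURE) into RAM, so the kernel never
 * swaps out the storage engine's caches.
 * Returns 0 on success, -1 on failure. */
int pin_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        /* On Linux this usually fails with ENOMEM or EPERM when the
         * process lacks CAP_IPC_LOCK and RLIMIT_MEMLOCK is too small. */
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A storage daemon would typically call this once at startup, before filling its cache; munlockall(2) reverses it.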

C just isn’t that hard. For a small, tight storage system like GFS or Bigtable, it simply makes more sense to implement it in C.

Memcached and lighttpd are great examples of what I’m talking about: they’re small, thin, and get the job done.
