Log Structured InnoDB

For a while now I’ve been thinking that the way InnoDB handles transactions for databases that fit entirely in memory is flat out broken.

The vast majority of web applications using InnoDB are running the entire database out of memory.

With an 8-32GB database, it only takes 1-3 minutes to write the whole image to disk.
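As a sanity check on the 1-3 minute figure, the short Python calculation below works out the sustained sequential write rate it implies. It assumes 1 GB = 1024 MB, and `required_write_rate_mb_s` is just an illustrative helper.

```python
def required_write_rate_mb_s(db_size_gb: float, minutes: float) -> float:
    """Sustained write rate (MB/s) needed to flush db_size_gb in `minutes`."""
    return db_size_gb * 1024 / (minutes * 60)

for size_gb, minutes in [(8, 1), (32, 3)]:
    rate = required_write_rate_mb_s(size_gb, minutes)
    print(f"{size_gb} GB in {minutes} min -> {rate:.0f} MB/s")
# 8 GB in 1 min -> 137 MB/s
# 32 GB in 3 min -> 182 MB/s
```

So the estimate corresponds to roughly 140-180 MB/s of streaming writes for the full image.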

If InnoDB were smart, it would offer a log-structured mode in which it continually reads data from memory and writes it to disk. The write-ahead log would still work as usual, but there would be no fuzzy checkpointing. In essence, there would be just ONE continual checkpoint of the whole database.
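To make the proposed mode a little more concrete, here is a minimal Python sketch under loudly stated assumptions: the store is a plain dict, the log holds redo records of the form `(lsn, key, value)`, and `Database`, `put`, and `checkpoint_pass` are hypothetical names, not InnoDB internals. One pass copies the whole image in chunks while foreground writes keep landing; it records the log position at which the pass began, so the log from that point on can rebuild anything the copy missed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Database:
    """Hypothetical in-memory store with a redo-only write-ahead log."""
    data: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)  # redo records: (lsn, key, value)
    next_lsn: int = 0

    def put(self, key: str, value: Any) -> None:
        # Write-ahead: append the redo record before changing memory.
        self.wal.append((self.next_lsn, key, value))
        self.next_lsn += 1
        self.data[key] = value


def checkpoint_pass(db: Database, chunk_size: int = 2,
                    between_chunks: Callable[[], None] = lambda: None):
    """One pass of the continual checkpoint: copy the whole image to 'disk'."""
    start_lsn = db.next_lsn   # recovery must replay the log from here
    image = {}                # stands in for the on-disk checkpoint file
    keys = sorted(db.data)    # keys inserted during the pass are covered by the log
    for i in range(0, len(keys), chunk_size):
        for key in keys[i:i + chunk_size]:
            image[key] = db.data[key]
        between_chunks()      # foreground writes interleave with the copy
    # Pass complete: image + log records from start_lsn suffice for recovery,
    # so older log records can be truncated.
    db.wal = [rec for rec in db.wal if rec[0] >= start_lsn]
    return start_lsn, image
```

Running `checkpoint_pass` back to back in a loop gives the single continual checkpoint: there is never a separate fuzzy checkpoint, just one sweep after another, each of which lets the log be trimmed.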

When the database crashes, you just load the database image from the last completed checkpoint and replay entries from the write-ahead log. Basically a normal recovery.
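The recovery step can be sketched in a few lines of Python. This assumes (hypothetically) that the checkpoint image is a dict, that the log LSN in effect when that checkpoint pass began is known, and that log records are idempotent redo records `(lsn, key, value)`. Because each record simply sets a key, replaying a record the image already reflects is harmless, which is what lets the copy run while writes continue.

```python
def recover(image: dict, start_lsn: int, wal: list) -> dict:
    """Load the last complete checkpoint, then redo logged writes in LSN order."""
    state = dict(image)
    for lsn, key, value in sorted(wal, key=lambda rec: rec[0]):
        if lsn >= start_lsn:   # older records are already reflected in the image
            state[key] = value
    return state


image = {"a": 1, "b": 2}                       # pass began when the log was at LSN 10
wal = [(8, "a", 1), (10, "b", 3), (11, "c", 4)]
print(recover(image, 10, wal))  # {'a': 1, 'b': 3, 'c': 4}
```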

  1. Man, I wish I could buy machines with enough RAM to do that. :(

    We buy machines with as much as we can possibly get without going to 4GB DIMMs (yet), but that’s nowhere near enough, and comparing notes with other web apps, we’re not alone.

    Sounds like a neat idea for those that can manage it, though. Have you told MySQL yet?

  2. Hey Don.

    We’re still using higher-density disks for half of our cluster. The other half is all in memory, though. It holds mostly metadata like queues, link graphs, weblog indexes, etc.

    Full content like HTML and RSS is kept in an append-only structure on InnoDB, about 1TB per box with about an 8GB cache.

    I just want the ability to run InnoDB in both modes (traditional and log structured).

    When comparing notes, I’ve found that a good percentage of people are running 100% out of memory.

    I’m going to try to ping MySQL about this idea at the conf.

    I posted a comment about this on the MySQL Performance Blog, which Heikki replied to, but I don’t think I made a solid argument at the time.

    I think the analogy to log structured filesystems might be the win here.

    Seems like a straightforward implementation idea, too.

    My back-of-the-envelope calculations suggest a 5x performance boost and numbers comparable with Bigtable, HBase, and Hypertable.


  3. Hi Kevin,

    In http://www.scribd.com/doc/3919954/Scaling-MySQL-and-Java-in-High-Write-Throughput-Environments-Presentation (which I assume you wrote), you mention “InnoDB with write ahead” very early on.

    I haven’t read the entire document yet, but I searched the web and read quite a lot of docs about InnoDB performance, and I could not find much information about write-ahead in InnoDB.

    Care to shed some light?
