Desirable InnoDB Features

These are a few features I want to see implemented in InnoDB this year.

We need them for production so we’re probably going to throw down some cash to see this happen. The patches will have to be Open Source of course.

If there are any companies out there that would like to co-sponsor these features feel free to add a note in the comments or contact me via email.

Native InnoDB warmup

We have about 100GB of InnoDB data which needs to run 100% out of memory. Our per-DB images are 32GB and the database on each of these boxes fits into about 25GB (leaving a bit of room for growth).

Right now we have a hacked InnoDB warmup script that runs a number of SELECTs into a blackhole table to warm up the DB.

This MUST be amazingly inefficient as InnoDB could just scan the DB, moving pages into memory as it reads forward. One sequential scan of the table would be a LOT faster than our current random reads. Right now it takes us about 1.5 hours to warm up a 25GB database image.
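To put a number on that, here is a rough back-of-the-envelope sketch in Python. The 25GB and 1.5 hour figures are ours; the 80 MB/s sequential read rate is purely an assumed placeholder, not something we measured, so plug in whatever your disks actually do.

```python
def effective_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average read rate implied by loading size_gb in the given time."""
    return size_gb * 1024 / (hours * 3600)


def sequential_warmup_minutes(size_gb: float, seq_mb_s: float) -> float:
    """Time to read size_gb in one forward pass at seq_mb_s."""
    return size_gb * 1024 / seq_mb_s / 60


# Figures from our current warmup: 25GB in about 1.5 hours.
current = effective_throughput_mb_s(25, 1.5)

# ASSUMPTION: 80 MB/s is a placeholder sequential read rate, not a measurement.
estimate = sequential_warmup_minutes(25, 80)

print(f"current warmup: {current:.1f} MB/s")
print(f"sequential scan at 80 MB/s: {estimate:.1f} minutes")
```

Our current warmup works out to under 5 MB/s on average. Under the assumed 80 MB/s, a single forward scan of the same 25GB would take a little over 5 minutes.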

This is slightly related to a persistent and restored LRU buffer pool which Jeremy Cole suggested but different enough to suggest a separate architecture.

Faster recovery

Right now InnoDB recovery is DOG slow and 100% CPU bound. Apparently, this is due to a poorly designed algorithm that does a full scan of a linked list on recovery.

I need to see this fixed because when a DB crashes I need to get it up and online ASAP.

This is also needed to improve our usage of InnoDB hotcopy for performing slave clones. We take a snapshot of a running InnoDB database and perform recovery on a slave with the raw binary image. Works great other than the fact that recovery takes ages.

Improved Operation for 100% Memory Databases

If we improve the performance of recovery, there exists some very amazing potential for ULTRA fast performance when running entirely out of memory.

One would use LARGE write ahead logs (20-50GB). Enough to hold a few hours of transactions.

Then we could enable a mode for InnoDB to disable fuzzy checkpointing and instead continually write the full DB image sequentially to disk. The DB would be continually doing a full checkpoint and deleting the previous database image. If a box crashed the DB would just be restored from the write ahead logs.

The write transaction throughput in this situation would be seriously improved. In theory this would be about 1/2 the full sequential throughput of the drive. Fifty percent of the write performance would go to the WAL and the other 50% would be to the data files.

This would make InnoDB semi-log structured.

Another alternative is to just continually flush, starting from the beginning of the buffer pool and moving on to the end, then update the WAL checkpoint so that the oldest block seen by the flush thread is cleaned up.
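A toy Python model of this sweep idea, just to make the checkpoint bookkeeping concrete. This is not how InnoDB is implemented; the class and method names (`SweepFlusher`, `modify`, `flush_next`) are made up for illustration, and LSNs are simple counters.

```python
class SweepFlusher:
    """Toy buffer pool flushed front to back in a loop (hypothetical model)."""

    def __init__(self, n_pages: int):
        # Per page: LSN of its oldest unflushed change, or None if clean.
        self.oldest_lsn = [None] * n_pages
        self.lsn = 0          # last log sequence number handed out
        self.cursor = 0       # next page the flush thread will write
        self.checkpoint = 0   # log records <= checkpoint are safe to discard

    def modify(self, page: int) -> None:
        """Record one logged change to a page."""
        self.lsn += 1
        if self.oldest_lsn[page] is None:
            self.oldest_lsn[page] = self.lsn

    def flush_next(self) -> None:
        """Write the page under the cursor, then advance the checkpoint."""
        self.oldest_lsn[self.cursor] = None
        self.cursor = (self.cursor + 1) % len(self.oldest_lsn)
        dirty = [lsn for lsn in self.oldest_lsn if lsn is not None]
        # Everything older than the oldest still-dirty change is on disk.
        self.checkpoint = min(dirty) - 1 if dirty else self.lsn


pool = SweepFlusher(4)
pool.modify(2)   # LSN 1
pool.modify(0)   # LSN 2
pool.modify(2)   # LSN 3, page 2 is still dirty since LSN 1

pool.flush_next()            # writes page 0; page 2 still pins the log
after_first = pool.checkpoint
pool.flush_next()            # writes page 1 (clean)
pool.flush_next()            # writes page 2; nothing dirty remains
after_third = pool.checkpoint
print(after_first, after_third)
```

The point of the model is that the checkpoint can only move past a change once the sweep has reached the page holding it, so the log has to cover at least one full pass over the buffer pool.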

You would need to be 100% certain that your logs were sized large enough to store plenty of transactions so that a full log doesn’t force a checkpoint.
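A quick sizing sketch in Python, under stated assumptions: the redo rate and image write rate below are invented example values, and the safety factor of 2 is my own guess at a reasonable margin (the log has to outlast at least one full pass over the image, plus slack).

```python
def log_headroom_hours(log_gb: float, redo_mb_s: float) -> float:
    """Hours of redo a log of log_gb can hold at a steady redo_mb_s."""
    return log_gb * 1024 / redo_mb_s / 3600


def min_log_gb(db_gb: float, redo_mb_s: float, image_mb_s: float,
               safety: float = 2.0) -> float:
    """Log size needed to cover one full image write, times a safety factor."""
    pass_seconds = db_gb * 1024 / image_mb_s      # one sequential pass
    redo_mb = redo_mb_s * pass_seconds            # redo generated meanwhile
    return redo_mb / 1024 * safety


# ASSUMPTION: 2 MB/s of redo and 40 MB/s image writes are example values only.
headroom = log_headroom_hours(20, 2)
needed = min_log_gb(25, 2, 40)

print(f"20GB log at 2 MB/s of redo: {headroom:.1f} hours")
print(f"minimum log for a 25GB image: {needed:.1f} GB")
```

If the minimum comes out anywhere near the configured log size, the log will fill before a pass completes and you are back to forced checkpoints.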

It could also completely eliminate InnoDB non-contiguous data and disk fragmentation.

Of course this might be a completely insane idea and perhaps I should just be using a log-structured storage engine to begin with.

  1. Kevin,

    As you know, Percona can do these for you and make them available to the public. Well, you know it. This is just for whoever comes to your blog post and thinks, well, who can I pay to implement these :)
