Thoughts on InnoDB Internals (RE Heikki Tuuri)

Heikki Tuuri has responded to the InnoDB questions the community asked him earlier this month.

There’s a lot of information here, so instead of responding a dozen times in comment form I decided to make this one coherent post.

Q7: Does InnoDB have any protection from pages being overwritten in the buffer pool by a large full table scan?

HT: No

PZ: Another possible area of optimization. I frequently see batch jobs killing server performance by taking over the buffer pool. Full-table-scan handling is only one of the replacement-policy optimizations possible, though.

Note that most database systems, MyISAM included, are vulnerable to this problem. One solution is to have dedicated reporting machines, or to compute stats in some other manner.

This is one advantage of having the cache in user space rather than relying on the kernel page cache: you can give it ‘hints’ about how to behave.
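InnoDB had no such protection at the time, but to make the idea concrete, here is a toy sketch of one well-known scan-resistance technique: midpoint-insertion LRU, where newly read pages land in an “old” sublist and only a second access promotes them. The class name, sublist layout, and promotion rule here are my own illustrative assumptions, not InnoDB’s code.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy scan-resistant buffer pool: new pages enter the 'old'
    sublist; only a second access promotes a page to the 'young'
    sublist, so a one-pass table scan cannot evict the hot set."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.young = OrderedDict()   # proven-hot pages, MRU at the end
        self.old = OrderedDict()     # recently read, not yet proven hot

    def access(self, page_id):
        if page_id in self.young:        # hot page touched again
            self.young.move_to_end(page_id)
        elif page_id in self.old:        # second touch: promote
            del self.old[page_id]
            self.young[page_id] = True
        else:                            # miss: read into the old sublist
            self.old[page_id] = True
        self._evict()

    def _evict(self):
        while len(self.old) + len(self.young) > self.capacity:
            if self.old:                 # scan victims are evicted first
                self.old.popitem(last=False)
            else:
                self.young.popitem(last=False)
```

A single-pass scan only ever churns the old sublist, so pages that earned a second access survive it — exactly the kind of ‘hint’ a user-space cache can exploit.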

Q15: How frequently is InnoDB fuzzy checkpointing activated?

HT: InnoDB flushes about 128 dirty pages per flush. That means that under a heavy write load, a new flush and a checkpoint happens more than once per second.

Why not make it exactly 128 dirty pages per flush and ditch the “fuzzy” part?

I’d like to boost this radically so that I can sustain higher IO on databases that are 100% in memory.

Ideally I’d be able to just do sequential writes to the disk.

This is going to become more of an issue as disk subsystems get faster and faster. I imagine that for RAID systems with lots of disks this is a BIG bottleneck.
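To illustrate what a fuzzy-checkpoint round looks like under Heikki’s 128-pages-per-flush figure, here is a minimal sketch (not InnoDB’s actual code — the function and field names are illustrative): flush the dirty pages with the oldest modifications, and the checkpoint LSN can then advance to the oldest modification still unflushed.

```python
FLUSH_BATCH = 128  # pages per flush, per Heikki's answer above

def fuzzy_checkpoint(dirty_pages, flush_page, batch=FLUSH_BATCH):
    """One round of fuzzy checkpointing (illustrative sketch):
    flush the oldest-modified pages first, then report the new
    checkpoint LSN -- everything before it is now durable."""
    dirty_pages.sort(key=lambda p: p.oldest_lsn)
    for page in dirty_pages[:batch]:
        flush_page(page)
    remaining = dirty_pages[batch:]
    checkpoint_lsn = remaining[0].oldest_lsn if remaining else None
    return remaining, checkpoint_lsn
```

Under a heavy write load the dirty list refills faster than one batch drains it, which is why this loop runs more than once per second — and why a larger batch (or purely sequential writes) looks attractive for in-memory databases.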

… Heikki also goes on to talk about BLOB storage:

The ‘zip’ source code tree by Marko has removed most of the 768 byte local storage in the record. In that source code tree, InnoDB only needs to store locally a prefix of an indexed column.

PZ: I think it is also a very interesting question what happens for blobs larger than 16K – is the exact size allocated, or is segment-based allocation used as well?
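To make the 768-byte local-prefix rule Heikki describes concrete, here is a toy model of the pre-‘zip’-tree layout. Note that it assumes whole-page overflow allocation, ignoring page headers — which is precisely the open question Peter raises, so treat the overflow arithmetic as an assumption for illustration only.

```python
LOCAL_PREFIX = 768      # bytes stored inline in the clustered index record
OVERFLOW_PAGE = 16384   # 16K InnoDB page size (header overhead ignored)

def store_blob(value):
    """Toy model of pre-'zip'-tree BLOB storage: the first 768 bytes
    stay in the record itself; the remainder is chained onto overflow
    pages. Assuming whole-page allocation, even one extra byte past a
    page boundary costs a full 16K page."""
    prefix = value[:LOCAL_PREFIX]
    rest = value[LOCAL_PREFIX:]
    overflow_pages = -(-len(rest) // OVERFLOW_PAGE) if rest else 0
    return prefix, overflow_pages
```

Under this model a 768-byte blob needs no overflow page at all, while a 769-byte blob already costs one — which is why the ‘zip’ tree’s move to storing only an indexed-column prefix locally matters.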

I was curious about this zip source code tree, so I dug into it a bit more and found his talk from MySQL ComCon Europe, Frankfurt, Nov 10th, 2004:

We will also implement a transparent, on-the-fly zip-like compression that will reduce space usage a further 50%. This will appear in MySQL-5.1

Interesting. Was this implemented? Did it ever make it into 5.1?
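Whatever ultimately shipped, the basic shape of “transparent, on-the-fly zip-like compression” is easy to sketch: try to deflate a 16K page into a smaller on-disk slot, and fall back to the uncompressed page when it doesn’t fit. The function name, target size, and fallback policy below are my assumptions, not anything from the zip tree itself.

```python
import zlib

PAGE_SIZE = 16384   # 16K in-memory page

def compress_page(page, target=8192):
    """Sketch of transparent page compression: attempt to fit a 16K
    page into a smaller on-disk size (here, half a page for the quoted
    ~50% saving); return the original page if compression doesn't pay."""
    packed = zlib.compress(page, 6)
    if len(packed) <= target:
        return packed, True      # store the compressed image
    return page, False           # incompressible: store as-is
```

The interesting engineering is everything around this call — keeping both compressed and uncompressed copies in the buffer pool, and handling pages that stop fitting after an update — but the 50% figure presumably comes from a target of half a page, as sketched here.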

I then asked:

Any plans to enable tuning of the checkpointing rate? Postgres exposes this data and allows the user to tune the checkpointing values.

HT: Hmm… we could tune the way InnoDB does the buffer pool flush. I think Yasufumi Kinoshita talked at the Users’ Conference 2007 about his patch that makes InnoDB’s flushes smoother and increases performance substantially.

I assume there is lots of room to tune the flushes, since I never optimized the algorithm under a realistic workload.

Making the doublewrite buffer bigger than 128 pages would require a bit more work. Now it is allocated permanently in the system tablespace when an InnoDB instance is created.

In theory, one could just recompile with a bigger doublewrite buffer (or disable it) and increase the fuzzy checkpointing batch beyond 128 pages, which should improve performance.
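For readers unfamiliar with why that 128-page doublewrite area constrains the flush batch, here is a sketch of the protocol (illustrative names, fake in-memory “disks” — not InnoDB’s code): the whole batch is persisted to the doublewrite area first, so a crash partway through the in-place writes can never leave a torn page without an intact copy to recover from.

```python
DOUBLEWRITE_PAGES = 128  # the fixed size Heikki mentions above

def flush_with_doublewrite(batch, dw_area, tablespace, fsync):
    """Sketch of the doublewrite protocol. A flush batch can never
    exceed the doublewrite area, which is why growing the area and
    growing the flush batch go hand in hand."""
    assert len(batch) <= DOUBLEWRITE_PAGES
    for slot, (page_no, data) in enumerate(batch):
        dw_area[slot] = data          # 1) sequential write to the dw area
    fsync()                           # 2) make those copies durable
    for page_no, data in batch:
        tablespace[page_no] = data    # 3) only now write pages in place
    fsync()
```

Since the doublewrite area is allocated permanently in the system tablespace at instance creation, enlarging it is more than a recompile — which is the “bit more work” Heikki alludes to.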

Is there a URL for Yasufumi Kinoshita’s patches?


  1. [quote]
    Q7: Does Innodb has any protection from pages being overwritten in buffer pool by large full table scan

    HT: No

    KB: Note that most database systems like MyISAM are very vulnerable to this problem.
    [/quote]

    I don’t quite agree. MyISAM doesn’t have any record buffer at all (it relies on the OS for that), so data records of course won’t be flushed out. For full index scans, the index buffer (key_buffer) will be swapped out, though. I’m not sure whether the index buffer is swapped out on full table scans as well…

    Cheers,

    Jay