Fun with Opterons, SATA, and INNODB

We’ve been doing a lot of performance analysis of MySQL and SATA disks over the last few days and I wanted to get some thoughts out in the open to see what you guys thought.

Now that Debian Etch is out we’re seriously looking at making the jump to a full 64bit OS with 8G of memory on each of our boxes.

This also involves benchmarking SATA and potentially migrating away from MyISAM which, while very trusty, is seriously showing its age for us.

First off is raw IO tuning.

XFS benchmarks show about a 20% performance boost from bonnie++ over ext3. This is pretty obvious. Tuning readahead with ‘blockdev’ didn’t seem to help performance much. SATA seems pretty tuned when compared to hdparm and IDE drives.

After fully tuning it seems we can get about 90MBps out or our disks. Not to shabby. The Linux md driver on RAID 0 didn’t seem to boost IO performance much (and I’m pretty disappointed). Even with a 1M chunk size (we also tested 64k) we were only seeing about 110MBps. Each drive can (in parallel) handle about 85MBps concurrently so it’s not an issue with the controller. It might be an issue where the RAID really starts to show performance in concurrent reads.

We might just end up using MySQL 5.1 with partitioning on top of INNODB which should allow us to utilize both disks.

XFS still scares me though. Apparently (according to Andrew Morton) there’s a long standing bug with XFS where it writes the metadata log first and then the data second. This can lead to a race condition where the box crashes before the data is written and has an inconsistent metadata log. Your file will then become corrupt with a series of 0000s in the middle. Ouch.

One issue which will become hard to deal with is the fact that Etch out of the box doesn’t support NCQ. I couldn’t get the drives to enable NCQ on bootup and apparently this is a known issue with libata. I’m sure this problem will settle down in the next two months as Etch matures a bit. Hopefully I won’t have to patch the kernel.


We’re about 50-70% write bound so having fast writes is very important for us. INNDBO seems to be the clear winner here. With the support for a write ahead log as well as other operations it basically destroyed MyISAM.

We even cheated and created a 1G RAM disk and created a MyISAM table there. MyISAM was still about 30% slower than INNODB since it ended up pending 100% of the CPU (userspace) building the table.

We also benchmarked Postgres vs INNODB and while the results are very inconclusive it’s looking like INNODB will beat Postgres here as well (especially for our workload).

More thoughts as they come. I can’t wait to deploy this stuff.

  1. I found Linux’s md raid performance to be poor at high write levels. I’ve seen hardware cards have similar issues if the RAID is not properly initialized.

    In the end, we opted for no RAID at all and use replication for redundancy. Not as nice as RAID but it was far easier to split the busy tables, innodb and other data onto multiple Raptor drives than it was to sort out the internals of the md drivers.

    This was on RHEL 4 64Bit. I’ve also seen cases where MyISAM is faster than InnoDB. It really just depends on the queries.

  2. Jonathan Moore

    To be fair we we have not really finish the testing of postgres. I think that so far, innodb is winning is by use of bulk inserts where a single insert command can issue many inserts all at once. For postgres I have broken the data in to tranastions of 1000 inserts. The most I could get out of postgres was about 3MB/sec even withe wall on it’s own disk. Even thou the transactions on the disk were low, around 100/sec. The CPU use was heavy so this may have been the issue. I want to try innodb with the tranastions and single inserts just to see if it is the insert method that gives innodb the advantage.

    Also of note is that all the tests were done with a single process inserting. I have not tried innodb with multipull writers but I ran a test with postgres with ten writers. Postgres did really well dropping only 25% preformance wise with the parrell load.

  3. Hi!

    Can you give me a hint about what the table would look like, and the partitioning scheme you were thinking of deploying? I am working on benchmarks at the moment and I would be curious to see what sort of scheme you are looking at.


  4. Anon

    Is there some secret to InnoDB performance? We were testing a while back with MySQL 4.1 importing from sql dump backups. InnoDB was several times slower than MyISAM.

    When we restored the InnoDB image files, they broke the server completely. Without an InnoDB expert in-house we were forced into the only practical step – switch back to MyISAM.

  5. Hey Jonathan…… thanks for the clarification on the Postgres stuff. As I was saying I’m pretty interested in seeing a fair comparison.

    The only INNODB tuning we did was to use a HUGE write ahead log file so that checkpoints would only happen every 10-15 seconds or so. We’re also writing LARGE transactions.

  6. Brian.

    We’re not looking at partitioning right now. We’re still on MySQL 4.1 as it’s REALLY rock solid.

    That said it looks pretty interesting especially we can’t get decent RAID 0 performance out of the linux md driver.

    We’re probably just going to put the innodb log on one disk and the data directory on another disk.


  7. @Jeffrey What sort of queries were faster with MyISAM? Are you agreeing with Kevin that queries are faster with InnoDB on the writes?

  8. The XFS “bug” isn’t really a bug in XFS… it only affects applications that don’t do IO properly and safely.

    So for InnoDB (which, of course, when expanding tablespaces) should do the right thing, there is nothing to worry about.

    There have also been recent patches going in to XFS to close the gap where this could happen (to help buggy user space applications) and a really recent one to (IIRC) remove it all together.

%d bloggers like this: