InnoDB Table Bloat, Performance, and New Row Formats in 5.x

Over the weekend we migrated one of our tables from MyISAM to InnoDB. Generally speaking we’re pretty happy with most of our InnoDB migrations.

InnoDB generally uses 2-3x more data on disk but the write ahead log (WAL) buys us so much more added performance that it’s generally worth the switch.

This migration however was a bit more problematic. Importing the data from mysqldump as MyISAM into the new box only took about 2 hours. Converting the data to InnoDB has so far taken 24 hours and using 21G vs the original 6.1G for MyISAM.

I should also note that it’s not complete yet and it’s 3.5x larger than our original MyISAM install.

Clearly the performance will be worse in our situation for InnoDB because I don’t think the buffer pool efficiency will make the write ahead logging fast enough to beat MyISAM on the same hardware.

We’re not the only people who have noticed this problem:

* Database size – when we switched from MyISAM tables to InnoDB tables in MySQL, the size of our database grew from ~1GB to 10+GB! When we made the switch this weekend, the MySQL InnoDB database was using 34 GB, and the same data in a PostgreSQL database is only 9.6 GB – this should keep our hardware costs down a bit.

* Load time – The current MySQL setup takes over a day to restore the current database, the load of the data into the PostgreSQL database took just over 4 hours.

This will hopefully become less of an issue in the future with the ‘compact’ row format introduced in MySQL 5.0:

The physical record structure for InnoDB tables is dependent on the MySQL version and the optional ROW_FORMAT option used when the table was created. For InnoDB tables in MySQL earlier than 5.0.3, only the REDUNDANT row format was available. For MySQL 5.0.3 and later, the default is to use the COMPACT row format, but you can use the REDUNDANT format to retain compatibility with older versions of InnoDB tables.

In my experience the ‘compact’ row format was actually LARGER in many common situations when compared to ‘redundant’. This might be just because I compared it to a column which wouldn’t benefit from storing variable rows.

It seems the major benefit of the compact row format is storing NULL values with one bit for whether it’s null:

The record header contains a bit vector for indicating NULL columns. The bit vector occupies (n_nullable+7)/8 bytes. Columns that are NULL will not occupy other space than the bit in this vector.

… and the ability to store variable length fields more efficiently:

For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes.

I wish they’d just come out and say this in the documentation.

The other issue is whether InnoDB stores fixed width records in the buffer pool with padding or as variable width. If it’s storing the data exactly as it’s represented in the redundant row format then this will be a BIG waste of memory.

Update:

I just did a comparison of some of our data sets. With our variable width tables InnoDB with the compact row format can save us 15% …


  1. Were these issues part of the consideration when you compared MySQL and PostgreSQL?

  2. Partha Dutta

    Kevin,

    Very curious about your setup. We have much much larger databases (180+ gb) and can doa full reload in about 15 hours. Innodb performance in loading data varies. One thing I can tell you is never try to alter a large table from MyISAM to InnoDB. You are better off loading the data from flat files. If yiu want, I can review your setup to help you.

  3. Yeah….. That’s exactly what I did…. I tried to ALTER from MyISAM to InnoDB. I bet this is a transaction so it’s writing to the WAL….?

    I wonder how long it would take to create the tables FIRST and then bulk load from a mysqldump…..

  4. …… oh.. and we’re sitting at like 35 hours now….. :-/

  5. Dmitri Mikhailov

    Check if you have unique indexes on varchar columns, make sure table indexes fit entirely to memory.

  6. We actually DO have unique indexes on varchar columns…. I know that it’s about 3x slower for inserts (I hope that’s fixed) but it STILL shouldn’t be this slow.

    Kevin

  7. Also…. this varchar issue is a known bug …. I wonder if we should try to get it fixed…. hm.

  8. InnoDB tables will use 3.5X more space than MyISAM when either the MyISAM indexes were packed, the MyISAM tables were compressed or the InnoDB tables have large primary keys along with secondary indexes — because the primary key is repeated as the ‘row identifier’ in each secondary key index.

    The example you mention above about Postgres had the same problem and this was documented on that blog.

    This isn’t a secret. Any table that is index organized has this behavior.

    If you want to load data into InnoDB quickly:
    * use as large an InnoDB buffer cache as possible
    * make the InnoDB log files as large as possible
    * minimize the number of unique indexes on your tables
    * disable all calls to fsync from InnoDB. You have to hack the code to get this, or look at the Google patch. Of course, you only want to run in this mode when loading the table.






%d bloggers like this: