InnoDB Table Bloat, Performance, and New Row Formats in 5.x
Over the weekend we migrated one of our tables from MyISAM to InnoDB. Generally speaking we’re pretty happy with most of our InnoDB migrations.
InnoDB generally uses 2-3x more data on disk but the write ahead log (WAL) buys us so much more added performance that it’s generally worth the switch.
This migration however was a bit more problematic. Importing the data from mysqldump as MyISAM into the new box only took about 2 hours. Converting the data to InnoDB has so far taken 24 hours and using 21G vs the original 6.1G for MyISAM.
I should also note that it’s not complete yet and it’s 3.5x larger than our original MyISAM install.
Clearly the performance will be worse in our situation for InnoDB because I don’t think the buffer pool efficiency will make the write ahead logging fast enough to beat MyISAM on the same hardware.
We’re not the only people who have noticed this problem:
* Database size – when we switched from MyISAM tables to InnoDB tables in MySQL, the size of our database grew from ~1GB to 10+GB! When we made the switch this weekend, the MySQL InnoDB database was using 34 GB, and the same data in a PostgreSQL database is only 9.6 GB – this should keep our hardware costs down a bit.
* Load time – The current MySQL setup takes over a day to restore the current database, the load of the data into the PostgreSQL database took just over 4 hours.
This will hopefully become less of an issue in the future with the ‘compact’ row format introduced in MySQL 5.0:
The physical record structure for InnoDB tables is dependent on the MySQL version and the optional ROW_FORMAT option used when the table was created. For InnoDB tables in MySQL earlier than 5.0.3, only the REDUNDANT row format was available. For MySQL 5.0.3 and later, the default is to use the COMPACT row format, but you can use the REDUNDANT format to retain compatibility with older versions of InnoDB tables.
In my experience the ‘compact’ row format was actually LARGER in many common situations when compared to ‘redundant’. This might be just because I compared it to a column which wouldn’t benefit from storing variable rows.
It seems the major benefit of the compact row format is storing NULL values with one bit for whether it’s null:
The record header contains a bit vector for indicating NULL columns. The bit vector occupies (n_nullable+7)/8 bytes. Columns that are NULL will not occupy other space than the bit in this vector.
… and the ability to store variable length fields more efficiently:
For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes.
I wish they’d just come out and say this in the documentation.
The other issue is whether InnoDB stores fixed width records in the buffer pool with padding or as variable width. If it’s storing the data exactly as it’s represented in the redundant row format then this will be a BIG waste of memory.
I just did a comparison of some of our data sets. With our variable width tables InnoDB with the compact row format can save us 15% …