Precaching MySQL Replicated Data

This is a hack I’ve heard about a couple times now:

Paul wrote a script that reads from the logfile the queries that are going to be executed moments later. He parses the queries and constructs new select queries that populate the cache with the data that speeds up the upcoming writes. He claims, if I remember correctly, a three to four times speed-increase.

Here’s the problem in a nutshell. The master can write transactions in parallel but slaves can only write them in series. [1]

This means you have a lot of optimizations on the master (TCQ and NCQ being examples) that aren’t possible on the slave.

What this patch would do is precache the data so it’s already available in memory. Since you’re pre-reading the binary log you can run SELECTs in parallel on the SLAVEs so that the cache is hot when you’re ready to execute the queries.

I think this would work REALLY well if your data to memory ration was low but if it increases I think performance would fall.

That said, it wouldn’t even be necessary if your data was in memory since warming a cache would just be redundant.

Either way, it’s cool to see more people thinking out of the box

1. I still think it’s possible to run serial replication with MyISAM tables without concurrent insert but this would not be possible for other storage engines.

%d bloggers like this: