Default MySQL Slave Network Timeouts Considered Harmful

Remember the other day when I blogged about MySQL being broken with binary replication?

I was wrong. It might actually be functional (I still haven’t tested) but the problem was more difficult to diagnose than I originally thought.

Here’s what was happening.

The default slave_net_timeout value is 3600 seconds. Our network was congested by heavy traffic and other factors, which caused the MySQL slave to block on reads from the master. Because the connection had not timed out yet, the slave still believed it was zero seconds behind.

A temporary fix is to set slave_net_timeout to a more realistic value (5 seconds).
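
For illustration, here is a minimal sketch of that temporary fix as SQL run on the slave. The value 5 is the one suggested above and should be tuned for your own network. As far as I know the new timeout only applies once the slave threads are restarted, which is why the sketch restarts them.

```sql
-- Lower the read timeout on the slave (5 seconds, as suggested above).
SET GLOBAL slave_net_timeout = 5;

-- Assumption: the setting is picked up when the slave threads start,
-- so restart them for it to take effect.
STOP SLAVE;
START SLAVE;

-- Confirm the value now in use.
SHOW VARIABLES LIKE 'slave_net_timeout';
```

To keep the setting across server restarts, the same variable can also be set as slave_net_timeout = 5 under the [mysqld] section of my.cnf.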

This exposes a few more bugs in MySQL that should be fixed.

The default value of slave_net_timeout should NOT be 3600 seconds. This is insane. Let’s select a more realistic value please.

Seconds_Behind_Master should account for the time since the last successful read from the master. If reads are stalling or timing out, that elapsed time should be reflected in Seconds_Behind_Master.

Tip of the hat to Barry @ WordPress for connecting the dots for me.

  1. This is why our replication check scripts over at FastMail.FM don’t use those values for anything at all, instead:

    $ echo "desc ping.PingData" | sql
    Field         Type       Null  Key  Default            Extra
    ServerId      int(11)    NO    PRI
    ExternalTime  int(11)    YES        NULL
    InternalTime  timestamp  NO         CURRENT_TIMESTAMP

    Each server runs a job every N seconds that inserts or updates its own row with the latest system clock time (we use Perl, but anything that can get a Unix timestamp is fine) as well as the database's own CURRENT_TIMESTAMP.

    The job then checks that the other server's row is recent enough as well. We get emailed if the lag ever exceeds 30 seconds, and paged if it exceeds 5 minutes.

    This is master-master replication (but with one end getting all the updates in the general case). An offsite replica also watches these rows (without updating the ping table, obviously) and shuts down its local copy of postfix (backup MX) if it falls more than 20 minutes behind according to the ping table values.
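
The ping-table scheme described in the comment above could be sketched roughly as follows. This is an illustration under assumptions, not FastMail's actual job. The table layout is reconstructed from the DESC output. The server ids 1 and 2 are hypothetical, and UNIX_TIMESTAMP() stands in for the clock value that the commenter's Perl job supplies. The 30-second and 5-minute thresholds are the ones quoted in the comment.

```sql
-- Table layout reconstructed from the DESC output in the comment.
CREATE TABLE IF NOT EXISTS ping.PingData (
  ServerId     INT NOT NULL,
  ExternalTime INT NULL,
  InternalTime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (ServerId)
);

-- Heartbeat, run every N seconds on each server.
-- ServerId 1 is a hypothetical id for the local server.
INSERT INTO ping.PingData (ServerId, ExternalTime, InternalTime)
VALUES (1, UNIX_TIMESTAMP(), CURRENT_TIMESTAMP)
ON DUPLICATE KEY UPDATE
  ExternalTime = VALUES(ExternalTime),
  InternalTime = CURRENT_TIMESTAMP;

-- Check how stale the peer's replicated row is.
-- ServerId 2 is a hypothetical id for the other server.
SELECT ServerId,
       UNIX_TIMESTAMP() - ExternalTime AS lag_seconds,
       CASE
         WHEN UNIX_TIMESTAMP() - ExternalTime > 300 THEN 'page'
         WHEN UNIX_TIMESTAMP() - ExternalTime > 30  THEN 'email'
         ELSE 'ok'
       END AS action
FROM ping.PingData
WHERE ServerId = 2;
```

Because the lag is computed from a row that only arrives through replication, a stalled connection shows up as a growing lag_seconds rather than a misleading zero.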
