MySQL Cluster (NDB) or Vertical Replication

There are essentially two choices for scaling MySQL: cluster (NDB) or replication.

For many people replication works fine because their application is mostly read-based. Just throw a few MySQL slave servers into the mix and you can scale out reads pretty well.

My guess is that this only works well for about 80-90% of users.

You update your database on the master but perform queries on the slaves. If you need more query capacity you can just add more slaves.
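
For reference, a minimal sketch of pointing a fresh slave at the master, using the era-appropriate statements; the hostname, user, password, and binlog coordinates are all placeholders:

    -- run on the new slave (assumes its data was seeded from a
    -- snapshot of the master taken at binlog position 000001/4)
    CHANGE MASTER TO
      MASTER_HOST = 'master.example.com',
      MASTER_USER = 'repl',
      MASTER_PASSWORD = 'secret',
      MASTER_LOG_FILE = 'mysql-bin.000001',
      MASTER_LOG_POS = 4;
    START SLAVE;

    -- the application then sends writes to the master and SELECTs to
    -- any slave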

The problem with replication is that you can't scale your writes. Every write on the master has to be replayed on every slave, so a single machine's write throughput caps the whole cluster. If you buy an expensive RAID array you can probably get 1,500 transactions per second (maybe more) out of your IO array, but that's the best you can do for the entire cluster.

As your transaction mix approaches 100% writes you're done. You've hit a scaling wall with replication and you can't go any further.

You can of course go with vertical replication partitioning, which works well for a lot of people.

Basically this involves putting a few tables (probably grouped by role) into a dedicated subcluster of machines. If you're doing, say, 2,000 transactions per second total but each IO subsystem can only handle 1,500 transactions per second, you can create two subclusters, each handling half of the work with plenty of room for growth.
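
A sketch of what the split might look like, with hypothetical role-based table groups (users and sessions in subcluster A, orders and payments in subcluster B):

    -- routed to subcluster A's master:
    UPDATE users SET last_login = NOW() WHERE id = 42;

    -- routed to subcluster B's master:
    INSERT INTO orders (user_id, total) VALUES (42, 19.99);

    -- no longer possible as one query: a JOIN across the two groups,
    -- since users lives in A and orders lives in B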

The problem with this is that you can't really use both sides of the cluster in the same query or transaction. It's essentially two sides of the brain without a corpus callosum.

Essentially it’s a hack. The ideal solution would be to have one uniform cluster with IO balanced/partitioned across a set of storage nodes within the cluster.

Google has a similar backend that they've designed with Bigtable. It's not SQL-based but is instead a lower-level distributed storage system. It's designed to work with map/reduce, so it might be a bit of an impedance mismatch for MySQL guys.

The MySQL guys have MySQL Cluster, which shows a lot of promise. With MySQL 5.1 you'll be able to store your data on disk, which should open this up for larger installs. The in-memory option is still really nice because you could in theory have a few high-performance (but small) tables which never have to wait for disk.
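
For example, the 5.1 disk-data DDL looks roughly like this (names and sizes are placeholders; note that indexed columns always stay in memory, only non-indexed columns go to disk):

    -- undo log and tablespace for disk-based NDB storage
    CREATE LOGFILE GROUP lg1
      ADD UNDOFILE 'undo1.log'
      INITIAL_SIZE 64M
      ENGINE NDB;

    CREATE TABLESPACE ts1
      ADD DATAFILE 'data1.dat'
      USE LOGFILE GROUP lg1
      INITIAL_SIZE 256M
      ENGINE NDB;

    -- large table whose non-indexed columns live on disk
    CREATE TABLE archive_events (
      id BIGINT NOT NULL PRIMARY KEY,
      payload VARCHAR(255)
    ) TABLESPACE ts1 STORAGE DISK ENGINE NDB;

    -- small hot table stays entirely in memory (the default)
    CREATE TABLE counters (
      name VARCHAR(32) NOT NULL PRIMARY KEY,
      value BIGINT NOT NULL
    ) ENGINE NDB;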

The biggest burden, I think, is the requirement for gigabit ethernet. Their design uses a central SQL node for communicating with the storage nodes which are on other machines. If you have to do a full table scan the SQL node will have to read all the data locally.

If you're on a full-gigabit ethernet switch whose backplane has enough bandwidth to push the data down to the local SQL node, you could in theory sustain about 125 MB per second (1 Gbit/s divided by 8 bits per byte), but more likely the other nodes will be backed up performing disk seeks long before you saturate the wire.

Due to the complexity of configuring NDB, I'm wondering how long it will be until someone sells NDB clusters the way Amazon is selling S3.

Basically you could buy an NDB cluster with a set number of data nodes and SQL nodes and pay a fixed amount per month.

I wonder if MySQL should partner with a few hosting products to deliver this solution.

Update: I accidentally posted this before I finished editing this story. Hopefully MySQL planet hasn’t picked it up yet.


  1. No, because 94% of the reason I need a high query rate, is because I would need 100 other machines on the same LAN, to hit the SQL Node frontends…

  2. Setting up a basic NDB configuration should be doable for most people, but if you want it involved in serious action, with really good performance and good response times, it's a good idea to know exactly what HW you're running on and exactly which tweaks to make in the NDB configuration (a minimal config.ini sketch follows below).

    So there is definitely an opening for business ideas around preconfigured MySQL Cluster solutions. I'm currently in the process of adding some more knobs one can use to tune the scheduling algorithms in the NDB kernel, allow real-time operation, and lock threads to CPUs, which makes this even more interesting.

    In such a scenario I've got good hope that one can get failover times in the event of a node crash down to 10 milliseconds and response times down to below 50 microseconds.

    Rgrds Mikael Ronstrom
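
    For reference, the basic configuration mentioned above boils down to a config.ini on the management server; a minimal sketch, assuming two data nodes and one SQL node, with all hostnames and sizes as placeholders:

      [ndbd default]
      NoOfReplicas=2
      DataMemory=512M
      IndexMemory=64M

      [ndb_mgmd]
      HostName=mgm.example.com

      [ndbd]
      HostName=data1.example.com

      [ndbd]
      HostName=data2.example.com

      [mysqld]
      HostName=sql1.example.com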

  3. Paul,

    I really don’t understand your comment. Could you clarify please?

    Mikael,

    Tweaking is important of course. Though I think that, just like with MySQL, you can get 80% of the bang by using the right amount of memory and deploying on gigabit ethernet.

    Kevin

  4. Kevin,
    Agreed, and buying a cluster interconnect can often be another simple booster.

    However, there are always users who want to squeeze even more out of their solutions, and since this is not a light task it is usually only done by large enterprises; companies that do it for you, as you mentioned, are another way of achieving this.

    Rgrds Mikael

  5. “Their design uses a central SQL node for communicating with the storage nodes which are on other machines. If you have to do a full table scan the SQL node will have to read all the data locally.”

    This is not entirely true. Nothing is really “central” or “governing” in MySQL Cluster. It is true that when the cluster is in a particular state, a single process might command other units (e.g. the transaction manager or the arbitrator), but such a process can run in many locations and is not tied to a single machine.

    Also, re. the table scan: If you are running cluster, you want to have all the MySQLD nodes run with the

    engine_condition_pushdown=1

    option.
    When a full table scan needs to be done in order to match a WHERE clause that cannot use an index, this option will ensure that the WHERE clause will be executed as much as possible on the data nodes. This will result in only the matching rows being sent back from the data nodes to the SQL nodes.
    Without engine_condition_pushdown enabled, the data is first copied to the SQL node and the WHERE matching is then done in the SQL node. This is of course very costly.

    Of course, a SELECT * without a WHERE clause can never benefit from engine_condition_pushdown, because by definition that query requires all of the data to be sent back.
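
    A quick way to check that the pushdown is actually happening, sketched with a hypothetical table t1 and a non-indexed column b:

      SET engine_condition_pushdown = 1;
      EXPLAIN SELECT * FROM t1 WHERE b < 10;

      -- with pushdown enabled, the Extra column of the EXPLAIN output
      -- reports "Using where with pushed condition"; with it disabled
      -- you just see "Using where"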

    We definitely use Bigtable at Google for many of our end-user webapps. However, we also use MySQL in some places. We scale it horizontally, i.e. partition it, instead of replicating. We replicate for data warehousing, backups, etc., but not as much for scaling data volume.

    more:

    http://snarfed.org/space/scaling_data_on_the_cheap.html

  7. I seem to remember from the MySQL site that gigabit ethernet is not a requirement. I just think that all of the machines must match: either all gigabit or all megabit. Obviously, gigabit is preferable.

    My question is: Are my tables and SQL going to be different? In a shared-nothing environment it's impossible to do referential integrity, so no foreign keys. Is that correct? If I'm designing a schema with future consideration for horizontal partitioning, what should I do and not do?
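
    For illustration, a sketch of how a schema might change, using hypothetical customers/orders tables. MySQL parses FOREIGN KEY clauses for engines that don't support them but silently ignores them, so with NDB the integrity checks move into the application:

      -- InnoDB version: the engine enforces the constraint
      CREATE TABLE orders (
        id INT NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (id)
      ) ENGINE=InnoDB;

      -- NDB version: drop the constraint, keep an index for the same
      -- lookups, and enforce the relationship in application code
      CREATE TABLE orders (
        id INT NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL,
        INDEX (customer_id)
      ) ENGINE=NDBCLUSTER;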

  8. Jake

    I have a question. The documentation says:

    “Horizontal Data Partitioning
    Data within NDB tables is automatically partitioned across all of the data nodes in the system. This is done based on a hashing algorithm based on the PRIMARY KEY on the table, and is transparent to the end application.”

    Does this mean that the data on the data nodes are not exact copies of each other? If they are not exact copies, then suppose something goes wrong: I won't be able to simply go to one of the nodes and back it up. In other words, the only way to back up the data is to back up the cluster, not any single data node.
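
    For what it's worth, that is my understanding: each data node holds only its share of the table fragments (plus replicas belonging to its node group), so the supported approach is a consistent cluster-wide backup started from the management client. A sketch, with a placeholder hostname:

      # run from any machine that can reach the management node;
      # each data node writes its fragments under its BackupDataDir
      shell> ndb_mgm -c mgm.example.com -e "START BACKUP"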





