24 Hours with an SSD and MySQL

I’ve now had about 24 hours to play with the Mtron SSDs and had some time to benchmark them.

The good news is that the benchmarks look really solid. The drive is very competitive in terms of performance. I’m seeing about 100MB/s sequential read throughput and 80MB/s sequential write throughput.


The bad news is that they can only do about 180 random writes per second. Here are the raw performance numbers from Mtron’s data sheet:

[Image: random read/write performance numbers from Mtron’s data sheet]

I spent a lot of time reviewing this drive and didn’t notice this before.

The Battleship Mtron review went over this as well but didn’t spend much time on it:

Although they do perform astounding in random read operation, random write is still very sub-par on flash technology. Even though we are not benchmarking random write IOP’s I will give you some quick insight. Write performance is not yet a perfect and refined process using NAND flash and you will not have a drive that is going to write file operations as well as a high end U320 15K SCSI or SATA 10K setup. There is a company that I have been talking with directly about this NAND flash write issue called EasyCo in PA, USA. They are working on a process called MFT technology and they offer a simple MFT driver that is claiming to increase random write IOP’s on a single drive up to 15,000 IOP’s. Doug Dumitru had explained to me this technology will take your standard Mtron 16GB Professional drive and turn it into an enterprise behemoth.

I spent some time to see what EasyCo was up to and came across their Managed Flash Technology:

Managed Flash Technology (MFT) is a patent pending invention that accelerates the random write performance of both Flash Disks and Hard Disks by as much as a thousand fold.

It does this by converting random writes into chained linear writes. These writes are then done at the composite linear write speed of all the drives present in the file volume, subject only to the bandwidth limits of the disk control mechanism. In practice, even with as few as three drives present, this can result in the writing of as many as 75,000 4-kilobyte blocks a second.

As a result, MFT can dramatically improve the real-time performance of asymmetric storage devices such as Flash disks by making reads and writes symmetrical. Here, flash disk performance is typically improved 10 to 30 times, making some of these 60 times as fast as the fastest hard disk. Finally, it is possible to make clusters of as few as 20 flash drives run collectively as fast as RAM does but with a much larger storage space than RAM can practically have.

The question is what are they doing to get such substantial performance?

Here’s what I think is happening.

From what I’ve read they take a normal Mtron drive and install a new Linux kernel module which they use to interface with the drive. They then use a normal write ahead log and keep data in memory (probably something like a 500M buffer) and a binary tree of the block offsets. When the buffer fills they then take the data in memory, sort the results by offset, and apply the buffer to disk sequentially.
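To make my guess concrete, here’s a toy sketch of that mechanism. All the names are mine, the in-memory dict stands in for the binary tree, and I’ve left out the on-disk crash-recovery log entirely (EasyCo hasn’t published how MFT actually works, so treat this as illustration only):

```python
import os

class SortingWriteBuffer:
    """Toy sketch of a sorting write buffer: absorb random block
    writes in RAM, then flush them sorted by offset so the drive
    only sees one sequential-ish pass per flush."""

    def __init__(self, path, buffer_blocks=128, block_size=4096):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)
        self.block_size = block_size
        self.buffer_blocks = buffer_blocks
        self.pending = {}  # block offset -> data (stand-in for the btree)

    def write_block(self, offset, data):
        self.pending[offset] = data  # absorb the random write in RAM
        if len(self.pending) >= self.buffer_blocks:
            self.flush()

    def flush(self):
        # apply buffered blocks in offset order: one sorted, mostly
        # sequential sweep instead of many scattered random writes
        for offset in sorted(self.pending):
            os.lseek(self.fd, offset * self.block_size, os.SEEK_SET)
            os.write(self.fd, self.pending[offset])
        self.pending.clear()
        os.fsync(self.fd)

    def close(self):
        self.flush()
        os.close(self.fd)
```

(As Doug Dumitru points out in the comments below, this guess turned out to be wrong about how MFT really works, so the sketch only describes the general "sort then flush" idea.)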

If the box crashes they have an on-disk log that they replay, probably when the drive is first mounted.

Basically, a flash-aware write-ahead log.

Fortunately, InnoDB has a write ahead log internally so this should save us from needing to run a custom kernel module. Any database with a write ahead log should be more than competitive.

I wrote a simple benchmarking utility (see Figure 1 below) to simulate an InnoDB box performing thousands of random reads and one sequential write.

The benchmark consists of 3,500 dd processes running in the background, reading from the SSD and writing to /dev/null. I then have one sequential write running in the foreground, writing out about 5GB of data to the SSD.
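The setup can be sketched roughly like this in Python (the device and output paths are just examples; adjust them for your own box):

```python
import subprocess
import time

def run_benchmark(device, out_path, readers=3500, read_mb=1, write_mb=5 * 1024):
    """Spawn `readers` background dd processes reading from `device`,
    then time one sequential dd write of `write_mb` MB to `out_path`."""
    procs = [
        subprocess.Popen(
            ["dd", "if=" + device, "of=/dev/null",
             "bs=1M", "count=%d" % read_mb, "skip=%d" % (i % 1024)],
            stderr=subprocess.DEVNULL)
        for i in range(readers)
    ]
    # one big sequential write in the foreground, timed
    start = time.time()
    subprocess.run(
        ["dd", "if=/dev/zero", "of=" + out_path, "bs=1M",
         "count=%d" % write_mb],
        stderr=subprocess.DEVNULL, check=True)
    elapsed = time.time() - start
    for p in procs:
        p.wait()
    return elapsed

# e.g. run_benchmark("/dev/sdb", "/mnt/ssd/bigfile")  # example paths
```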

The HDD holds up surprisingly well compared to the SSD, which should have an unfair advantage here. So much so that I think the Linux scheduler is interfering with my benchmark. What I think is happening is that the first few dd’s start reading in parallel and block the remaining processes. This continues with 5-10 concurrent readers until the entire chain of 3,500 completes.

I’m going to rewrite the benchmark to create one large 10GB file and read 10GB of blocks from random offsets within it.
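Something along these lines: a single process issuing random block reads, which avoids the scheduler pile-up that thousands of dd’s cause (in practice you’d also want O_DIRECT, or a file much larger than RAM, so the page cache doesn’t make the numbers lie):

```python
import os
import random
import time

def random_read_benchmark(path, total_bytes=10 * 1024**3, block_size=4096):
    """Read `total_bytes` worth of `block_size` blocks from random
    offsets in the file at `path`, and return the random-read IOPS."""
    size = os.path.getsize(path)
    blocks = total_bytes // block_size
    fd = os.open(path, os.O_RDONLY)
    start = time.perf_counter()
    for _ in range(blocks):
        # seek to a random block-aligned offset and read one block
        offset = random.randrange(size // block_size) * block_size
        os.lseek(fd, offset, os.SEEK_SET)
        os.read(fd, block_size)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return blocks / elapsed
```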

As you can see, the SSD is fast, but it’s only about 2.5x faster than the HDD. I’d expect it to be about 20-40x faster.


Figure 1. Performance of SSD vs HDD (measured in seconds)


  1. mdcallag

    Do you have access to a system with ZFS? IO from ZFS should be much more sequential as it doesn’t update in place.

  2. I don’t have access to ZFS….. does ZFS use a write ahead log? I’m going to have to check…..

    Thanks for the suggestion.

    Kevin

  3. dougdumitru

    Mr. Burton,

My sales associate has already followed up via email, but I wanted to mention here that your guess about how MFT works internally is not correct. Your idea is interesting, but MFT uses far less memory (less than 40 MB for a 32GB drive) and delivers random write performance that is within 10% of the drive’s available linear bandwidth. A sorting write cache would help the drive’s random write IOPS somewhat, but would leave a lot of outstanding unbuffered, out of order, data and also cannot even approach the random write performance that we achieve. What you are describing is actually closer to what Mtron does in its 16 MB cache on the drive in order to get to 130 write IOPS (the native chips are probably only good for 25 unbuffered). Suffice it to say that MFT works very, very differently.

    In that the web tends to “never forget”, I would ask that you edit your post to strike your own guess, or at least mention that EasyCo says you guessed wrong. I don’t want to defend MFT as a “write cache” product because of some viral reference that leads back to here. It is not.

    In terms of guessing, your idea is pretty good, it just does not describe MFT at all.

    You can read more about MFT at http://managedflash.com

    Doug Dumitru
    CTO EasyCo LLC

  4. Doug… by posting here I think you already corrected me :)

    After posting this I realized that you were probably using a log fs and making everything sequential but I started down another 24 hours of hacking.

    Kevin

  5. Doug, when will all this be available?

  6. Matt,

    They have all of this available now…. you have to purchase the drives from them and you get a new proprietary kernel module that you have to install.

    Kevin

  7. tgabi

It’s interesting to see how MFT technology deals with the drive’s “wear leveling” firmware. I have some MemoRight drives (800 write IOPS) already in production but this would open new avenues.

  8. harrisonf

    It would be interesting to see a benchmark of PBXT with SSD. From what I know, it uses a log based table structure so the majority of writing is done sequentially.

I have also seen some presentations about PBXT where it is explicitly mentioned that SSD will work well with it, but I haven’t seen any benchmarks.

    http://www.primebase.com/xt/

  9. Harrisonf,

    GOOD suggestion. I forgot about PBXT and that it’s log based.

    I initially dismissed it because I thought the random seeks wouldn’t be right for us.

    However, on SSD this would be more than fine.

    Anyway, I’m going to benchmark it and I’ll have another blog post on it…

    Kevin

  10. jamonbowen

    I would be interested in seeing how the drive handles mixed workloads. If it can handle 180 random writes per second, over 10k random reads per second, and a high sequential write log in parallel, then the current database logging structure would be able to leverage the drive characteristics. However, if 90 random writes/s takes half of the ultimate performance of the drive and only ~5000 reads a second are possible the use cases for the drive drop off dramatically (note that 90/5090 is only a 1.7% write workload).

    You may want to look into xdd for a more powerful IO load generator.

    Jamon

  11. Good stuff, again. I’m surprised more aren’t writing about this. I have this theory that the new Intel SSD drives have something like MFT and an IOP chip built in which could explain the power hungry-ness and performance claims.

    If Fusion-IO doesn’t hurry it’s possible that they’ll be outclassed by a good Areca card and a few Intel drives. My barely educated guess is that a Battleship MTron style array with an Areca 1680 would max out at 1.2GBytes/second which ain’t too shabby.

  12. KirkH,

    Is there any documentation on the Intel SSD specs?

    If they do MFT that would be SWEET.

    Kevin

  13. driver

The effect of MFT, especially when running a database like MySQL, is huge. On one hand it lets you use low-cost flash disks with better performance than a standard hard drive, and on the other, high-quality SSD products get much higher write performance with MFT…

Detailed information is now at http://www.easyco.com
and there will be further improvements to optimize this product…





