HOWTO Configure RAID Strides in XFS

There doesn’t appear to be any documentation on how to configure XFS to stride across a RAID array.

After about two hours of Google searches, I finally figured it out.

There are two variables – sunit and swidth:

sunit=value

This is used to specify the stripe unit for a RAID device or a logical volume. The value has to be specified in 512-byte block units. Use the su suboption to specify the stripe unit size in bytes. This suboption ensures that data allocations will be stripe unit aligned when the current end of file is being extended and the file size is larger than 512KiB. Also inode allocations and the internal log will be stripe unit aligned.

swidth=value

This is used to specify the stripe width for a RAID device or a striped logical volume. The value has to be specified in 512-byte block units. Use the sw suboption to specify the stripe width as a multiple of the stripe unit (usually the number of data disks). This suboption is required if -d sunit has been specified and it has to be a multiple of the -d sunit suboption.

So based on this, here’s how to compute both values:

sunit = strip_size / 512
swidth = sunit * nr_raid_disks
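
As a quick sanity check, the same arithmetic can be run in the shell. The numbers below are hypothetical placeholders (a 256 KiB strip across 4 data disks), not my configuration – substitute your controller’s values:

# hypothetical placeholders -- use your controller's strip size and disk count
STRIP_SIZE=262144                    # per-disk strip size in bytes
NR_RAID_DISKS=4                      # number of data disks in the stripe
SUNIT=$((STRIP_SIZE / 512))          # stripe unit in 512-byte blocks
SWIDTH=$((SUNIT * NR_RAID_DISKS))    # stripe width in 512-byte blocks
echo "sunit=$SUNIT swidth=$SWIDTH"   # with these values: sunit=512 swidth=2048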

In my situation I’m using RAID 0 across 5 disks so my sunit is 128 and my swidth is 640.
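
For reference, the mkfs.xfs invocation with those values would look something like the following; /dev/sdb is only a placeholder for whatever device your array presents:

mkfs.xfs -d sunit=128,swidth=640 /dev/sdb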

In hindsight I think the su and sw suboptions would have been easier to work out: su takes the stripe unit in bytes, and sw takes the stripe width as a multiple of su.

su would just be 65536 and sw would just be 5 (one per data disk).
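
The equivalent invocation in that form (again, the device name is just a placeholder) would be something like:

mkfs.xfs -d su=64k,sw=5 /dev/sdb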

In our tests this yielded a 30-40% performance boost, so it was certainly worth the trouble.

XFS is supposed to pick this up automatically: mkfs.xfs queries the device with an ioctl to obtain the stripe settings. The megaraid driver we’re using doesn’t appear to support this, so it came back as zero for both sunit and swidth. I imagine that if you’re using Linux software RAID it will work just fine.
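
If you want to verify what an existing filesystem ended up with, xfs_info reports the values (note that it prints them in filesystem blocks rather than 512-byte units); the mount point below is a placeholder:

xfs_info /data
# look for something like "sunit=16 swidth=80 blks" in the data section
# (with 4 KiB filesystem blocks, 16 blocks = 64 KiB and 80 blocks = 320 KiB)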


  1. “In our tests this yielded a 30-40% performance boost”

    What were the tests?

    It seems like most of our servers have sunit and swidth set to 0. This is across a variety of RAID controllers — HP, Adaptec, 3ware, MegaRAID.

  2. The tests weren’t amazingly scientific, but we were running a bulk-insert test of 1M records or so, in the 20-20k bytes per record range.

    We’re doing a 4-node parallel insert performance test on our distributed DB … if I get better stats I’ll publish them here too.

    Kevin





