Hard disk linear read speed measurements (or, logic? where?)


Ever wondered what the read speed of your hard disk really is? Did you buy your hard disk based on graphs in Tom's Hardware or your local computer magazine? If so, I'm sorry for you. The following graphs were generated by doing AIO read operations (1 MiB per read) over the whole disk area, bypassing the kernel block cache (O_DIRECT). Since IDE chips don't have their own buffer memory, the only buffering effects that are visible (with the newer/larger disks) are from the buffer residing on the disk's own controller ("in the disk"), also known as "cache".

Note that for some graphs the graphing software (rlplot) decides to use scientific notation for the x-axis labels. This is a good example of why I write my own software so often.

How to read the graphs

The analysis program sends off a 1 MiB AIO read operation and waits for its completion. Once the kernel has completed the operation, the elapsed time (in microseconds) is recorded into the statistics array. This is repeated for each successive 1 MiB block on the block device (the software doesn't care what kind of device it is reading).
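The loop described above can be sketched roughly as follows. This is not the original analysis program: it uses plain blocking reads instead of AIO (Python's standard library has no AIO interface), and the name time_linear_reads and its parameters are made up for illustration. The anonymous mmap is only there to get a page-aligned buffer, which O_DIRECT requires on Linux.

```python
import mmap
import os
import time

BLOCK_SIZE = 1 << 20  # 1 MiB per read, as in the text


def time_linear_reads(path, block_size=BLOCK_SIZE, direct=True):
    """Read path block by block; return elapsed microseconds per read.

    Hypothetical helper: blocking reads stand in for the AIO requests
    used by the real measurement program.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux only: bypass the kernel block cache
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, block_size)  # anonymous mapping, page-aligned
    samples = []
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for block devices too
        offset = 0
        while offset < size:
            start = time.perf_counter()
            got = os.preadv(fd, [buf], offset)
            elapsed = time.perf_counter() - start
            if got <= 0:
                break
            samples.append(elapsed * 1e6)  # seconds -> microseconds
            offset += got
    finally:
        buf.close()
        os.close(fd)
    return samples
```

Reading a whole device this way (for example time_linear_reads("/dev/sdX"), with the device name left as a placeholder) normally needs root. Passing direct=False helps on filesystems that reject O_DIRECT, at the cost of measuring the kernel cache as well.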

When all blocks have been processed, the software outputs the list of statistics. This list is then loaded into rlplot, which "connects the dots" with one thick line. Because of this, the black "areas" should be interpreted as the range between the maximum and minimum speeds. Using dots for each data point would have given a truer picture, but rlplot doesn't really want me to do that, so the thick connecting line is used.

In real life you should probably concentrate on the shape of the upper boundaries and just ignore most of the spikes which seem irregular. The spikes are not caused by a problem in the analysis software or in the plotting package; for larger disks the disk's internal cache will sometimes behave quite unpredictably. On the other hand, the 4111 MiB disk clearly has an interesting feature which causes it to slow down when zones are crossed (the spikes are not irregular at all).
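One way to get at the "upper boundary" without relying on the plotting package is to convert the per-block times to MiB/s and keep the minimum and maximum of each group of consecutive blocks. The sketch below assumes the samples are per-block times in microseconds for 1 MiB reads; the name speed_envelope and the bucket size are illustrative choices, not part of the original software.

```python
def speed_envelope(samples_usec, bucket=64, block_mib=1.0):
    """Return (min, max) speed in MiB/s for each run of `bucket` blocks."""
    # Elapsed microseconds per block -> MiB/s; skip zero-length samples.
    speeds = [block_mib * 1e6 / t for t in samples_usec if t > 0]
    envelope = []
    for i in range(0, len(speeds), bucket):
        chunk = speeds[i:i + bucket]
        envelope.append((min(chunk), max(chunk)))
    return envelope
```

The second element of each pair traces the upper boundary; the first shows how deep the spikes go.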

What does this all mean?

The "20 Gigabyte" Seagate clearly shows that if a benchmarking program only reads the start of the disk, it will present a wrong picture of overall linear performance. Granted, the said disk is an anomaly amongst the other disks graphed on this page, but there are weird physical sector mappings around, and this isn't the only one.

Staying with the "20G" disk, you can see that it is wrong to assume that logical sector 0 is always the fastest, or located closest to the center of the disk, or even closest to the edge of the disk. Mapping from logical to physical sectors has been hidden from us on purpose since the jolly good old MFM-days.

Another interesting lesson is that even when reading relatively large blocks of consecutive data (1 MiB is large compared to the 512 byte sector size), the disk will sometimes have to spin one (or more) extra revolutions before it can sync with the desired sector. This will not happen very often (less than half of the time), but it does happen more often than one might imagine. The effect is more pronounced when the overall system speed is slower than the relative disk speed (whatever that really means :-). It remains to be seen whether this extra spinning can be avoided by overlapping multiple consecutive AIO requests, or whether this will just confuse the elevator algorithm in the kernel. For now, the analysis software will keep it simple.
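To get a feel for how much one missed revolution costs, here is a small back-of-the-envelope calculation. The 7200 rpm and 50 MiB/s figures in the usage note are assumed values for illustration only, not measurements from this page, and the function name is made up.

```python
def speed_with_missed_revolutions(rate_mib_s, rpm, revolutions=1,
                                  block_mib=1.0):
    """Effective MiB/s for one block when the disk spins extra revolutions."""
    transfer_s = block_mib / rate_mib_s   # time to move the data itself
    revolution_s = 60.0 / rpm             # one full turn of the platter
    return block_mib / (transfer_s + revolutions * revolution_s)
```

With the assumed numbers, speed_with_missed_revolutions(50, 7200) comes out at about 35.3 MiB/s: the 20 ms transfer grows by roughly 8.3 ms, which is the kind of downward spike the graphs show.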

Measuring linear read speed means nothing

For desktop and server use the raw linear speed is just the theoretical maximum. It is impossible to achieve it if all your I/O operations go through a filesystem. Couple this with the fact that normal systems actually have some real applications running in parallel (or in sequence, as it unfortunately normally is), and the theoretical maximum will stay just that. Once you add Apache and PHP or Java environments to this, you can pretty safely kiss the theoretical limits goodbye anyway. If you're looking to measure performance at application level, you should probably do that instead of measuring raw speed. There are various tools to aid you in this (bonnie, IOMeter and others). If you're interested in measuring how well your application will behave, build measurement and statistics aids into your software. There is no other way.

However, if you're developing your own data storage system which bypasses the kernel caching systems and doesn't use existing file systems, the theoretical limit sometimes becomes achievable. This is the reason this data interests me personally.

You should be using SATA. It is much faster

Once affordable disks that can sustain over 133 MiB/s appear on the market, this might actually be true. However, I do have a question. What will happen to signal quality when the voltage levels are dropped to less than 1V and data signalling rate is increased past 1 Gbps? Now, what will happen when you run such data through cables that have no shielding? This paragraph is highly opinionated and based only on CRC-error counts from SATA-devices.

Flash devices

So far, it seems that each device is different. One notable thing is that when you buy a "brand" USB flash device, the odds are high that the device has been manufactured by someone else. Also, even USB sticks that are sold under the same "model" are in fact different devices (the OEM is different, as well as the speed).

Optical media

In the age of CAV, the start of the disk is the slowest area (with the exception of dual layer DVDs where switching layers will cause the pattern to repeat). This means that if you want to manufacture a CD-ROM from which large files can be read quickly, you have to make an image which will cover the full CD-ROM and place the files at the end of that image. Burning these images will take more time, but such is the price of reverse logic.
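The reasoning can be put into numbers. Under CAV the read speed is proportional to the radius, and with constant linear density the amount of data stored up to a radius grows with the area swept so far. The sketch below assumes the nominal CD program area of roughly 25 mm to 58 mm radius; those defaults and the function name are assumptions for illustration, not values measured here.

```python
import math


def cav_relative_speed(fraction, r_inner_mm=25.0, r_outer_mm=58.0):
    """Read speed at `fraction` of a full disc, relative to the start."""
    # Data stored up to radius r is proportional to r^2 - r_inner^2,
    # so invert that to find the radius for a given fraction of the image.
    r_squared = r_inner_mm ** 2 + fraction * (r_outer_mm ** 2 - r_inner_mm ** 2)
    # Under CAV the speed is proportional to the radius.
    return math.sqrt(r_squared) / r_inner_mm
```

With these assumed radii the very end of a full disc reads about 2.3 times faster than the start, which is why padding the image and placing the files last pays off.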

Linux software raid

When using RAID1 (mirror), it would be beneficial to utilize both (all) disks in the set for reading at the same time (interleaving reads across disks). This is not done in Linux. Instead the driver will read only from the first disk. If another process/thing comes along which also wants to read the swraid device, the driver is intelligent enough to read from the other disks as well (at the same time). In practice the latter scenario is more useful, but some people expect RAID1 to be twice as fast as a single disk for large reads. I know I did.
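The one-reader versus two-reader comparison can be approximated with a couple of threads. This is a rough sketch, not the tool used for the graphs: run_readers and read_throughput are hypothetical names, the reads are blocking rather than AIO, and every reader starts at offset 0.

```python
import mmap
import os
import time
from concurrent.futures import ThreadPoolExecutor


def read_throughput(path, direct=True, block_size=1 << 20):
    """Read path from start to end; return (bytes_read, seconds)."""
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)  # O_DIRECT: Linux
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, block_size)  # page-aligned buffer for O_DIRECT
    total = 0
    start = time.perf_counter()
    try:
        while True:
            got = os.readv(fd, [buf])
            if got <= 0:
                break
            total += got
    finally:
        elapsed = time.perf_counter() - start
        buf.close()
        os.close(fd)
    return total, elapsed


def run_readers(path, readers=2, direct=True):
    """Run several sequential readers against the same device at once."""
    with ThreadPoolExecutor(max_workers=readers) as pool:
        return list(pool.map(lambda _: read_throughput(path, direct),
                             range(readers)))
```

Comparing run_readers(dev, readers=1) with run_readers(dev, readers=2) on an md RAID1 device (dev being a placeholder) should show whether the second reader gets its own disk. The blocking reads release the interpreter lock, so the threads really do overlap.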

Measurement noise

The only reliable way to minimize jitter in the measurement results is to run the system under measurement in single user mode. It seems that even on "server" setups there are things happening which cause context switches and delay the completed AIO operation enough to affect the measured "speed". Utilizing interleaved AIO operations might be the solution to this problem (since it still sometimes happens in single user mode), but this has not been done in the current measurement program. If a system is not in single user mode, then running ssh or an X desktop on top doesn't seem to introduce significant additional jitter (so it seems so far at least).
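To compare jitter between two runs (say, single user mode against a busy desktop) with numbers rather than by eye, a simple summary of the per-block times is enough. The sketch below is an illustration only; the function name and the 1.5x threshold for calling a sample "slow" are arbitrary assumptions.

```python
import statistics


def jitter_summary(samples_usec, slow_factor=1.5):
    """Summarize how far per-block times stray from the typical value."""
    median = statistics.median(samples_usec)
    # Count samples that took noticeably longer than the typical block.
    slow = [t for t in samples_usec if t > slow_factor * median]
    return {
        "median_usec": median,
        "slow_fraction": len(slow) / len(samples_usec),
        "worst_usec": max(samples_usec),
    }
```

A higher slow_fraction for the same disk on the same machine points at the environment (scheduling, other processes) rather than at the disk itself.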

But what about a deeper meaning?

It was suggested that the pictures are similar to Rorschach charts, and hence it might be postulated that the black and white graphs might represent the inner soul of each hard disk. If this is true, there must be some very interesting data stored on the "20G" Seagate.


For pictures whose name starts with 'note', there is a note relating to that picture (or pictures) at the bottom of this page. Do check it out for a (possible) explanation or additional information.


Click for full size graph (cdrom-itchy-HLDTST-DVDRAM-GSA-4163B-HPScan3.png)
Graph added 2006-09-07


Click for full size graph (dvd-itchy-HLDTST-DVDRAM-GSA-4163B-promocrap_DVD-DL.png)
Graph added 2006-09-07


Click for full size graph (dvd-itchy-HLDTST-DVDRAM-GSA-4163B-SS2005DE_DVD.png)
Graph added 2006-09-07


Click for full size graph (dvd-itchy-HLDTST-DVDRAM-GSA-4163B-VS2005Beta2_DVD.png)
Graph added 2006-09-07


Click for full size graph (flash-chehov-ehci-KingstonDataTraveler2-aka-MSystemsFlashDiskPioneers.png)
Graph added 2006-12-02


Click for full size graph (flash-chehov-ehci-KingstonDataTraveler2-aka-ToshibaCorp.png)
Graph added 2006-12-02


Click for full size graph (flash-itchy-ehci-hub-fujitech-pretecRuggedCF.png)
Graph added 2006-12-02


Click for full size graph (flash-itchy-ehci-samsung-YPU2.png)
Graph added 2006-12-02


Click for full size graph (note01-itc.png)
Graph added 2006-12-01


Click for full size graph (note02-a-chehov-sda-simul.png)
Graph added 2006-12-02


Click for full size graph (note02-b-chehov-sdb-simul.png)
Graph added 2006-12-02


Click for full size graph (note03-a-chehov-md1-single.png)
Graph added 2006-12-02


Click for full size graph (note03-b-chehov-md1-simulA.png)
Graph added 2006-12-02


Click for full size graph (note03-c-chehov-md1-simulB.png)
Graph added 2006-12-02


Click for full size graph (note04-igor-HTS721080G9SA00-aes256.png)
Graph added 2006-12-02


Click for full size graph (note04-igor-HTS721080G9SA00-raw.png)
Graph added 2006-12-02


Click for full size graph (note05-a-paris-sda-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note05-b-paris-sdb-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note05-c-paris-sda-simul-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note05-d-paris-sdb-simul-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note06-a-nicole-md0-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note06-b-nicole-md0-stderrssh.png)
Graph added 2006-12-04


Click for full size graph (note07-a-igor-mmc-viking256sd-init1-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note07-b-igor-mmc-viking256sd-init1-stderr.png)
Graph added 2006-12-04


Click for full size graph (note07-c-igor-mmc-viking256sd-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note07-d-igor-mmc-viking256sd-stderrssh.png)
Graph added 2006-12-04


Click for full size graph (note07-e-itchy-ehci-hub-fujitech-viking256sd.png)
Graph added 2006-12-04


Click for full size graph (note08-a-igor-mmc-sandisk128sd-init1-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note08-b-itchy-ehci-hub-fujitech-sandisk128sd-stderr-desktop.png)
Graph added 2006-12-02


Click for full size graph (note09-a-igor-uhci-n770-rsmmc64-init1-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note09-b-itchy-ehci-hub-n770-rsmmc64-nostderr-desktop.png)
Graph added 2006-12-04


Click for full size graph (note09-c-igor-ehci-cardreader-rsmmc64-init1-nostderr.png)
Graph added 2006-12-04


Click for full size graph (note10-a-itchy-usb2-iriverIFP799-stderr-desktop.png)
Graph added 2006-12-04


Click for full size graph (note10-b-itchy-usb2-iriverIFP799-nostderr-desktop.png)
Graph added 2006-12-04


Click for full size graph (note11-lktstogw-sda-nostderr-ssh.png)
Graph added 2006-12-04


Click for full size graph (note11-lktstogw-sdb-nostderr-ssh.png)
Graph added 2006-12-04


Click for full size graph (note12-lasennus-megaraid-raid1.png)
Graph added 2006-12-02


Click for full size graph (pata-blart-Maxtor-5A250J0.png)
Graph added 2006-09-06


Click for full size graph (pata-blart-ST36421A.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-FUJITSU-M1624TAU-1997-10.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-IBM-DJAA-31270-1996-01.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-IBM-DJNA-351520-1999-08.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-IBM-DTLA-307015-2000-10.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-IBM-DTLA-307045-2000-06.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-Maxtor-6E040L0-2004-09.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-Maxtor-92041U4-2000-04.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-Maxtor-98196H8-2000-09.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-QUANTUM-LPS420A-1994-11.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-SAMSUNG-SV0432D.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-ST31276A-1997-21.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-ST320414A.png)
Graph added 2006-09-06


Click for full size graph (pata-disks-ST32122A-1998-28.png)
Graph added 2006-09-06


Click for full size graph (pata-files-IC35L120AVVA07-0.png)
Graph added 2006-09-06


Click for full size graph (pata-files-MAXTOR-4W100H6.png)
Graph added 2006-09-06


Click for full size graph (pata-files-ST3200826A.png)
Graph added 2006-09-06


Click for full size graph (pata-kismet-IBM-DJSA-232.png)
Graph added 2006-09-06


Click for full size graph (sata-radisson-sda-FUJITSU_MHV2060A_60GB.png)
Graph added 2006-09-07


Notes:

  1. Pentium D 3.4GHz, E7230, 3ware 9550SX (BBU), RAID10 (64k stride) over 4 * 36G Raptors (WD360ADFD).
    Data gathered on a production system (not single-user).
  2. Reading two SAMSUNG SP2004C (SATA) drives simultaneously. No slowdown.
    Large amount of measurement jitter possibly caused by running VMware server in which one OpenSUSE was idling.
  3. Comparison between running one vs two readers against a swraid RAID1 pair.
    Interestingly, Linux swraid (in 2.6 at least) does not interleave reads over two disks, but when there is more than one reader, it will utilize the extra capacity.
  4. Comparison between reading a laptop SATA-disk directly and via aes256 dmcrypt.
  5. Itanium with two SCSI disks (10krpm) (tech). First separate reads and then simultaneous reads (the second reader was started a bit later, which can be seen).
  6. Itanium with two SCSI disks (15krpm) with swraid1 (tech). In the first graph stderr has been redirected to /dev/null (since fprintf to it will flush). In the second graph the reader's periodic statistics output (stderr) is not redirected, and the connection is over ssh with the link saturated (which oddly doesn't seem to skew the results).
  7. stderr redirection has no significant effect here, but running tests in single user mode has. The "disk" in question is a 256 MiB SD card which is read through an integrated MMC/SD/xD reader using the Linux MMC/SD driver (tech). For comparison, the same card is read using a USB2 card reader on another computer. The only sane explanation for the speed difference is that since the SD Association keeps the full SD specification closed to open source development, the mmc driver uses SPI mode, which is significantly slower (for more details, please read the "Secure Digital" entry in Wikipedia). What makes this card special (or maybe not, I've only got two cards to test) is that the end area of the card is faster. This is seen in the mmc-driver pictures, but the effect is much more pronounced when using an external card reader.
    If the root cause for the slowness is lack of usable specs, then I wish to use this small space to personally Thank You SDA. Yes, I'm being ironic.
  8. To verify that using SPI mode with SD might be the problem, another card was used with mmc as well. This card also has peculiar characteristics, but the speed is still restricted compared to using an external card reader.
  9. Staying with flash cards (it seems that every card has its own personality). First the card is read through a Nokia 770 (using the N770 as a USB target) (tech). The 770 claims USB2 compliance but still only supports 12 Mbps wire speed (which is allowed, but makes you wonder why bother with USB2). On igor, Linux decides to put the 770 behind the UHCI driver, which can be seen from the tech report. Reading speed is pretty abysmal, and the theory is that the 770 is using SPI mode with the card. The test is then repeated on itchy (tech), and other than the much spikier graph and the 770 now being behind EHCI, no difference is notable. The spikes are due to itchy running a full graphical desktop and multimedia services at the same time. The test on igor was done in single user mode.
    Once the RS-MMC card is moved over to the external card reader, yet another surprise jumps at the unsuspecting tester. The card reader starts much faster, not using SPI mode, but after some joy and fun, decides to fall back to SPI mode? Surprising, at least. The card in question came with the Nokia 770.
    Unfortunately the integrated card reader on igor doesn't handle RS-MMC so the card cannot be tested there (in order to get the SD/MMC manufacturer information, which I do not know how to decode anyway). Since external USB card readers will conveniently hide all manufacturer detail, they cannot be used to get the information.
  10. iRiver iFP-799 (1 "GiB" model) (tech) is an mp3 player that also decodes (ogg) vorbis, which was the reason why I bought it originally (and no, it's not a good vorbis player, but neither is the Samsung YP-U2). The original intent was to test whether there was a difference between letting stderr come to the screen (through konsole/X) or not. However, something completely different happened. At the end of the readable range the device suddenly decides that it's going too fast and drops speed. The change is not attributable to background effects (such as scheduling and others) since those show up as "additional noise" in the graphs. I have no explanation.
    The device originally didn't support USB-bulk storage profile, but this was added via a firmware update. It could be that the device emulates the profile and does so badly. It's quite slow as well (writing speed is even slower).
  11. A system with one SATA disk and yet another Kingston DataTraveler (tech). This is the first USB flash stick that I've seen in which Kingston actually bothered to change the USB device vendor/product IDs to hide the OEM. There is nothing special in these two pictures (for a change). Tests were not done in single user mode, but the system is not running X and was pretty "quiet".
  12. Hardware RAID1 using two SCSI disks (rotational speeds unknown) through LSI/whatnot MegaRAID (tech). Not in single user mode but system quiet and without X. For this system, the data travels through multiple PCI chips/devices before reaching destination.


Copyright 2006 Aleksandr Koltsoff (http://koltsoff.com/pub/blockspeed)