Hard disk linear read speed measurements (or, logic? where?)
Ever wondered what the read speed of your hard disk is? Did you buy your
hard disk based on graphs in Tom's Hardware or your local computer magazine?
If so, I'm sorry for you. The following graphs were generated by doing
AIO read operations (1 MiB per read), bypassing the kernel block cache (O_DIRECT), over
the whole disk area. Since IDE chips don't have their own buffer memory, the only
buffering effects that are visible (with the newer/larger disks) are from the
buffer residing on the disk controller ("in the disk"), also known as "cache".
Note that for some graphs the graphing software (rlplot) decides to use
scientific notation for the x-axis labels. This is a good example of why I write
my own software so often.
How to read the graphs
The analysis program sends off a 1 MiB AIO read operation and waits for its
completion. Once the kernel has completed the operation, the elapsed time (in microseconds) is recorded
into the statistics array. This is repeated for each successive 1 MiB block on the block
device (the software doesn't care what kind of device it is reading).
When all blocks have been processed, the software outputs the list of statistics.
This list is then loaded into rlplot, which "connects the
dots" with one thick line. Because of this, the black "areas" should be interpreted as
the range between maximum and minimum speeds. Using dots for each data point would have given a truer
picture, but rlplot doesn't really want me to do that, so the thick connecting line is used.
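The measurement loop described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual analysis program: it uses plain blocking reads instead of AIO, and the names measure_blocks, BLOCK_SIZE and the direct parameter are made up for illustration. O_DIRECT wants an aligned buffer, which is why the buffer comes from an anonymous mmap (Linux only).

```python
import mmap
import os
import time

BLOCK_SIZE = 1024 * 1024  # 1 MiB per read, as in the text


def measure_blocks(path, direct=True, block_size=BLOCK_SIZE):
    """Read `path` block by block; return elapsed time per block in usec.

    Hypothetical helper: blocking reads stand in for the AIO calls.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # bypass the kernel cache (Linux only)
    fd = os.open(path, flags)
    elapsed_usec = []
    try:
        # An anonymous mmap is page aligned, which O_DIRECT requires.
        with mmap.mmap(-1, block_size) as buf:
            while True:
                start = time.perf_counter()
                nread = os.readv(fd, [buf])
                stop = time.perf_counter()
                if nread == 0:  # end of device
                    break
                elapsed_usec.append((stop - start) * 1e6)
                if nread < block_size:  # short final block
                    break
    finally:
        os.close(fd)
    return elapsed_usec
```

Pointing it at a raw block device needs read permission on that device. Passing direct=False lets the same loop be tried on an ordinary file on a filesystem that does not support O_DIRECT, at the price of measuring the page cache as well.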
In real life you should probably concentrate on the shape of the upper boundary and
just ignore most of the spikes that seem irregular. They are not caused by a problem in the
analysis software or in the plotting package; for larger disks the disk's internal
cache will sometimes behave quite unpredictably. On the other hand, the 4111 MiB disk
clearly has an interesting feature which causes it to slow down when zones are crossed (its spikes
are not irregular at all).
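To make "concentrate on the upper boundary" concrete, one could turn the recorded times into speeds and keep only the best value in each window of consecutive blocks. A minimal Python sketch, assuming 1 MiB blocks and times in microseconds; the function names and the default window of 64 blocks are arbitrary choices for illustration and not part of the analysis software:

```python
def speeds_mib_s(times_usec, block_mib=1.0):
    """Convert per-block elapsed times (usec) into speeds in MiB/s."""
    return [block_mib * 1e6 / t for t in times_usec]


def upper_envelope(speeds, window=64):
    """Keep the fastest sample of each window of consecutive blocks."""
    return [max(speeds[i:i + window]) for i in range(0, len(speeds), window)]
```

Plotting the envelope instead of every sample hides the irregular downward spikes, but it would also hide regular ones such as the zone-crossing slowdowns, so it complements the full graph rather than replacing it.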
What does this all mean?
The "20 Gigabyte" Seagate clearly shows that if a benchmarking program only
reads the start of the disk, it will present a wrong picture of overall
linear performance. Granted, the said disk is an anomaly among the other disks
graphed on this page, but there are other weird physical sector mappings around; this isn't
the only one.
Staying with the "20G" disk, you can see that it is wrong to assume that
logical sector 0 is always the fastest, or located closest to the center of the
disk, or even closest to the edge of the disk. The mapping from logical to physical
sectors has been hidden from us on purpose since the jolly good old MFM days.
Another interesting lesson is that even when reading relatively large blocks
of consecutive data (1 MiB is large compared to the 512-byte sector size), the disk
will sometimes have to spin one (or more) extra revolutions before it can sync with the desired
sector. This will not happen very often (less than half of the time), but it does
happen more often than one might imagine. The effect is more pronounced when
the overall system speed is slower than the relative disk speed (whatever that
really means :-). It remains to be seen whether this extra spinning can be
avoided by overlapping multiple consecutive AIO requests, or whether this will just confuse
the elevator algorithm in the kernel. For now, the analysis software will keep
it simple.
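As a rough worked example of what one missed revolution costs, here is a small Python sketch. The figures used in the note below it (7200 rpm, 50 MiB/s) are assumptions chosen for illustration, not measurements taken from the graphs, and the function name is hypothetical.

```python
def speed_with_missed_revolutions(base_mib_s, rpm, missed=1, block_mib=1.0):
    """Effective MiB/s for one block when the disk spins `missed` extra turns."""
    transfer_s = block_mib / base_mib_s  # time to read the block itself
    revolution_s = 60.0 / rpm            # time for one full revolution
    return block_mib / (transfer_s + missed * revolution_s)
```

With the assumed numbers, a 1 MiB read that would take 20 ms takes about 28.3 ms after one extra revolution, so the measured speed for that block drops from 50 MiB/s to roughly 35 MiB/s. That is the size of downward spike one would expect from this effect alone.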
Measuring linear read speed means nothing
For desktop and server use the raw linear speed is just the theoretical maximum.
It is impossible to achieve it if all your I/O operations go through a filesystem.
Couple this with the fact that normal systems actually have some real applications
running in parallel (or in sequence, as is unfortunately normal), and the theoretical
maximum will stay just that. Once you add Apache and PHP or Java environments
to this, you can pretty safely kiss the theoretical limits goodbye anyway.
If you're looking to measure performance at
application level, you should probably do that instead of measuring raw speed.
There are various tools to aid you in this (bonnie, IOMeter and others). If you're
interested in measuring how well your application will behave, build measurement
and statistics aids into your software. There is no other way.
However, if you're developing your own data storage system in which you bypass
the kernel caching systems and don't really want to use existing file systems,
the theoretical limit sometimes becomes achievable. This is the reason this data
interests me personally.
You should be using SATA. It is much faster
Once affordable disks that can sustain over 133 MiB/s appear on the market,
this might actually be true. However, I do have a question. What will happen
to signal quality when the voltage levels are dropped to less than 1 V and the data
signalling rate is increased past 1 Gbps? Now, what will happen when you run
such data through cables that have no shielding? This paragraph is highly
opinionated and based only on CRC error counts from SATA devices.
Flash devices
So far, it seems that each device is different. One notable thing is
that when you buy a "brand" USB flash device, the odds are high that the
device has been manufactured by someone else. Also, even USB sticks that
are sold under the same "model" are in fact different devices (the OEM differs,
as does the speed).
Optical media
In the age of CAV, the start of the disk is the slowest area (with the exception
of dual layer DVDs where switching layers will cause the pattern to repeat).
This means that if you want to manufacture a CD-ROM from which large files
can be read quickly, you have to make an image which will cover the full
CD-ROM and place the files at the end of that image. Burning these images
will take more time, but such is the price of reverse logic.
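The CAV effect is simple geometry: at a constant rotation rate, and assuming constant linear data density along the track, data passes under the head at a rate proportional to the radius. A small Python sketch; the default inner radius of about 25 mm for the start of a CD data area and the 58 mm outer edge used in the note below are approximate figures assumed here, and cav_speed_ratio is an illustrative name.

```python
def cav_speed_ratio(radius_mm, inner_radius_mm=25.0):
    """Read speed at radius_mm relative to the speed at the innermost track."""
    if radius_mm < inner_radius_mm:
        raise ValueError("radius is inside the start of the data area")
    return radius_mm / inner_radius_mm
```

Under these assumptions cav_speed_ratio(58.0) comes out at about 2.3, meaning the outer edge of a full disc reads a bit more than twice as fast as the start, which is why the large files belong at the end of the image.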
Linux software raid
When using RAID1 (mirror), it would be beneficial to utilize both (all) disks
in the set for reading at the same time (interleaving reads across disks). This
is not done in Linux. Instead, for a single reader the driver will read only from the first disk.
If another process comes along which also wants to read the swraid device,
the driver is intelligent enough to read from the other disks as well (at the same time).
In practice the latter scenario is more useful, but some people expect RAID1
to be twice as fast as a single disk for large reads. I know I did.
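A one-reader versus two-reader comparison of this kind could be sketched as follows in Python. This is an illustration under stated assumptions, not the measurement program: it uses ordinary buffered reads in threads (so the page cache would have to be cold for the numbers to mean anything), and read_range and run_readers are hypothetical names.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024 * 1024  # 1 MiB


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`; return (bytes_read, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        start = time.perf_counter()
        while done < length:
            data = os.pread(fd, min(BLOCK_SIZE, length - done), offset + done)
            if not data:  # hit the end of the device early
                break
            done += len(data)
        return done, time.perf_counter() - start
    finally:
        os.close(fd)


def run_readers(path, total_bytes, readers=2):
    """Split the first total_bytes into equal ranges, one thread per range."""
    chunk = total_bytes // readers
    with ThreadPoolExecutor(max_workers=readers) as pool:
        jobs = [pool.submit(read_range, path, i * chunk, chunk)
                for i in range(readers)]
        return [job.result() for job in jobs]
```

Running it once with readers=1 and once with readers=2 against the same md device, and comparing total bytes over wall-clock time, shows whether the second reader picks up the otherwise idle mirror.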
Measurement noise
The only reliable way to minimize jitter in the measurement results is
to run the system under measurement in single user mode. It seems that even
on "server" setups there are things happening which cause context switches
and delay the handling of the completed AIO operation enough to affect the measured "speed".
Using interleaved AIO operations might be the solution to this problem
(since it still sometimes happens in single user mode), but this has not been done
in the current measurement program.
If a system is not in single user mode, then running ssh or an X desktop on top
doesn't seem to introduce significant additional
jitter (so it seems so far, at least).
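One way to put a number on such jitter, so that a single-user run and a desktop run can be compared, is to summarize the per-block times with robust statistics. A minimal Python sketch; the choice of median and median absolute deviation is an assumption made here, not something the measurement program does, and jitter_summary is a hypothetical name.

```python
import statistics


def jitter_summary(times_usec):
    """Summarize per-block times: typical value, typical spread, worst case."""
    median = statistics.median(times_usec)
    # Median absolute deviation: barely moved by a handful of large spikes.
    mad = statistics.median(abs(t - median) for t in times_usec)
    return {"median": median, "mad": mad, "worst": max(times_usec)}
```

A large gap between "worst" and "median" combined with a small "mad" would point to occasional scheduling delays rather than to a disk that is slow overall.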
But what about a deeper meaning?
It was suggested that the pictures are similar to Rorschach charts, and hence it
might be postulated that the black and white graphs might represent the inner
soul of each hard disk. If this is true, there must be some very interesting
data stored on the "20G" Seagate.
For pictures whose name starts with 'note', there is a note relating to
that picture (or pictures) at the bottom of this page. Do check it out for
a (possible) explanation or additional information.

- cdrom-itchy-HLDTST-DVDRAM-GSA-4163B-HPScan3.png (graph added 2006-09-07)
- dvd-itchy-HLDTST-DVDRAM-GSA-4163B-promocrap_DVD-DL.png (graph added 2006-09-07)
- dvd-itchy-HLDTST-DVDRAM-GSA-4163B-SS2005DE_DVD.png (graph added 2006-09-07)
- dvd-itchy-HLDTST-DVDRAM-GSA-4163B-VS2005Beta2_DVD.png (graph added 2006-09-07)
- flash-chehov-ehci-KingstonDataTraveler2-aka-MSystemsFlashDiskPioneers.png (graph added 2006-12-02)
- flash-chehov-ehci-KingstonDataTraveler2-aka-ToshibaCorp.png (graph added 2006-12-02)
- flash-itchy-ehci-hub-fujitech-pretecRuggedCF.png (graph added 2006-12-02)
- flash-itchy-ehci-samsung-YPU2.png (graph added 2006-12-02)
- note01-itc.png (graph added 2006-12-01)
- note02-a-chehov-sda-simul.png (graph added 2006-12-02)
- note02-b-chehov-sdb-simul.png (graph added 2006-12-02)
- note03-a-chehov-md1-single.png (graph added 2006-12-02)
- note03-b-chehov-md1-simulA.png (graph added 2006-12-02)
- note03-c-chehov-md1-simulB.png (graph added 2006-12-02)
- note04-igor-HTS721080G9SA00-aes256.png (graph added 2006-12-02)
- note04-igor-HTS721080G9SA00-raw.png (graph added 2006-12-02)
- note05-a-paris-sda-nostderr.png (graph added 2006-12-04)
- note05-b-paris-sdb-nostderr.png (graph added 2006-12-04)
- note05-c-paris-sda-simul-nostderr.png (graph added 2006-12-04)
- note05-d-paris-sdb-simul-nostderr.png (graph added 2006-12-04)
- note06-a-nicole-md0-nostderr.png (graph added 2006-12-04)
- note06-b-nicole-md0-stderrssh.png (graph added 2006-12-04)
- note07-a-igor-mmc-viking256sd-init1-nostderr.png (graph added 2006-12-04)
- note07-b-igor-mmc-viking256sd-init1-stderr.png (graph added 2006-12-04)
- note07-c-igor-mmc-viking256sd-nostderr.png (graph added 2006-12-04)
- note07-d-igor-mmc-viking256sd-stderrssh.png (graph added 2006-12-04)
- note07-e-itchy-ehci-hub-fujitech-viking256sd.png (graph added 2006-12-04)
- note08-a-igor-mmc-sandisk128sd-init1-nostderr.png (graph added 2006-12-04)
- note08-b-itchy-ehci-hub-fujitech-sandisk128sd-stderr-desktop.png (graph added 2006-12-02)
- note09-a-igor-uhci-n770-rsmmc64-init1-nostderr.png (graph added 2006-12-04)
- note09-b-itchy-ehci-hub-n770-rsmmc64-nostderr-desktop.png (graph added 2006-12-04)
- note09-c-igor-ehci-cardreader-rsmmc64-init1-nostderr.png (graph added 2006-12-04)
- note10-a-itchy-usb2-iriverIFP799-stderr-desktop.png (graph added 2006-12-04)
- note10-b-itchy-usb2-iriverIFP799-nostderr-desktop.png (graph added 2006-12-04)
- note11-lktstogw-sda-nostderr-ssh.png (graph added 2006-12-04)
- note11-lktstogw-sdb-nostderr-ssh.png (graph added 2006-12-04)
- note12-lasennus-megaraid-raid1.png (graph added 2006-12-02)
- pata-blart-Maxtor-5A250J0.png (graph added 2006-09-06)
- pata-blart-ST36421A.png (graph added 2006-09-06)
- pata-disks-FUJITSU-M1624TAU-1997-10.png (graph added 2006-09-06)
- pata-disks-IBM-DJAA-31270-1996-01.png (graph added 2006-09-06)
- pata-disks-IBM-DJNA-351520-1999-08.png (graph added 2006-09-06)
- pata-disks-IBM-DTLA-307015-2000-10.png (graph added 2006-09-06)
- pata-disks-IBM-DTLA-307045-2000-06.png (graph added 2006-09-06)
- pata-disks-Maxtor-6E040L0-2004-09.png (graph added 2006-09-06)
- pata-disks-Maxtor-92041U4-2000-04.png (graph added 2006-09-06)
- pata-disks-Maxtor-98196H8-2000-09.png (graph added 2006-09-06)
- pata-disks-QUANTUM-LPS420A-1994-11.png (graph added 2006-09-06)
- pata-disks-SAMSUNG-SV0432D.png (graph added 2006-09-06)
- pata-disks-ST31276A-1997-21.png (graph added 2006-09-06)
- pata-disks-ST320414A.png (graph added 2006-09-06)
- pata-disks-ST32122A-1998-28.png (graph added 2006-09-06)
- pata-files-IC35L120AVVA07-0.png (graph added 2006-09-06)
- pata-files-MAXTOR-4W100H6.png (graph added 2006-09-06)
- pata-files-ST3200826A.png (graph added 2006-09-06)
- pata-kismet-IBM-DJSA-232.png (graph added 2006-09-06)
- sata-radisson-sda-FUJITSU_MHV2060A_60GB.png (graph added 2006-09-07)
Notes:
- note01: Pentium D 3.4 GHz, E7230, 3ware 9550SX (BBU), RAID10 (64k stride) over 4 * 36G Raptors (WD360ADFD).
Data gathered on a production system (not single-user).
- note02: Reading two SAMSUNG SP2004C (SATA) drives simultaneously. No slowdown.
A large amount of measurement jitter, possibly caused by running VMware Server
in which one OpenSUSE guest was idling.
- note03: Comparison between running one vs. two readers against a swraid RAID1 pair.
Interestingly, Linux swraid (in 2.6 at least) does not interleave reads over the two disks,
but when there is more than one reader, it will utilize the extra capacity.
- note04: Comparison between reading a laptop SATA disk directly and via aes256 dm-crypt.
- note05: Itanium with two SCSI disks (10krpm) (tech). First
separate reads and then simultaneous reads (the second reader was started a bit later, which can be seen).
- note06: Itanium with two SCSI disks (15krpm) with swraid1 (tech). In the first graph stderr has
been redirected to /dev/null (since every fprintf to it will flush); in the second, the
benchmark reader's periodic statistics (stderr) are not redirected and the connection
is over ssh with the link saturated (which oddly doesn't seem to skew the results).
- note07: stderr redirection has no significant effect here, but running the tests in single user mode
does. The "disk" in question is a 256 MiB SD card which is read through
an integrated MMC/SD/xD reader using the Linux MMC/SD driver (tech).
For comparison, the same card is read using a USB2 card reader on another computer.
The only sane explanation for the speed difference is that since the SD Association
keeps the full SD specification closed to open source development, the mmc driver
uses SPI mode, which is significantly slower (for more details, please read the "Secure Digital" entry in Wikipedia).
What makes this card special (or maybe not, I've only got two cards to test) is that
the end area of the card is faster. This is seen in the mmc driver pictures, but the
effect is much more pronounced when using an external card reader.
If the root cause for the slowness is the lack of usable specs, then I wish to use this
small space to personally Thank You, SDA. Yes, I'm being ironic.
- note08: To verify that using SPI mode with SD might be the problem, another card was
tested with the mmc driver as well. This card also has peculiar characteristics, but the
speed is still restricted compared to using an external card reader.
- note09: Staying with flash cards (it seems that every card has its own personality).
First the card is read through a Nokia 770 (using the N770 as a USB target) (tech).
The 770 claims USB2 compliance but still only supports 12 Mbps wire speed (which is allowed, but
makes you wonder why bother with USB2). On igor, Linux decides to put the 770 behind
the UHCI driver, which can be seen from the tech report. Reading speed is pretty abysmal,
and the theory is that the 770 is using SPI mode with the card. The test is
then repeated on itchy (tech), and other than the very
spiky graph and the 770 now being behind EHCI, no difference is notable. The spikes are due to itchy running
a full graphical desktop and multimedia services at the same time. The test on igor
was done in single user mode.
Once the RS-MMC card is moved over to the external card reader, yet another
surprise jumps at the unsuspecting tester. The card reader starts much faster, not using SPI
mode, but after some joy and fun decides to fall back to SPI mode? Surprising, at least.
The card in question came with the Nokia 770.
Unfortunately the integrated card reader on igor doesn't handle RS-MMC, so the
card cannot be tested there (in order to get the SD/MMC manufacturer information,
which I do not know how to decode anyway). Since external USB card readers conveniently hide
all manufacturer detail, they cannot be used to get the information.
- note10: iRiver iFP-799 (1 "GiB" model) (tech) is an mp3 player that also decodes (Ogg) Vorbis, which
was the reason why I bought it originally (and no, it's not a good Vorbis player, but neither
is the Samsung YP-U2). The original intent was to test whether there was a difference
between letting stderr come to the screen (through konsole/X) or not. However, something
completely different happened. At the end of the readable range the device
suddenly decides that it's going too fast and drops its speed. The change is not
attributable to background effects (such as scheduling and others), since those
show up as "additional noise" in the graphs. I have no explanation.
The device originally didn't support the USB bulk storage profile, but this was
added via a firmware update. It could be that the device emulates the profile
and does so badly. It's quite slow as well (writing speed is even slower).
- note11: A system with one SATA disk and yet another Kingston DataTraveler (tech).
This is the first USB flash stick that I've seen in which Kingston actually
bothered to change the USB device vendor/product IDs to hide the OEM. There
is nothing special in these two pictures (for a change). The tests were not
done in single user mode, but the system was not running X and was pretty
"quiet".
- note12: Hardware RAID1 using two SCSI disks (rotational speeds unknown) through
an LSI/whatnot MegaRAID (tech). Not in single user
mode, but the system was quiet and without X. On this system, the data travels
through multiple PCI chips/devices before reaching its destination.
Copyright 2006 Aleksandr Koltsoff (http://koltsoff.com/pub/blockspeed)