Why new hard disks might not be much fun for XP users
A rather surprising article hit the front page of the BBC on Tuesday: the next generation of hard disks could cause slowdowns for XP users. Not normally the kind of thing you'd expect to be placed so prominently, but the warning it gives is a worthy one, if timed a bit oddly. The world of hard disks is set to change, and the impact could be severe. In the remarkably conservative world of PC hardware, it's not often that a 30-year-old convention gets discarded. Even this change has been almost a decade in the making.
The problem is hard disk sectors. A sector is the smallest unit of a hard disk that software can read or write. Even if a file is only a single byte long, the operating system still has to read or write at least 512 bytes to access it.
512-byte sectors have been the norm for decades. The 512-byte size was itself inherited from floppy disks, making it an even older historical artifact. The age of this standard means that it's baked in to a lot of important software: PC BIOSes, operating systems, and the boot loaders that hand control from the BIOS to the operating system. All of this makes migration to a new standard difficult.
Given such entrenchment, the obvious question is, why change? We all know that the PC world isn't keen on migrating away from long-lived, entrenched standards; the continued use of IPv4 and the PC BIOS are two fine examples of 1970s and 1980s technology sticking around long past their prime, in spite of desirable replacements (IPv6 and EFI, respectively) being available. But every now and then, a change is forced on vendors in spite of their naturally conservative instincts.
Hard disks are unreliable
In this case, there are two reasons for the change. The first is that hard disks are not actually very reliable. We all like to think of hard disks as neatly storing the 1s and 0s that make up our data and then reading them back with perfect accuracy, but unfortunately the reality is nothing like as neat.
Instead of having a nice digital signal written in the magnetic surface (little groups of magnets pointing "all north" or "all south"), what we have is groups pointing "mostly south" or "mostly north." Converting this imprecise analog data back into the crisp digital ones and zeroes that represent our data requires the analog signal to be processed.
That processing isn't enough to reliably restore the data, though. Fundamentally, it produces only educated guesses; it's probably right, but could be wrong. To counter this, hard disks store a substantial amount of error-checking data alongside each sector. This data is invisible to software, but is checked by the drive's firmware. The error-checking data gives the drive a substantial ability to reconstruct data that is missing or damaged using clever math, but this comes with considerable storage overhead. In a 2004-vintage disk, for every 512 bytes of data, typically 40 bytes of error-checking data are also required, along with a further 40 bytes used to locate and indicate the start of the sector and provide space between sectors. This means that 80 bytes are used for data integrity for every 512 bytes of user data, so about 13% of the theoretical capacity of a hard disk is gone automatically, just to account for the inevitable errors that come up when reading and interpreting the analog signal stored on the disk. With this 40-byte overhead, the drive can correct something like 50 consecutive unreadable bits. Longer codes could recover from longer errors, but the trade-off is that they eat further into storage capacity.
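To make the overhead arithmetic concrete, here is a minimal Python sketch of the calculation, using only the per-sector figures quoted above:

```python
# Per-sector figures for a 2004-vintage disk, as quoted above.
DATA = 512   # user data per sector (bytes)
ECC = 40     # error-checking data per sector (bytes)
GAP = 40     # sector start marker and inter-sector gap (bytes)

overhead = ECC + GAP                 # 80 bytes per sector
total = DATA + overhead              # 592 bytes of raw capacity per sector
print(f"capacity lost to overhead: {overhead / total:.1%}")  # ~13.5%
```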
Higher areal density is a blessing and a curse
This has been the status quo for many years. What's changing to make that a problem now? Throughout that period, areal density (the amount of data stored in a given disk area) has been on the rise. Current disks have an areal density of around 400 Gbit per square inch; five years ago, the number was closer to 100. The problem with packing all these bits into an ever-decreasing area is that it steadily degrades the analog signal on the disk. The signals are weaker, there's more interference from adjacent data, and the disk is more sensitive to minor fluctuations in voltages and other suboptimal conditions when writing.
This weaker analog signal in turn places greater demands on the error-checking data. Errors are happening more often, with the result that those 40 bytes are not going to be enough for much longer. Typical consumer-grade hard drives have a target of one unreadable bit for every 10^14 bits read from disk (10^14 bits is about 12 TB, so if you have six 2 TB disks in an array, that array probably has an error on it); enterprise drives and some consumer disks claim one in every 10^15 bits, which is substantially better. The increased areal densities mean that the probability of long runs of consecutive errors is increasing, which means that if manufacturers want to hit that one in 10^14 target, they're going to need better error-checking. An 80-byte error-checking block per sector would double the number of errors that can be corrected, up to around 100 bits, but would also mean that about 19% of the disk's capacity was taken up by overheads, with only 81% available for user data.
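A quick back-of-the-envelope check of those figures, again in Python and using only the numbers from the paragraph above:

```python
# How much data is 10^14 bits?
bits = 10**14
print(f"{bits / 8 / 10**12:.1f} TB")   # ~12.5 TB, i.e. roughly six 2 TB disks

# Cost of doubling the ECC to 80 bytes per 512-byte sector:
overhead = 40 + 80                     # gap/sync region + enlarged ECC (bytes)
total = 512 + overhead
print(f"overhead:  {overhead / total:.1%}")   # ~19% of raw capacity lost
print(f"user data: {512 / total:.1%}")        # ~81% usable
```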
In the past, enlarging the error correction data was viable; the increasing areal densities offered more space than the extra correction data used, for a net growth in available space. A decade ago, only 24 bytes were needed per sector, with 40 bytes necessary in 2004, and probably more in more recent disks. As long as the increase in areal density is greater than the increase in error correcting overhead (to accommodate signal loss from the increase in areal density), hard drives can continue to get larger. But hard drive manufacturers are now getting close to the point where each increase in areal density requires such a large increase in error correcting data that the areal density improvement gets canceled out anyway!
Making 4096 bytes the new standard
Instead of storing 512-byte sectors, hard disks will start using 4096-byte sectors. 4096 is a good size for this kind of thing. For one, it matches the standard size of allocation units in the NTFS filesystem, which nowadays is probably the most widely used filesystem on personal computers. Secondly, it matches the standard size of memory pages on x86 systems. Memory allocations on x86 systems are generally done in multiples of 4096 bytes, and correspondingly, many disk operations (such as reading to or from the pagefile, or reading in executable programs), which interact intimately with the memory system, are equally done in multiples of 4096 bytes.
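As an aside, the page size is easy to check for yourself; Python's standard library exposes it, and on typical x86 systems it will report 4096 (other architectures can differ):

```python
import mmap

# The memory page size of the machine this runs on; 4096 on typical x86.
print(mmap.PAGESIZE)
```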
4096-byte sectors don't solve the analog problem (signals are getting weaker and noise is getting stronger, and only reduced densities or some breakthrough in recording technology are going to change that), but they help substantially with the error-correcting problem. Due to the way error-correcting codes work, larger sectors require relatively less error-correcting data to protect against errors of the same size. A 4096-byte sector is equivalent to eight 512-byte sectors. With 40 bytes per sector for finding sector starts and 40 bytes for error correction, protecting each sector against 50 error bits, storing 4096 bytes requires (8 x 512 + 8 x 40 + 8 x 40) = 4736 bytes: 4096 of data, 640 of overhead. The total protection is against 400 error bits (50 bits per sector, eight sectors), though the errors have to be spread evenly among all the sectors.
With 4096-byte sectors, only one 40-byte sector start and gap is needed, and to achieve a good level of protection, only 100 bytes of error-checking data are required, for a total of (1 x 4096 + 1 x 40 + 1 x 100) = 4236 bytes: 4096 of data, 140 of overhead. 100 bytes of error-checking data can correct up to 1000 consecutive error bits; for the foreseeable future, this should be "good enough" to achieve the specified error rates. With an overhead of just 140 bytes per sector, about 96% of the disk's capacity can be used.
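The comparison between the two layouts can be verified with a few lines of Python, using the byte counts from the last two paragraphs:

```python
def layout(sectors, data, gap, ecc):
    """Raw bytes consumed on disk by `sectors` sectors of the given format."""
    return sectors * (data + gap + ecc)

legacy = layout(8, 512, 40, 40)      # 4736 bytes for 4096 bytes of data
advanced = layout(1, 4096, 40, 100)  # 4236 bytes for 4096 bytes of data

print(f"legacy:   {legacy} bytes, {4096 / legacy:.1%} usable")     # ~86.5%
print(f"advanced: {advanced} bytes, {4096 / advanced:.1%} usable") # ~96.7%
```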
In one fell swoop, this change provides greater robustness against the problems caused by increasing areal density, and more efficient encoding of the data on disk. That's good news, except for that whole "legacy" thing. The 512 byte sector assumption is built in to a lot of software.
A 512-byte leaden albatross
As far back as 1998, IBM started indicating to the hard disk manufacturing community that sectors would have to be enlarged to allow for robust error correction. In 2000, IDEMA, the International Disk Drive Equipment and Materials Association, put together a task force to establish a large-sector standard: the Long Data Block Committee. After initially considering, but ultimately rejecting, a 1024-byte interim format, the committee finalized its specification in March 2006 and committed to 4096-byte sectors. Phoenix produced preliminary BIOS support for the specification in 2005, and Microsoft, for its part, ensured that Windows Vista would support the new sector size. Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 all support the new sector size. Mac OS X supports it, and Linux kernels since September 2009 also support it.
The big obvious name missing from this list is Windows XP (and its server counterpart, Windows Server 2003). Windows XP (along with old Linux kernels) has, somewhere within its code, a fixed assumption of 512-byte sectors. Try to use it with hard disks with 4096-byte sectors and failure will ensue. Cognizant of this problem, the hard disk vendors responded with, well, a long period of inaction. Little was done to publicize the problem, and no effort was made to force the issue by releasing large-sector disks; the industry just sat on its hands.
However, this situation clearly couldn't go on forever.
The other big roadblock: the 2 TB partition limit
In addition to the areal density problem making errors more likely, a second issue has raised its head. The partition table, which is the on-disk structure that describes the number and sizes of the partitions on a disk, can only really describe disks that are 2 TB or less in size. The partition table stores the size of a partition as a count of the number of sectors in that partition, and this count is a 32-bit number. That means it can go up to a little over 4 billion, but no more. Four billion 512-byte sectors is 2 TB. This poses a big problem for any company wanting to sell a disk that's bigger than 2 TB.
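The arithmetic behind the 2 TB ceiling is simple enough to check directly:

```python
max_sectors = 2**32              # 32-bit sector count: a little over 4 billion
limit = max_sectors * 512        # bytes, with 512-byte sectors
print(f"{limit / 2**40:.0f} TiB")    # 2 TiB
print(f"{limit / 10**12:.2f} TB")    # ~2.20 TB (decimal)
```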
There are two solutions to this problem. Either change the partition table so it can store larger numbers of sectors, or change the size of a sector so that 4 billion sectors no longer limits the disk to 2 TB. For the first, there is indeed a new kind of partition format called the GUID Partition Table (GPT). GPT disks can have vast partitions, and would solve the problem neatly. Unfortunately, GPT disks can't generally be booted using the conventional PC BIOS. To enable full support for GPT, the BIOS has to be replaced with EFI, the firmware standard used for Itanium machines, Intel Macs, and a rare handful of other PCs. The BIOS is a slow, complex piece of code that has its origins in the 1980s, but it has the one advantage of being ubiquitous. EFI has been around for a number of years, it can be used with modern operating systems, and it has no problem with disks larger than 2 TB, but industry adoption has been extremely slow.
This situation is slowly changing. A few motherboards are available with EFI instead of the BIOS, the aforementioned Intel Macs already ship with EFI, and Lenovo is beginning to offer EFI firmware for certain lines as part of its Enhanced Experience programme. Lenovo machines with Enhanced Experience include EFI instead of the legacy BIOS, and as a result boast substantially improved boot times. Nonetheless, EFI remains rare.
This has left disk vendors with little choice but to take the other approach: increase the size of sectors. Though clearly something they have wanted to do for many years, they now have little choice if they want to keep offering ever larger hard disks. EFI, with its support for GPT, won't be ubiquitous enough to be a viable option. 2^32 4096-byte sectors would allow hard disks up to 16 TiB, providing hopefully enough headroom to allow EFI and GPT to become mainstream.
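And the same 32-bit sector count with 4096-byte sectors gives the 16 TiB figure:

```python
print(f"{2**32 * 4096 / 2**40:.0f} TiB")   # 16 TiB
```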
WD's Advanced Format
And so it was that last September (and it's this that makes it a little surprising that the BBC and other outlets are talking about the issue now, though it's one that certainly deserves the publicity), Western Digital announced its "Advanced Format" drives. Advanced Format drives use the 4096-byte sectors, 100-byte error codes, and 40-byte gap described above. However, to maintain compatibility with Windows XP, they pretend to use 512-byte sectors. As can be seen from the spec sheet, the drives with 64 MiB cache (model numbers ending in AARS or EARS) all use 4096-byte sectors internally, and the sector counts even for the 2 TB drives are high, with the 2 TB disk having just shy of 4 billion sectors.
This kind of deceit is a problem if software tries to write less than 4096 bytes at a time. To write 512 bytes out of 4096, the drive must read all 4096, update the 512 written bytes, and then write back all 4096 bytes (a process known as read-modify-write, RMW). That means more seeking and more disk activity, which is clearly going to perform worse than a 512 byte write on an old drive with true 512 byte sectors. But this isn't such a problem since, as already mentioned, most disk activity occurs in multiples of 4096 bytes anyway. When writing 4096 bytes, the RMW cycle isn't needed, as there's no need to read data if it's going to be overwritten anyway, so the performance impact is negligible.
The biggest problem is when a 4096-byte write straddles two physical sectors. When that happens, the situation is even worse, as two RMW cycles are needed, one for each partially-written sector. However, as long as the partition starts on a physical sector boundary, the OS's widespread use of 4096-byte writes means that "almost all" subsequent writes will line up properly, so they won't straddle multiple sectors and won't need read-modify-writes.
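To make the read-modify-write behavior concrete, here is an illustrative Python sketch (not actual drive firmware; `read_sector` and `write_sector` are hypothetical stand-ins for the drive's low-level sector access) showing which writes incur RMW cycles:

```python
PHYS = 4096  # physical sector size on an Advanced Format drive

def emulated_write(offset, data, read_sector, write_sector):
    """Write `data` at byte `offset`; return how many RMW cycles were needed."""
    end = offset + len(data)
    rmw_count = 0
    for sector in range(offset // PHYS, (end - 1) // PHYS + 1):
        base = sector * PHYS
        if offset <= base and end >= base + PHYS:
            # Sector fully covered by the write: overwrite it directly.
            write_sector(sector, data[base - offset:base - offset + PHYS])
        else:
            # Sector only partially covered: read, patch, write back (RMW).
            buf = bytearray(read_sector(sector))
            lo, hi = max(offset, base), min(end, base + PHYS)
            buf[lo - base:hi - base] = data[lo - offset:hi - offset]
            write_sector(sector, bytes(buf))
            rmw_count += 1
    return rmw_count

# A toy in-memory "disk" to exercise the logic.
disk = {}
read_sector = lambda n: disk.get(n, bytes(PHYS))
write_sector = disk.__setitem__

print(emulated_write(0, b"x" * 4096, read_sector, write_sector))    # 0: aligned
print(emulated_write(512, b"x" * 4096, read_sector, write_sector))  # 2: straddles
```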
And as luck would have it, the most widely used operating system in the world will always create partitions that don't line up nicely. Single-partition Windows XP systems will always make the first partition start at the 63rd 512-byte sector, a byte offset of 32,256, which is not a multiple of 4096. If it were just one sector further on, then everything would line up nicely on these pseudo-512-byte sector drives. But as it is, Windows XP partitions on such a disk will have to suffer two RMW operations for almost every single write made to the disk. This is mitigated somewhat by many operations being multiples of 4096 bytes, so it's only at the start and end of each operation that the read-modify-write is needed, but nonetheless the overhead is substantial.
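A two-line check shows why the 63-sector offset is unlucky and why one more sector would have fixed it:

```python
for lba in (63, 64):
    offset = lba * 512  # byte offset of the partition start
    print(f"LBA {lba}: offset {offset}, 4096-aligned: {offset % 4096 == 0}")
# LBA 63 -> 32256, not aligned; LBA 64 -> 32768, aligned (32768 = 8 * 4096).
```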
The other big problem is disk cloning software. Just as with Windows XP, many disk cloning tools will write out partitions so that they don't neatly line up with the 4096 byte sectors. These programs need to be updated so that the partitions they create will be properly aligned, and so that when migrating from a 512 byte to a 4096 byte disk, they slightly reposition the partitions to ensure proper alignment.
To that end, Western Digital has produced software to re-align partitions so that they all start on 4096-byte boundaries, thereby eliminating most of the RMW operations, except for the relatively infrequent smaller reads and writes. Split operations will still incur a sizeable penalty (10% slower, with an extra 5 ms of latency) but shouldn't be so frequent as to cause a major problem. Any system using Windows XP or created with disk cloning/system imaging software will need to run this software to achieve satisfactory performance.
Other hard disk vendors are committed to introducing their own Advanced Format drives by 2011, so similar software solutions are likely to appear soon. Non-emulated drives, however, appear to be further off. Except for that legacy annoyance, Windows XP, the software is ready, and has been ready, at least on the Windows side, since 2006. Indications are, however, that the hard disk vendors will be reluctant to ship "native" 4096-byte sector support until Windows XP is much less significant: 2012 at the earliest, but worst case as late as 2014 (though enterprise parts might arrive sooner than that).
So, anyone putting a new hard disk in an XP machine, beware. There's a very good chance it won't work as well as you might expect. Just another reason why running an OS from 2001 perhaps isn't such a great idea in 2010. These fundamental technologies might not change fast, but that doesn't mean they don't change at all.