Hi,
These days almost everyone is banging their heads over data loss due to HDD failure. Fortunately, there is a diagnostic technology called S.M.A.R.T. that can help.
Read S.M.A.R.T. data:
1) Various tools, such as HD Tune, can be used to read S.M.A.R.T. data.
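If you prefer a script to a GUI tool, here is a rough sketch of one way to do it. It assumes you use smartctl from smartmontools (not mentioned above, just my pick) and feed it the text printed by `smartctl -A`. The function name `parse_smartctl_attributes` and the `SAMPLE` text are made up for illustration; the numbers in the sample are not from a real drive.

```python
def parse_smartctl_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a list of dicts.

    Assumes the usual 10-column layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header line and any surrounding text.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),    # normalized current value
            "worst": int(parts[4]),    # lowest normalized value seen so far
            "thresh": int(parts[5]),   # vendor failure threshold
            # The raw column may contain spaces, so keep everything after column 9.
            "raw": " ".join(parts[9:]),
        })
    return rows


# Illustrative sample only; values are invented, not read from a real disk.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       134217728
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    for row in parse_smartctl_attributes(SAMPLE):
        print(row["id"], row["name"], "raw =", row["raw"])
```

To use it on a real drive, you would save the output of `smartctl -A` for your disk to a file (this usually needs admin rights) and pass the file contents to the function.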
Make sense of S.M.A.R.T.:
Here are some important attributes we need to look at.
wikipedia said:
- Read Error Rate
Indicates the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number.
- Reallocated Sectors Count
Count of reallocated sectors. When the hard drive finds a read/write/verification error, it marks this sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and "reallocated" sectors are called remaps. This is why, on modern hard disks, "bad blocks" cannot be found while testing the surface – all bad blocks are hidden in reallocated sectors. However, as the number of reallocated sectors increases, the read/write speed tends to decrease. The raw value normally represents a count of the number of bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate.
- Spin Retry Count
Count of retry of spin start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
- Command Timeout
A number of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero and if the value is far above zero, then most likely there will be some serious problems with power supply or an oxidized data cable.
- Reallocation Event Count
Count of remap operations. The raw value of this attribute shows the total number of attempts to transfer data from reallocated sectors to a spare area. Both successful & unsuccessful attempts are counted.
- Current Pending Sector Count
Number of "unstable" sectors (waiting to be remapped, because of read errors). If an unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on a sector will not remap the sector (since it might be readable later); instead, the drive firmware remembers that the sector needs to be remapped, and remaps it the next time it's written.
- Uncorrectable Sector Count
The total number of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
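The descriptions above boil down to two simple checks: is a count above zero, and has it gone up since the last time you looked. Below is a small sketch of both. The names `WATCHED`, `flag_attributes` and `rising_attributes` are my own, and the default limit of 0 is only an assumption for illustration, not a vendor threshold. Read Error Rate is deliberately left out because its raw value is vendor-specific.

```python
# Attributes from the list above whose raw value is normally a plain count.
WATCHED = (
    "Reallocated Sectors Count",
    "Spin Retry Count",
    "Command Timeout",
    "Reallocation Event Count",
    "Current Pending Sector Count",
    "Uncorrectable Sector Count",
)


def flag_attributes(raw_values, limit=0):
    """Return watched attributes whose raw count is above `limit`.

    `raw_values` maps attribute name -> raw count (int).
    `limit=0` is an assumed cutoff for illustration only.
    Attributes missing from `raw_values` are treated as 0.
    """
    return [name for name in WATCHED if raw_values.get(name, 0) > limit]


def rising_attributes(previous, current):
    """Return watched attributes whose raw count grew between two readings."""
    return [
        name for name in WATCHED
        if current.get(name, 0) > previous.get(name, 0)
    ]


if __name__ == "__main__":
    # Hypothetical readings, typed in by hand from a tool's attribute view.
    last_week = {"Reallocated Sectors Count": 10, "Current Pending Sector Count": 0}
    today = {"Reallocated Sectors Count": 12, "Current Pending Sector Count": 3}
    print("Non-zero:", flag_attributes(today))
    print("Rising:", rising_attributes(last_week, today))
```

A rising count is usually the more useful signal of the two, since some drives run for years with a small, stable number of reallocated sectors.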
Lastly, this is an excellent in-depth article on HDD failure:
Minimizing Hard Disk Drive Failure and Data Loss - Wikibooks, collection of open-content textbooks
Hopefully, after reading this you will be more aware of the reasons behind HDD failure and be in a position to handle data loss effectively. Happy Computing!
PS: It would be great if people could add exact threshold values, so we know what value is bad for which HDD and can plan backups accordingly.
