To zip, ISO or store plain?

nRiTeCh

I have TBs of data holding many memories of family, tours etc., captured in images and videos and stored across a few hard drives.

For the past 3 weeks I have been doing a complete data consolidation across all the hard drives (sorting and categorizing data, deleting duplicates and dumping whatever isn't needed).

During this, I came across a few pics which were crisp and clear when captured, yet they now appear corrupted or choppy or completely greyed out from one corner or side.

This is worrying, as all the hard drives are in a perfectly healthy state with no issues, and yet this is happening.

Now I'm thinking of either zipping/rar-ing the files folder-wise or creating a single ISO year-wise.

Looking for suggestions to preserve the data offline for years to come.
 
If you use any of those techniques, maybe add PAR files for recovery? You decide how much redundancy the PAR files should carry.
 
Videos and audio don't compress well in any of those formats, since they are already compressed. ZIP and ISO are the worst formats to store them in.

Photos taken in JPEG format can compress somewhat further in RAR, but how much varies from picture to picture.

Our digital cameras have advanced by leaps and bounds in the last few years, but your old photos and videos are exactly what they were a few years ago. You just have fond memories of them.

It's like video games: Contra and Mario were always pixelated, but our childhood memories make them uber high fidelity. In reality they are exactly as they were.

In my opinion it's better to save them just as they are. But if you really want to save space on pictures, you can use WinRAR to pack them into RAR files. One RAR file per year is good enough. Make sure to select "solid archive" and "best compression", and also add a recovery record. That way, if your RAR somehow gets corrupted, you can use the recovery record to repair it.

Another thing you can do is save your most critical data onto Blu-ray discs and keep the discs in proper cases. I know it's not really practical to store TBs of data this way, but I have some 10-15 year old DVDs which still work fine to date. Verbatim was the brand of the DVDs. I don't know if they even make Blu-ray discs anymore.
 
This is old-school stuff that I used to do with CDs and DVDs.


Sometimes a disc would develop a problem, and after recovering a file from it the file would be missing bytes here and there. The PAR file was like a backup file you made along with the disc, and if there was a problem it offered a way to recover the data in case of corruption.
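To make the idea of a recovery file concrete, here is a minimal sketch of the simplest possible scheme: one XOR parity block that can rebuild any single lost block. This is not how PAR2 itself works (it uses Reed-Solomon codes and can rebuild several blocks); the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """Build one parity block for a list of equal-length data blocks."""
    return xor_blocks(blocks)


def recover_missing(blocks, parity):
    """Rebuild the single block marked as None, using the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    present = [block for block in blocks if block is not None]
    # XOR of all surviving blocks plus the parity gives back the lost block.
    return xor_blocks(present + [parity])


data = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(data)
damaged = [b"abcd", None, b"ijkl"]   # pretend the second block was unreadable
restored = recover_missing(damaged, parity)
```

Note that this only works if you already know which block is bad; a real tool also has to keep checksums so it can find the damaged blocks first.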


Should not need it these days but it is an option if you want.


There has been no development for the last 18 years. It works, so nothing more is required, but it means there is no 64-bit Windows client.
 
Once you've stored things however you wanted, it would be a good idea to catalogue them using VVV. We use it to keep track of data in our lab; it's very handy, especially when each drive is labelled (both the volume label when mounted and a physical label on the outside).

Store your hard drives safely!
 
1674212691469.png

This is how fewer than 1% of the pics have gone bad. Unsure if this is corruption or what.

Also, 6 years ago, when the partitions on my primary pic storage HDDs vanished all of a sudden, I was able to recover 98% of the data, but some files were zero bytes and some showed a proper file size but only very small images. I need to understand this behaviour as well.
 
This is how less than 1% pics have gone bad. Unsure if this is corruption or what.
Yeah, that's corruption. A few bytes got changed. If you prepared a par file with say 10% redundancy you would be able to recover it.
Also, 6yrs ago when primary pic storage hdds partitions vanished all of a sudden, I was able to recover 98% data but some files were zero bytes, some showed proper file size but very small images. Need to understand this behavior as well.
Same thing: the sectors where the files were located on the HDD could not be read, so the recovery software you used put 00 bytes in their place. Or, where larger sections were gone, it left them blank, which is why the file size is small.
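If you want to hunt for files damaged this way, a rough sketch like the one below could flag candidates: files that are empty, or that contain a long run of 00 bytes. The function name and the 4096-byte threshold are my own assumptions, not something any recovery tool defines, and a hit only means "look at this file", not "this file is definitely bad".

```python
from pathlib import Path


def find_suspect_files(root, min_zero_run=4096):
    """Return (path, reason) pairs for files that look zero-filled or empty."""
    suspects = []
    zero_run = bytes(min_zero_run)  # a block of min_zero_run 00 bytes
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Reads the whole file into memory: fine for photos, slow for big videos.
        data = path.read_bytes()
        if len(data) == 0:
            suspects.append((path, "empty"))
        elif zero_run in data:
            suspects.append((path, "zero-filled run"))
    return suspects
```

Compressed image data rarely contains thousands of consecutive zero bytes, which is why a long run is a reasonable warning sign for JPEGs.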

I don't know whether this kind of problem can happen with the present generation HDDs. They swap sectors around if they sense things are going bad.
 
How foolproof and trustworthy is the RAR recovery record?

If I packed 1 GB worth of images into a single RAR and 40% of the RAR got corrupted/damaged, would there be 100% data recovery?
 
You can simulate it in a hex editor with a smaller amount.

RAR up some small file, open it in a hex editor, replace x% of the bytes with zeros here and there, and see if the repair still works.
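If you'd rather not do that by hand in a hex editor, the same experiment can be scripted. This is just a sketch; `corrupt_copy` and its parameters are names I made up. It writes a damaged copy and leaves the original archive untouched.

```python
import random
from pathlib import Path


def corrupt_copy(src, dst, fraction=0.01, seed=0):
    """Write a copy of src to dst with `fraction` of its bytes set to zero.

    Returns the number of byte positions that were overwritten.
    """
    data = bytearray(Path(src).read_bytes())
    rng = random.Random(seed)           # fixed seed so a run can be repeated
    count = int(len(data) * fraction)
    # Pick distinct offsets so exactly `count` positions are overwritten.
    for offset in rng.sample(range(len(data)), count):
        data[offset] = 0
    Path(dst).write_bytes(bytes(data))
    return count
```

For example, `corrupt_copy("test.rar", "test_damaged.rar", fraction=0.05)` (hypothetical file names) would give you a copy with 5% of its bytes zeroed, which you can then try to repair and extract. Keep in mind that scattered single-byte damage is a harsher test than a few bad sectors in one place.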
 
If 40% of rar is corrupted then you'll lose 40% of the data. Provided header of the file is fine.

Zipping photos and videos doesn't make any sense. You are adding headache without any possible plus sides. It has zero use for the data protection. If the header of the zip file gets corrupted then you will lose 100% of the data even if rest 99% is fine.

Compress your videos and photos using media compressors like IrfanView and HandBrake. There's no reason an old photo should be 5 MB. Compress that photo to 300 KB. And put your multiple HDDs into some RAID to protect your data.
 
If 40% of rar is corrupted then you'll lose 40% of the data. Provided header of the file is fine.

Zipping photos and videos doesn't make any sense. You are adding headache without any possible plus sides. It has zero use for the data protection. If the header of the zip file gets corrupted then you will lose 100% of the data even if rest 99% is fine.
Agree but if you add parity info you can recover that much in corrupted data.
Compress your videos and photos using media compressors like irfanview and handbrake. There's no reason an old photo should be 5mb. Compress that photo to 300kb.
Disagree. It's always better to preserve the data of old photos in their entirety instead of throwing data away. If you ever need to photoshop them in the future, you will be glad you did.
And put your multiple hdds into some RAID to protect your data.
fine
 
Agree but if you add parity info you can recover that much in corrupted data.
I dunno how parity info is saved with zipped files, but I don't think it can recover 40% of the data. A few percent maybe, at best. Creating parity data for a zip file that can recover from 40% data loss would inflate the zip file well above the original file size.
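A rough back-of-the-envelope check on the sizes involved, assuming an ideal erasure code where any d surviving blocks out of d data blocks plus p parity blocks are enough to rebuild everything. Real tools work on whole blocks and may need somewhat more, so treat these as lower bounds; the function name is made up.

```python
def parity_overhead(loss_fraction):
    """Minimum parity size, as a fraction of the data size, needed to survive
    losing `loss_fraction` of the whole set (data + parity), assuming an
    ideal erasure code.

    With d data and p parity blocks we need (1 - f) * (d + p) >= d,
    which gives p / d >= f / (1 - f).
    """
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return loss_fraction / (1 - loss_fraction)


for loss in (0.05, 0.10, 0.40):
    extra = parity_overhead(loss)
    print(f"survive {loss:.0%} loss: parity >= {extra:.1%} of the data size")
```

So under that assumption, surviving a 40% loss costs parity worth about two thirds of the data, while surviving 10% costs about 11%.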

Disagree. It's always better to preserve the data of old photos in their entirety instead of throwing data away. If you ever need to photoshop them in the future, you will be glad you did.
Old photos were taken with old devices that had weak processors. They lacked the processing power to compress the photos well. That's why you get big file sizes. Today on computers you can compress photos to just 10% of the size without losing any visible detail.
 
I dunno how parity info is saved with zipped files but I don't think it can recover 40% of data. Few percentage maybe at best. To create parity data of zip file which can recover from 40% data loss, would inflate zip file much above the original file size.
40% is a lot, you can use a lower ratio. And that is recovery from any random loss that does not exceed that amount.
Old photos taken with old devices had weak processors. They lacked the processing power to compress the photos. That's why you get big file sizes. Today on computers you can compress photos to just 10% without losing any visual details.
This is more true with video than images I think.
 
Does RAID really protect data? I've always heard RAID is not backup.

You heard correctly. RAID helps you build a system that stays up through a disk failure. But that's not good enough if you want proper redundancy. You need to set up separate storage to keep another copy of your data.
 
I was always against RAID, as it still requires a sure-shot backup solution, and some known experts who relied on RAID as their only backup ended up trying to rebuild their disks with little success, coupled with horrifying nights and no straightforward solution (data was lost).
Old photos taken with old devices had weak processors. They lacked the processing power to compress the photos. That's why you get big file sizes. Today on computers you can compress photos to just 10% without losing any visual details.
Not true. I have been working with images for a very long time; compression itself means a 1% compromise on quality, and the more the compression, the more the compromise!
Once you compress, there's no straightforward, simple way to retrieve the original form, short of some complex post-processing (HD/4K upscaling), which again adds hugely to the file size!
 
Okay! I will try to put my expert hat on…
What you described is called “silent data corruption” (google it).
Now, coming to your question about whether to zip it or RAID it or anything else…
There are two approaches to safeguarding your data. One is to checksum it, and the other is to be able to recover it when something goes wrong.
Take zipping as an example… zip, along with compression, also stores checksums, so you will know if the data is good or bad. But it is compute intensive (takes hours and burns coal, hence money).
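The checksum approach does not need zip at all. A minimal sketch, assuming you are happy to keep a manifest of SHA-256 hashes next to the photos (the function names and the manifest layout are made up for illustration):

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest that are now missing or hash differently."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Build the manifest once after consolidating, save it somewhere safe (for example as JSON), and rerun `changed_files` every few months. It only tells you that a file went bad, not how to fix it; for that you still need a second copy or parity data.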

The other is RAID, where your data is striped across multiple drives with parity, so you can recover from up to one damaged drive (20%). Rebuilding a RAID is also time consuming, and there is no guarantee if more than one drive fails. (Let's talk about RAID 60 and erasure coding some other day.)

For data replication you want not only to safeguard the data but also to recover it and journal it (go back in time, like a Time Machine); for this you would need a slightly more advanced system.

Because I am an expert :p I am also too lazy, and storage is getting cheaper by the day. Instead of spending money on electricity to run complex checksum algorithms etc., I simply buy an extra big HDD and make one more copy. All my photos are stored on multiple external HDDs and thumb drives strategically thrown around the house.
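One catch with plain extra copies is knowing which copy is the good one when they disagree. With three or more copies a simple majority vote works; here is a sketch (the function name is made up, and it reads each file fully into memory, so it suits photos more than large videos):

```python
import hashlib
from collections import Counter
from pathlib import Path


def split_by_majority(paths):
    """Split copies of the same file into (good, bad) lists by majority hash."""
    digests = {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths
    }
    if not digests:
        raise ValueError("no copies given")
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no strict majority; cannot tell which copy is good")
    good = sorted(p for p, d in digests.items() if d == digest)
    bad = sorted(p for p, d in digests.items() if d != digest)
    return good, bad
```

With only two copies a mismatch cannot be settled this way, which is one argument for keeping a third copy or a checksum list.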

All that being said, if you insist on a RAID-like system, I would use MinIO in an erasure-coded setup to store data on an xfs/zfs/btrfs (butterfs) like file system which allows CoW snapshots at regular intervals, on a home Kubernetes cluster built from multiple Raspberry Pis, which provides S3 storage and is exported through a router configured for DynDNS on Google Domains, exposed on my personal domain.
 