Linux: Has anyone used ZFS? I'm curious what benefits you got

kiran6680

Recruit
I have used both on Linux - ZFS earlier, when it had to be compiled for Linux, and Btrfs regularly for about 9 years. Most of my systems have more RAM than necessary, so I cannot comment on real RAM requirements, but ZFS recommends lots of RAM for optimal performance - if I remember correctly it used to be a GB of RAM per 1-2 TB of storage.

Firstly, both have the advantage of checksums, snapshots, multiple devices in a single filesystem, and copy-on-write.

1. Snapshots : I take all backups from a read-only snapshot. This way, backups are always consistent. Also, a snapshot itself is a poor man's backup, though it doesn't protect against hard disk failure. I tried only a few snapshots with ZFS, but with Btrfs I have daily snapshots enabled and my script to delete old snapshots is not too reliable :). Still, I don't see any performance degradation with hundreds of snapshots. My disk usage is less than 70% on that filesystem, though.
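For Btrfs, my snapshot-then-backup routine is roughly the following (the paths, snapshot names and backup destination are just examples, and the commands need root):

```shell
# take a read-only snapshot of the subvolume we want to back up
btrfs subvolume snapshot -r /home /home/.snapshots/home-2021-09-29

# back up from the frozen snapshot, not from the live filesystem
rsync -a /home/.snapshots/home-2021-09-29/ /mnt/backup/home/

# delete the snapshot afterwards (or keep it around as a poor man's backup)
btrfs subvolume delete /home/.snapshots/home-2021-09-29
```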

2. Copy-on-write : this is great, especially for large videos to edit or large virtual machine images to boot from. The command below works on Btrfs (with ZFS on Linux we need to clone a dataset instead, but the idea is the same):

Bash:
cp --reflink=always file1 copy1

takes negligible extra space and finishes immediately, but creates an independent copy in the file "copy1". Small edits to copy1 will now take very little space.
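The rough ZFS equivalent, from memory (the pool and dataset names here are made up):

```shell
# ZFS has no reflink cp; instead you snapshot the dataset and clone it
zfs snapshot tank/videos@base           # cheap read-only point-in-time copy
zfs clone tank/videos@base tank/videos-edit

# tank/videos-edit is writable and shares all unchanged blocks with the
# snapshot, so only your edits consume new space
```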

3. Multiple devices for a single filesystem : I have not made advanced use of this, but I did combine one partition from each of 2 different hard drives into a single Btrfs filesystem.

4. Checksums are an internal feature, giving us the peace of mind that data corruption from certain hardware errors won't go undetected.

Btrfs did screw up my filesystem once, early on, about 9 years ago, but it was partially my fault: I opened the same virtual hard disk in 2 virtual machines simultaneously. I could not recover the data despite trying very hard; some btrfs-progs tools were not available in those days. Even fsck was new, if I recall correctly.

ZFS is difficult to manage on Linux because it needs to be installed separately after installing Linux, so you cannot easily make it your one and only system drive, but this is not a problem on BSD.
 

Party Monger

Skilled
Thanks for the detailed explanation. I have been researching Unraid, Proxmox etc. and have been ignoring this part. What do you think is best for a Proxmox server where I will be backing up the VM itself every day?

Also, what's the effect of these fancy filesystems on disk usage, since SSDs have a write limit? I had read that just in Proxmox the write amplification can be 3-80 times more than the actual writes in the VM. Is there any RAM cache etc. that can be configured for these?
 

deusExMachina

Disciple
1. Snapshots : I take all backups from a read-only snapshot. This way, backups are always consistent. Also, a snapshot itself is a poor man's backup, though it doesn't protect against hard disk failure. I tried only a few snapshots with ZFS, but with Btrfs I have daily snapshots enabled and my script to delete old snapshots is not too reliable :). Still, I don't see any performance degradation with hundreds of snapshots. My disk usage is less than 70% on that filesystem, though.
Why do you take a snapshot, though, instead of just an incremental backup? It seems like the same thing while not depending on a filesystem.

2. Copy-on-write : this is great, especially for large videos to edit or large virtual machine images to boot from. The command below works on Btrfs (with ZFS on Linux we need to clone a dataset instead, but the idea is the same):

Bash:
cp --reflink=always file1 copy1

takes negligible extra space and finishes immediately, but creates an independent copy in the file "copy1". Small edits to copy1 will now take very little space.
I can see how this would be useful! So, you're saying that the reflink option only works in Btrfs?
Also, what does a dataset clone do on ZFS? Do you need extra space/disk to make the clone? It's something I've never tried.
3. Multiple devices for a single filesystem : I have not made advanced use of this, but I did combine one partition from each of 2 different hard drives into a single Btrfs filesystem.
Never done this but I can see how this could be useful :)
One thing I liked about ZFS is that the actual disk space seems to be shared between partitions in the dataset rather than preallocated like in a regular setup.
4. Checksums are an internal feature, giving us the peace of mind that data corruption from certain hardware errors won't go undetected.

Btrfs did screw up my filesystem once, early on, about 9 years ago, but it was partially my fault: I opened the same virtual hard disk in 2 virtual machines simultaneously. I could not recover the data despite trying very hard; some btrfs-progs tools were not available in those days. Even fsck was new, if I recall correctly.

ZFS is difficult to manage on Linux because it needs to be installed separately after installing Linux, so you cannot easily make it your one and only system drive, but this is not a problem on BSD.
Do you know if ZFS auto-checks the checksum? Or would it detect issues only after a scrub? I personally liked how FreeBSD was put together, but man, they seem to have issues with hibernate and can't run a bunch of tools (that work on Linux).
 

kiran6680

Recruit
Why do you take a snapshot, though, instead of just an incremental backup? It seems like the same thing while not depending on a filesystem.
I do incremental backups of snapshots. This is very important for consistent backups, which are necessary to avoid trouble after restoring. Here is an example that is simple and manual, but in reality the files may be complex and created automatically by programs:
1. On 29 Sep 2021 5 pm, file /home/deus/.bashrc contains :
Code:
#!/bin/bash
/home/deus/liftTheGodInCrane.sh

And the file /home/deus/liftTheGodInCrane.sh contains
Code:
GODCOLOR=brown

2. On 29 Sep 2021 6 pm, backup starts. But the filesystem is huge, so the backup takes 30 minutes to complete.

3. On 29 Sep 2021 6:05 pm, file /home/deus/.bashrc changed to contain :
Code:
#!/bin/bash
/home/deus/liftTheGodInCraneNew.sh

The file /home/deus/liftTheGodInCrane.sh doesn't exist now. And the file /home/deus/liftTheGodInCraneNew.sh contains
Code:
GODCOLOR=white

because now you want the God to be white.

4. Because different files are backed up at different times, the backup of .bashrc was created at 6:03 pm, and the backup of liftTheGodInCraneNew.sh was created at 6:15 pm. So the backup would look like this :

file /home/deus/.bashrc contains :
Code:
#!/bin/bash
/home/deus/liftTheGodInCrane.sh

The file /home/deus/liftTheGodInCrane.sh doesn't exist in backup
And the file /home/deus/liftTheGodInCraneNew.sh contains
Code:
GODCOLOR=white

5. So if you restore this backup, instead of a white or brown god being lifted in a crane, you get an error that the file /home/deus/liftTheGodInCrane.sh does not exist.

Instead, if you take a snapshot of the filesystem and back up the snapshot, you get a perfectly consistent backup.
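On Btrfs, the snapshot-plus-incremental combination looks something like this (the subvolume and mount paths are invented for the example, and the commands need root):

```shell
# day 1: read-only snapshot, then a full send to the backup disk
btrfs subvolume snapshot -r /home /home/.snapshots/day1
btrfs send /home/.snapshots/day1 | btrfs receive /mnt/backup

# day 2: another snapshot, then send only the changes since day1
btrfs subvolume snapshot -r /home /home/.snapshots/day2
btrfs send -p /home/.snapshots/day1 /home/.snapshots/day2 | btrfs receive /mnt/backup
```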


I can see how this would be useful! So, you're saying that the reflink option only works in Btrfs?
Also, what does a dataset clone do on ZFS? Do you need extra space/disk to make the clone? It's something I've never tried.
No extra space, because both ZFS and Btrfs are copy-on-write filesystems. But it is more complicated on ZFS on Linux than a simple cp --reflink on Btrfs.
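A side note on the cp side: if a script may also run on filesystems without reflink support, --reflink=auto degrades gracefully to a normal copy, while --reflink=always fails there. A tiny demo:

```shell
# on Btrfs this is an instant CoW clone; elsewhere --reflink=auto
# silently falls back to an ordinary copy instead of failing
echo "GODCOLOR=brown" > file1
cp --reflink=auto file1 copy1
cat copy1
```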
Never done this but I can see how this could be useful :)
One thing I liked about ZFS is that the actual disk space seems to be shared between partitions in the dataset rather than preallocated like in a regular setup.
Yes, that is nice, though this can be achieved with LVM also.
Do you know if ZFS auto-checks the checksum? Or would it detect issues only after a scrub? I personally liked how FreeBSD was put together, but man, they seem to have issues with hibernate and can't run a bunch of tools (that work on Linux).
There are different kinds of checks at different times. I have forgotten some details, but it does store checksums. There is error correction when there is redundancy (a mirror or RAIDZ setup, roughly the RAID 1, 5, 6 or 10 equivalents) and only error detection in the absence of redundancy, i.e. a single disk or plain striping (RAID 0).
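If I remember right, normal reads are verified against the checksum on the fly, and a scrub additionally walks everything, including data you never read. The commands look like this (pool name "tank" and the mount point are placeholders):

```shell
# ZFS: verify every block, repairing from redundancy where possible
zpool scrub tank
zpool status -v tank          # shows scrub progress and any checksum errors

# Btrfs has the same idea
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data
```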
 

booo

BA BA BA BABANANA
Skilled
Yep used both. But not in production environments.
The cool thing for me is that if you create a volume with a mix of SSDs and spinning disks, they will automatically use the SSDs for caching.
 

kiran6680

Recruit
Thanks for the detailed explanation. I have been researching Unraid, Proxmox etc. and have been ignoring this part. What do you think is best for a Proxmox server where I will be backing up the VM itself every day?

Also, what's the effect of these fancy filesystems on disk usage, since SSDs have a write limit? I had read that just in Proxmox the write amplification can be 3-80 times more than the actual writes in the VM. Is there any RAM cache etc. that can be configured for these?
I don't know about Unraid or Proxmox; they seem to be virtualisation platforms. If so, try taking a consistent backup rather than just copying the VM image : a copy is not a backup. E.g. VirtualBox and VMware support snapshots of VMs.

I think the extra writes due to these filesystems are negligible. ZFS has a background scrub process for checking, which involves some basic writing. Btrfs doesn't even have that much. Around 2012 or so, not all distros supported the "discard" functionality for Btrfs on SSDs, which reduced performance after a few write cycles, but that problem was also solved long ago.
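These days discard is handled either by mounting with -o discard or, more commonly, by periodic trimming (the commands below assume a systemd distro and need root):

```shell
# one-off TRIM of free space; -v reports how much was discarded
fstrim -v /

# or enable the periodic timer that most distros ship
systemctl enable --now fstrim.timer
```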
 

deusExMachina

Disciple
Instead, if you take a snapshot of the filesystem and back up the snapshot, you get a perfectly consistent backup.
Thanks, you made it perfectly clear! The example was on point.
No extra space, because both ZFS and Btrfs are copy-on-write filesystems. But it is more complicated on ZFS on Linux than a simple cp --reflink on Btrfs.
Are there any other advantages of a CoW filesystem? Also, doesn't journaling help avoid data loss during abrupt shutdowns? How do Btrfs / ZFS handle this?
Yes, that is nice, though this is achieved with LVM also.
Really? The last time I experimented with LVM, the space usage / reporting was similar to traditional partitions.

I mean, say you split a 1TB disk into four partitions and mounted them as /home /usr /tmp /var/logs. df would show % usage with respect to the corresponding partition size. On LVM also, I saw that it was reported as a % of the respective volume size.

On ZFS, on the other hand, all of them would report free space as a % with respect to the entire dataset size (1TB). This seemed neat to me because I wouldn't have to over-allot space.
 

kiran6680

Recruit
Thanks, you made it perfectly clear! The example was on point.

Are there any other advantages of a CoW filesystem? Also, doesn't journaling help avoid data loss during abrupt shutdowns? How do Btrfs / ZFS handle this?
Btrfs and ZFS also survive abrupt shutdowns cleanly (through copy-on-write rather than a classic journal), and abrupt shutdowns have not been a data-loss problem for ext3, JFS, ext4 and XFS either. Btrfs and ZFS protect from other problems - e.g. a hard disk hardware problem where it writes 1 and reads back 0. I personally faced a problem with defective Hynix RAM in 2007 which was silently corrupting my data. Btrfs and ZFS, due to checksums on write and on read-back, save us from those errors corrupting our data. Traditional journaling filesystems don't do that.

On the same lines as my earlier example of video editing / VM images, CoW systems allow for :
1. If you do "yum upgrade", "apt-get upgrade" or "pacman -Syu" and it screws up your system, with CoW you can roll back much more cleanly than yum, apt-get and pacman allow. This is somewhat automated by tools like Snapper (https://wiki.archlinux.org/index.php/Snapper).

2. Backup restore : Linux Mint has an out-of-the-box option to snapshot daily (or on any other schedule), with an easy restore to any other day's snapshot. This works only on Btrfs.

There are lots of applications of CoW, but since the basic idea is the same, I am not sure I can call them "other" advantages.
Really? The last time I experimented with LVM, the space usage / reporting was similar to traditional partitions.

I mean, say you split a 1TB disk into four partitions and mounted them as /home /usr /tmp /var/logs. df would show % usage with respect to the corresponding partition size. On LVM also, I saw that it was reported as a % of the respective volume size.

On ZFS, on the other hand, all of them would report free space as a % with respect to the entire dataset size (1TB). This seemed neat to me because I wouldn't have to over-allot space.
Yes, you are right, I exaggerated LVM's functionality quite a bit :) . LVM does a very small part of what ZFS does here: it just lets us freely resize the "partitions" (volumes). But the filesystems contained in those volumes still pose problems.
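For completeness, this is what the shared free space looks like on ZFS (the pool name "tank" and dataset names are placeholders):

```shell
# datasets carve nothing out up front; they all draw on the pool's free space
zfs create tank/home
zfs create tank/var-log
zfs list -o name,used,avail   # both datasets report the same pool-wide AVAIL

# you can still cap a dataset without preallocating anything
zfs set quota=100G tank/var-log
```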
 