If you use any of the OCI-compliant runtimes (Docker, CRI-O, Moby, etc.), be aware that there is an issue where btrfs subvolumes are not removed after a container is destroyed.
I wrote about this issue last year on my blog:
https://nanibot.net/posts/docker-and-btrfs-enemies/
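In case it helps anyone, this is roughly how you can spot the leftovers by hand. A minimal sketch, assuming the default data root `/var/lib/docker` with the `btrfs` storage driver; stop the daemon first and only delete subvolumes that no longer belong to an existing image or container, otherwise you will corrupt Docker's state:
```
sudo systemctl stop docker

# Every image layer / container rootfs is a subvolume under .../btrfs/subvolumes;
# entries that survive "docker rm" / "docker rmi" are the leaked ones
sudo btrfs subvolume list /var/lib/docker | grep 'btrfs/subvolumes'

# Delete a leaked subvolume (substitute the real name from the listing above)
sudo btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/<id>

sudo systemctl start docker
```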
Relevant GitHub issues:
(GitHub issue, opened 22 Oct 2016, closed 21 Aug 2023; labels: area/storage/btrfs, status/more-info-needed, version/1.12)
**Description**
**Steps to reproduce the issue:**
1. Install docker on a btrfs system
2. Use docker intensively for a while, and make sure to remove & recreate containers, rebuild with no cache, restart the system, ...
3. Check disk space
**Describe the results you received:**
- /var/lib/docker uses up more than 30GB, with all except a few megabytes located inside /var/lib/docker/btrfs
- `docker ps -s` shows a few gigabytes, below 10 GB
- The following do absolutely nothing to change the /var/lib/docker disk space use
1. `docker rmi $(docker images -aq)`
2. `docker volume rm` on every line of `docker volume ls -qf dangling=true`
3. removing all files matching `/var/lib/docker/*/*-json.log`
4. `docker rm` on every stopped container listed in `docker ps -a`
If there is an obvious cleanup procedure that I missed, it's probably not a bug but my own mistake.
I have been known to wonder why disk space runs out so quickly with docker logs growing to gigabytes, without realizing what was going on :smile:
- The following solves the situation (but it shouldn't be required):
1. `<shutdown all services>`
2. `service docker stop`
3. `apt-get remove --purge docker-engine`
4. `rm -rf /var/lib/docker/`
5. `apt-get install -y docker-engine`
6. `service docker start`
7. `<restart all services>`
After this procedure, disk usage of /var/lib/docker goes back to a few single digit gigabytes as expected
**Describe the results you expected:**
`/var/lib/docker` is, after the failing cleanup procedures described above, not significantly larger than what is shown in `docker ps -s`
**Additional information you deem important (e.g. issue happens only occasionally):**
**Output of `docker version`:**
```
root@Ubuntu-1510-wily-64-minimal ~ # docker version
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64
root@Ubuntu-1510-wily-64-minimal ~ #
```
**Output of `docker info`:**
(this is after purging and reinstalling as described above)
```
root@Ubuntu-1510-wily-64-minimal ~ # docker info
Containers: 14
Running: 14
Paused: 0
Stopped: 0
Images: 41
Server Version: 1.12.2
Storage Driver: btrfs
Build Version: Btrfs v4.4
Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-45-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859 GiB
Name: Ubuntu-1510-wily-64-minimal
ID: 2ILP:SBMD:2NB7:JY7W:YC3K:CYYV:AX5E:FZLB:CON4:3JCT:3GT4:W4PB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
root@Ubuntu-1510-wily-64-minimal ~ #
```
**Additional environment details (AWS, VirtualBox, physical, etc.):**
KVM virtual server hosted by Hetzner, operating system is Ubuntu 16.04 LTS (the 15.10 occurrences above are simply unchanged terminal prompts; the system was upgraded a while ago)
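As a side note, the mismatch described in that issue is easy to check for yourself. Rough sketch only; `docker system df` requires a newer daemon than the 1.12 shown above, and the path assumes the default data root:
```
# What Docker itself accounts for (images, containers, volumes, build cache)
docker system df

# What the btrfs driver is actually holding on disk
sudo btrfs filesystem du -s /var/lib/docker/btrfs/subvolumes

# Plain du for comparison (shared/reflinked extents are counted repeatedly here)
sudo du -sh /var/lib/docker
```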
(GitHub issue, opened 7 Jan 2015, closed 26 Aug 2015; labels: area/storage/btrfs, exp/expert, kind/bug)
I receive the following error when deleting a container which created a btrfs subvolume (as happens when you run docker in docker).
```
# docker run --rm fedora:20 sh -c 'yum -y -q install btrfs-progs && btrfs subvolume create /test'
Public key for lzo-2.08-1.fc20.x86_64.rpm is not installed
Public key for e2fsprogs-libs-1.42.8-3.fc20.x86_64.rpm is not installed
Importing GPG key 0x246110C1:
Userid : "Fedora (20) <fedora@fedoraproject.org>"
Fingerprint: c7c9 a9c8 9153 f201 83ce 7cba 2eb1 61fa 2461 10c1
Package : fedora-release-20-3.noarch (@fedora-updates/$releasever)
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-20-x86_64
Create subvolume '//test'
FATA[0033] Error response from daemon: Cannot destroy container c9badf5fc87bb9bfb50a3ee6e5e7c840476bd704e62404c9136aab4d27239d1e: Driver btrfs failed to remove root filesystem c9badf5fc87bb9bfb50a3ee6e5e7c840476bd704e62404c9136aab4d27239d1e: Failed to destroy btrfs snapshot: directory not empty
```
Info:
```
# docker info
Containers: 22
Images: 47
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.13.2-gentoo
Operating System: Gentoo/Linux
CPUs: 8
Total Memory: 15.64 GiB
Name: whistler
ID: RL3I:O6RS:UJRN:UU74:WAGE:4X5B:T2ZU:ZRSU:BN6Q:WN7L:QTPM:VCLN
Username: phemmer
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
# docker version
Client API version: 1.16
Go version (client): go1.3.3
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8
```
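That second failure mode ("Failed to destroy btrfs snapshot: directory not empty") shows up when something inside the container created its own subvolume, which the storage driver then refuses to delete. A rough workaround sketch, assuming the default `/var/lib/docker` layout; `<container-id>` and the nested path are placeholders you have to read off the listing:
```
# Nested subvolumes appear as entries more than one level below .../btrfs/subvolumes
sudo btrfs subvolume list /var/lib/docker | grep -E 'btrfs/subvolumes/.+/'

# Delete the nested subvolume(s) first (deepest first), then retry the removal
sudo btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/<container-id>/<nested-subvolume>
docker rm <container-id>
```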
booo:
Read that article; seems to me that he set up his test to validate his confirmation bias.
Let's see: when you compare RAM and swap, disk access is (roughly) 500 times slower, so a cache miss plus swap-in is about 500 times slower than a RAM access, and even swapping out an unused page creates I/O, which is very slow.
In the old days, when you had 640 KB of RAM, a swap roughly twice the size of RAM gave you the illusion of three times the memory, but very slow memory, and when you wanted huge chunks of data the computer gave you the same error message as when it had no swap at all. In today's age we have boatloads of RAM and RAM is relatively cheap, so just add more RAM, which is far faster than a swap disk.
Now, coming to ramdisk-based swap: swap-ins and swap-outs only add one more memory transfer, moving pages from here to there within RAM. How is that supposed to improve performance? Swapping just creates unnecessary movement of pages in RAM, and if you move pages across NUMA nodes it is a much more costly operation than copying memory within the same NUMA node. Let the page live where it is instead of moving it around in the name of swapping and adding more DMA calls…
Personal anecdote time… I have done extensive I/O tests on enterprise servers for a very long time, e.g. saturating network links in SAN environments, and have never used swap on my servers/virtual machines. I don't think swap is a good thing in today's world, where you can easily slap in an extra 32 GB of RAM. If an application is using more memory than that, it is probably a badly written application.
Anyone looked into XFS? It's a fairly advanced filesystem too…
+1, using swap on servers is a no-no. When your program runs out of memory, it is always preferable to let it be killed by the OOM killer instead of relying on swap. In addition to what booo mentioned about disk access being slow, note that swap can also make the program take **longer** to fail (typically you would want it to fail fast instead). Also note that Fedora (not sure about other distros) moved to swap on zram, since Fedora 33 I think.
About XFS: I run OpenShift/OKD clusters for fun. I moved to ext4 a few weeks ago, and ever since, my VMs have been struggling with disk I/O (which did not happen on XFS). Not jumping to conclusions here, but I would like to move back to XFS so I can rule out the possibility that my hardware is dying.
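(If you want to check whether a box has already ended up on zram-backed swap, a couple of commands are enough; quick sketch:)
```
# List active swap devices; on Fedora 33+ this is typically /dev/zram0
swapon --show

# Show the zram device itself: compressed size vs. the data it holds
zramctl
```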
Can I just use btrfs now when installing Ubuntu 24.04? I just got a spare SSD and want to try different distros.
Is there anything I need to keep in mind or take precautions about?
No, it just works. No precautions needed. I've been using it since it was experimental.
On the other hand, you might want to change your workflow to take advantage of the new features. For example, if you copy large files, you can replace plain `cp` with `cp --reflink` to get a copy-on-write copy.
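A quick sketch of what that looks like in practice (the file names are made up, and both copies have to live on the same btrfs filesystem for the reflink to work):
```
# Completes almost instantly and shares all data blocks with the original;
# blocks are only duplicated when one of the copies is later modified
cp --reflink=always big-image.qcow2 big-image.copy.qcow2

# --reflink=auto silently falls back to a normal copy where CoW isn't possible
cp --reflink=auto big-image.qcow2 /mnt/other-fs/
```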
kiran6680:
No, it just works. No precautions needed.
We use btrfs in enterprise and I approve of this message.