Check Your Caches Regularly
I ran out of disk space on my work laptop recently. Again. It finally motivated me to find the root cause.
Numbers
Let’s talk numbers. My main disk, a 500 GB SSD (modest by today’s standards), can fill up pretty quickly.
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vgxubuntu-root 467G 467G 0 100% /
A lot of data has crossed this disk since the system installation at the beginning of 2021:
# smartctl -a /dev/nvme0
(...)
Data Units Read: 12,215,530 [6.25 TB]
Data Units Written: 25,077,401 [12.8 TB]
I started the cleanup by removing some undoubtedly optional files like ~/Downloads/ and migrating data files to the second drive.
I managed to salvage a decent bit:
$ df -h /
/dev/mapper/vgxubuntu-root 467G 421G 23G 95% /
So where is the rest of the junk? I was astonished when I found out that 25% of my disk space was occupied by various caches!
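If you want to do the same digging on your own machine, a rough sweep of the usual suspects is a good start (a sketch, assuming default cache locations; note that Docker stores its data outside your home directory):
$ du -sh ~/.cache/* 2>/dev/null | sort -h | tail   # largest per-tool caches under ~/.cache
$ sudo du -sh /var/lib/docker                      # Docker images, containers, volumes, build cache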
The Root of All Evil
I’ve identified the three most painful storage eaters:
- Docker
- Poetry
- pip
That’s not a huge surprise given that I mainly focus on Python development.
Docker
I use Docker for various stuff, such as running legacy projects that require ancient system deps, quick testing in limited prod-like environments, or just experimenting with unsafe code.
$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 44 9 80.36GB 78.29GB (97%)
Containers 11 1 2.618GB 2.618GB (100%)
Local Volumes 47 5 1.219GB 983.4MB (80%)
Build Cache 0 0 0B 0B
You can see the full summary of your Docker daemon with docker system df -v.
There’s a mostly safe command for removing all stopped containers, unused networks, dangling images, and dangling build cache:
$ docker system prune
(...)
Total reclaimed space: 8.371GB
There’s also a more aggressive variant, -a, which additionally removes every image not used by at least one container, plus the entire build cache:
$ docker system prune -a
(...)
Total reclaimed space: 73.91GB
I decided to go with -a right away, to get a fresh start.
80 GB reclaimed.
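If wiping everything feels too drastic for regular use, prune also accepts time-based filters, so you can keep recently created objects around. A sketch (the one-week cut-off is arbitrary, pick whatever matches your workflow):
$ docker system prune -a --filter "until=168h"   # only prune objects created more than a week ago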
Poetry
In 2023 we started using Poetry to manage our Python environments at work.
Check it out: it’s finally production-ready after maturing for a couple of years (the early performance issues are now gone!).
I couldn’t find a built-in command to list Poetry’s cache size.
There’s poetry cache list, but it only shows cached package registry names:
$ poetry cache list
PyPI
_default_cache
secret-registry-1
secret-registry-2
(...)
By default Poetry keeps the cache in ~/.cache/pypoetry, but it can be configured with the POETRY_CACHE_DIR env variable:
$ du -sh .cache/pypoetry/* | sort -h
11G .cache/pypoetry/artifacts # packages (wheels, tarballs)
12G .cache/pypoetry/cache # package metadata used to build the `poetry.lock` file
99M .cache/pypoetry/src # source repositories, e.g. deps specified with a git URI
21G .cache/pypoetry/virtualenvs # the actual virtualenvs ;)
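If you’re not sure where the cache lives on your machine, asking Poetry directly should work; poetry config cache-dir prints the effective path, whether it’s the default or an override:
$ poetry config cache-dir
/home/florczak/.cache/pypoetry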
You can use poetry cache clear --all <cache name> to delete all cache entries from a given repository.
I find it overly complicated to use, since you have to name each registry separately:
$ poetry cache clear --all _default_cache
Delete 1208 entries? (yes/no) [yes]
There’s an undocumented shortcut that lets you pass . as the cache name to clear all caches at once:
$ poetry cache clear --all .
Delete 4426 entries? (yes/no) [yes]
Unfortunately, none of those will clean the artifacts directory, so we’re left with:
$ rm -r ~/.cache/pypoetry/artifacts/*
That’s an extra 23 GB.
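The 21 GB of virtualenvs from the listing above is a different story: those are live environments rather than a cache, so blindly deleting the directory would break existing projects. For projects you no longer care about, newer Poetry versions can drop their environments cleanly (a sketch, assuming Poetry 1.2 or later; run it from inside the project in question):
$ poetry env remove --all   # removes every virtualenv associated with the current project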
Legacy pip
I don’t use pip directly anymore since the migration to Poetry.
Some leftovers were lurking in the shadows:
$ pip cache info
Package index page cache location (pip v23.3+): /home/florczak/.cache/pip/http-v2
Package index page cache location (older pips): /home/florczak/.cache/pip/http
Package index page cache size: 5398.9 MB
Number of HTTP files: 2218
Locally built wheels location: /home/florczak/.cache/pip/wheels
Locally built wheels size: 15.6 MB
Number of locally built wheels: 41
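For the record, pip ships a built-in command that wipes everything listed above in one go:
$ pip cache purge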
Yay, another 5 GB reclaimed.
Closing Words
The cache is crucial for performance, but an unmanaged cache slowly drains resources. When you design an application that relies heavily on caching, let your users know about the consequences, and create cache invalidation/cleanup policies that make the most sense for your use cases.
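On a developer machine, even a low-tech policy beats none. A sketch of what I mean, as user crontab entries (schedules and flags are arbitrary, and the Docker line assumes your user can talk to the Docker daemon):
$ crontab -e
@weekly docker system prune -af --filter "until=168h"
@monthly pip cache purge
@monthly poetry cache clear --all . -n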
I suspect that at least half of the 108 GB of cache space I freed was stale and will never be repopulated. Only time will tell.