Check Your Caches Regularly

I ran out of disk space on my work laptop recently. Again. It finally motivated me to find the root cause.

Numbers

Let’s talk numbers. My main disk, a modest 500 GB SSD by today’s standards, can fill up pretty quickly.

$ df -h /
Filesystem                  Size Used Avail Use% Mounted on
/dev/mapper/vgxubuntu-root  467G 467G     0 100% /

A lot of data has crossed this disk since the system installation at the beginning of 2021:

# smartctl -a /dev/nvme0
(...)
Data Units Read:                    12,215,530 [6.25 TB]
Data Units Written:                 25,077,401 [12.8 TB]

I started the cleanup by removing clearly disposable files, such as the contents of ~/Downloads/, and migrating data files to the second drive. That freed up a decent chunk:

$ df -h /
/dev/mapper/vgxubuntu-root      467G  421G   23G  95% /

So where is the rest of the junk? I was astonished when I found out that 25% of my disk space was occupied by various caches!
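A breakdown like that can be obtained by ranking the entries under the cache directory by size. The helper below is only a sketch: the name cache_top and its defaults (XDG_CACHE_HOME or ~/.cache, top 10) are my own assumptions, not something a tool ships with.

```shell
# Print the N largest entries directly under a cache directory, largest last.
# cache_top is a hypothetical helper; the defaults are assumptions.
cache_top() {
    local dir="${1:-${XDG_CACHE_HOME:-$HOME/.cache}}"
    local count="${2:-10}"
    # -k reports sizes in KiB, so a plain numeric sort is enough
    du -sk "$dir"/* 2>/dev/null | sort -n | tail -n "$count"
}

# Example call: cache_top ~/.cache 5
```

Sizes come out in KiB; swapping in du -sh and sort -h gives human-readable output on systems with GNU coreutils.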

The Root of All Evil

I’ve identified the three most painful storage eaters:

  • Docker
  • Poetry
  • pip

That’s not a huge surprise given that I mainly focus on Python development.

Docker

I use Docker for various stuff, such as running legacy projects that require ancient system deps, quick testing in limited prod-like environments, or just experimenting with unsafe code.

$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          44        9         80.36GB   78.29GB (97%)
Containers      11        1         2.618GB   2.618GB (100%)
Local Volumes   47        5         1.219GB   983.4MB (80%)
Build Cache     0         0         0B        0B

You can get a detailed, per-item breakdown from your Docker daemon with docker system df -v. There’s a mostly safe command for removing all stopped containers, unused networks, dangling images, and dangling build cache:

$ docker system prune
(...)
Total reclaimed space: 8.371GB

There’s also a more aggressive option, -a, which additionally removes every image without at least one associated container, plus the entire build cache:

$ docker system prune -a
(...)
Total reclaimed space: 73.91GB

I decided to go with -a this time to get a fresh start. Together, the two runs reclaimed roughly 80 GB.
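For a less drastic routine cleanup, the prune subcommands accept an until filter that limits removal to objects older than a given age. The wrapper below is a sketch under my own assumptions: the function name prune_old_docker, the APPLY switch, and the 720h default are all made up for illustration. It only prints the commands unless APPLY=1 is set.

```shell
# Prune stopped containers, unused images and build cache older than a given age.
# prune_old_docker and APPLY are hypothetical names; 720h (30 days) is an arbitrary default.
prune_old_docker() {
    local age="${1:-720h}"
    local runner="echo"       # dry run by default: only print the commands
    if [ "${APPLY:-0}" = "1" ]; then
        runner=""             # APPLY=1 runs the commands for real
    fi
    $runner docker container prune --force --filter "until=$age"
    $runner docker image prune --all --force --filter "until=$age"
    $runner docker builder prune --force --filter "until=$age"
}

# Example call: APPLY=1 prune_old_docker 240h
```

Volumes are left out on purpose, since they tend to hold data rather than cache.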

Poetry

In 2023 we started using Poetry to manage our Python environments at work. Check it out: it’s finally production-ready after maturing for a couple of years (the early performance issues are gone!).

I couldn’t find a built-in command to list Poetry’s cache size. There’s poetry cache list, but it only shows cached package registry names:

$ poetry cache list
PyPI
_default_cache
secret-registry-1
secret-registry-2
(...)

By default, Poetry keeps its cache in ~/.cache/pypoetry (the location can be changed with the POETRY_CACHE_DIR environment variable), so plain du gives the per-directory breakdown:

$ du -sh .cache/pypoetry/* | sort -h
99M	.cache/pypoetry/src  # source repositories, e.g. when deps are specified with a git URI
11G	.cache/pypoetry/artifacts  # packages (wheels, tarballs)
12G	.cache/pypoetry/cache  # package metadata used to build the `poetry.lock` file
21G	.cache/pypoetry/virtualenvs  # the actual virtualenvs ;)

You can use poetry cache clear --all <cache name> to delete all cache entries from a given repository. I find it overly complicated to use:

$ poetry cache clear --all _default_cache
Delete 1208 entries? (yes/no) [yes]

There’s an undocumented option that lets you pass . as a cache name to clear all caches:

$ poetry cache clear --all .
Delete 4426 entries? (yes/no) [yes]

Unfortunately, neither of those cleans the artifacts directory, so we’re left with removing its contents by hand:

$ rm -r ~/.cache/pypoetry/artifacts/*

That’s an extra 23 GB.
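If wiping the whole artifacts directory feels too blunt, an age-based cleanup is one alternative. This is a sketch, not a Poetry feature: prune_old_artifacts is a hypothetical helper, the 90-day default is arbitrary, and it assumes a file's modification time is a good enough stand-in for "when was this downloaded". It relies on the -delete, -empty and -mindepth extensions found in GNU and BSD find.

```shell
# Delete cached artifact files older than N days, then tidy up empty directories.
# prune_old_artifacts is a hypothetical helper; the 90-day default is an assumption.
prune_old_artifacts() {
    local dir="${1:-${POETRY_CACHE_DIR:-$HOME/.cache/pypoetry}/artifacts}"
    local days="${2:-90}"
    # -mtime +N matches files last modified more than roughly N days ago;
    # -print lists each file as it is removed
    find "$dir" -type f -mtime +"$days" -print -delete
    # Remove directories left empty by the step above (never $dir itself)
    find "$dir" -mindepth 1 -type d -empty -delete
}

# Example call: prune_old_artifacts ~/.cache/pypoetry/artifacts 30
```

Anything deleted this way is simply downloaded again the next time an install needs it.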

Legacy pip

I haven’t used pip directly since migrating to Poetry, but some leftovers were still lurking in the shadows:

$ pip cache info
Package index page cache location (pip v23.3+): /home/florczak/.cache/pip/http-v2
Package index page cache location (older pips): /home/florczak/.cache/pip/http
Package index page cache size: 5398.9 MB
Number of HTTP files: 2218
Locally built wheels location: /home/florczak/.cache/pip/wheels
Locally built wheels size: 15.6 MB
Number of locally built wheels: 41

Purging it (pip cache purge does that in one go) reclaimed another 5 GB. Yay.

Closing Words

Caching is crucial for performance, but an unmanaged cache quietly drains resources. When you design an application that relies heavily on a cache, let your users know the consequences. Create cache invalidation and cleanup policies that make the most sense for your use cases.
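As one example of such a policy, a cache can be held to a size budget by evicting its oldest files first. The function below is a minimal sketch: evict_to_budget is a hypothetical name, the budget is given in bytes, "oldest" means oldest modification time, and it depends on the -printf extension of GNU find.

```shell
# Delete the oldest files in a directory until the total size fits a byte budget.
# evict_to_budget is a hypothetical helper; requires GNU find for -printf.
evict_to_budget() {
    local dir="$1" budget="$2" total mtime size path
    # Sum of file sizes in bytes; %.0f avoids scientific notation for large totals
    total=$(find "$dir" -type f -printf '%s\n' | awk '{ s += $1 } END { printf "%.0f\n", s }')
    # One line per file: mtime, size, path (tab-separated), oldest first
    find "$dir" -type f -printf '%T@\t%s\t%p\n' | sort -n |
    while IFS="$(printf '\t')" read -r mtime size path; do
        if [ "$total" -le "$budget" ]; then
            break
        fi
        rm -f -- "$path" && total=$((total - size))
    done
}

# Example call: evict_to_budget ~/.cache/myapp 1073741824   # keep at most 1 GiB
```

A real application would more likely evict by last access or by entry validity, but even a crude budget like this keeps the cache from growing without bound.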

I suspect that at least half of the 108 GB of cache space I freed was stale and will never be populated again. Only time will tell.