Running out of Disk Space in Production

A developer launched a simple download server for 2.2GB digital files on a Hetzner box with just 40GB disk and 4GB RAM running NixOS. Minutes after announcing availability, hundreds of customers hit the site. Disk filled to 100% instantly. Email deliveries failed with “452 4.3.1 Insufficient system storage.” Users couldn’t download files. Grafana and df -h confirmed /dev/sda at capacity. This exposed classic production pitfalls: analytics bloat, Nix store explosion, and overlooked log growth under load.

The Barebones Setup

The server ran a Haskell program serving static files behind authorization checks, fronted by nginx as a reverse proxy for a virtual host. Hetzner’s CX11 plan—€3.29/month—delivers that spec: single-core AMD EPYC, 40GB NVMe SSD. Fine for low-traffic prototypes, but tight for bursts. NixOS managed the stack declaratively, including Plausible Analytics for privacy-focused tracking. Plausible relies on ClickHouse, a columnar database that chews disk on queries and logs.

No cloud storage for the 2.2GB files. Everything sat local. Smart for control and cost, but risky. One file equals 5.5% of total disk. Hundreds downloading simultaneously? Cache misses and temp files pile up fast.

Traffic Spike Triggers Cascade

Logs flooded: Mar 31 20:43:03 mogbit kanjideck-fulfillment[2528300]: user error (Unexpected reply to: MAIL "...@kanjideck.com", Expected reply code: 250, Got this instead: 452 "4.3.1 Insufficient system storage"). SMTP rejected outbound mail. Incoming complaints likely dropped too. Disk at 40GB/40GB.

du -sh scans revealed the culprits: Plausible’s ClickHouse database under /var/lib at 8.5GB, and /nix/store at 15GB with generations of configs and binaries. That leaves roughly 16GB unaccounted for; the author later realized that the “rest of files” couldn’t explain it. Likely suspects: journald logs exploding from request volume, nginx access logs, Haskell temp files, or ClickHouse write-ahead logs.
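The accounting above can be sketched in a few lines of Python. This is an illustrative stand-in for du, not what the author ran: the helper names dir_size_bytes and unaccounted_gb are hypothetical, and the sizes passed in are the figures quoted in the article.

```python
import os
import stat


def dir_size_bytes(root: str) -> int:
    """Sum the apparent size of regular files under root.

    Symlinks are not followed and hard-linked files are counted once.
    Unlike du, this reports file lengths, not allocated disk blocks.
    """
    total = 0
    seen = set()  # (device, inode) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total


def unaccounted_gb(disk_gb: float, known_gb: dict) -> float:
    """Disk capacity minus the sum of the directories already measured."""
    return disk_gb - sum(known_gb.values())
```

With the article's numbers, unaccounted_gb(40, {"clickhouse": 8.5, "nix_store": 15}) gives 16.5, which matches the roughly 16GB gap. Running dir_size_bytes over candidates such as /var/log would be one way to chase down the remainder.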

Panic mode. First fix attempt: nix-collect-garbage -d, to delete old profile generations and garbage-collect the store. It failed: error: opening lock file '/nix/var/nix/profiles/system.lock': No space left on device. Nix needs a little free space just to create its lock file before GC can run, which is ironic for a reproducibility tool.

Desperate Space Hunt

Quick win: journalctl --vacuum-time=1s. Trimming systemd’s journal freed enough space for Nix GC to proceed, reclaiming 15GB. Next: shrink ClickHouse, which logs every query to system.query_log. Attempt: clickhouse-client -q "TRUNCATE TABLE system.query_log". It failed: Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space. ClickHouse buffers aggressively; even TRUNCATE needs temp space.

The real fix likely involved stopping services, deleting large directories by hand, or retrying the TRUNCATE on ClickHouse’s system log tables once some space was free. After that, restart analytics or offload it to an external host. For the files, migrate to external storage such as a Hetzner Storage Box (€3/month for 100GB) or an S3-compatible object store such as Backblaze B2 (pennies per GB).

Why This Bites, and How to Avoid It

Small instances tempt startups: low cost, quick spin-up. But a 40GB disk leaves only about 20GB of safe headroom after OS and Nix overhead. Plausible’s ClickHouse grows 1-5GB/month on modest traffic; spikes multiply it. /nix/store easily reaches 10-20GB after a few iterations. Add logs and it is game over.
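To make the headroom argument concrete, here is a rough projection of how long a disk lasts at a steady growth rate. It is a back-of-the-envelope sketch: the function name, the 20% reserve, and the 20GB starting usage are assumptions for illustration, while the 1-5GB/month growth range is the article's figure.

```python
def months_until_alert(disk_gb: float, used_gb: float,
                       growth_gb_per_month: float,
                       reserve_percent: int = 20) -> float:
    """Months until usage reaches the alert line (disk minus a reserve).

    Assumes linear growth, which a traffic spike will break.
    """
    alert_line_gb = disk_gb * (100 - reserve_percent) / 100
    headroom_gb = alert_line_gb - used_gb
    if headroom_gb <= 0:
        return 0.0  # already past the alert line
    if growth_gb_per_month <= 0:
        return float("inf")  # never fills at this rate
    return headroom_gb / growth_gb_per_month


if __name__ == "__main__":
    # Assumed starting point: 40GB disk with 20GB already used.
    for rate in (1, 5):
        months = months_until_alert(40, 20, rate)
        print(f"{rate} GB/month -> about {months:.1f} months of headroom")
```

Under these assumptions the alert line sits at 32GB, so the box has somewhere between about two months and a year before it needs attention, and much less if launch-day traffic lands on top.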

Implications hit hard for indie devs. Downtime erodes trust—KanjiDeck customers waited months for files, then 503s. Revenue risk if paid. Scale math: 100 users x 2.2GB = 220GB theoretical transfer; reality needs bandwidth too (Hetzner CX11 caps 20TB/month outbound).
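The scale math can be checked with a short calculation. The sketch below uses whole megabytes in decimal units to avoid floating-point noise; the function names are hypothetical, and the 1 Gbit/s link speed is an assumption for illustration, not a figure from the article.

```python
def total_transfer_mb(users: int, file_mb: int) -> int:
    """Total outbound data if every user downloads the file once."""
    return users * file_mb


def share_of_monthly_cap(total_mb: int, cap_tb: int) -> float:
    """Fraction of the monthly traffic allowance consumed (decimal units)."""
    return total_mb / (cap_tb * 1_000_000)


def transfer_seconds(total_mb: int, link_mbit_per_s: int) -> float:
    """Best-case time to push total_mb through a fully used link."""
    return total_mb * 8 / link_mbit_per_s


if __name__ == "__main__":
    total = total_transfer_mb(100, 2200)  # 100 users x 2.2GB
    print(total / 1000, "GB total")
    print(f"{share_of_monthly_cap(total, 20):.1%} of a 20TB allowance")
    # Assumed link speed: 1 Gbit/s, with no protocol overhead.
    print(transfer_seconds(total, 1000) / 60, "minutes at line rate")
```

The monthly allowance is nowhere near the limit here; the pressure comes from many users pulling a large file at the same moment, not from the total volume.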

Fixes: Alert on 80% disk usage via Prometheus/Grafana (they already had Grafana, so use it). Separate concerns: move static files to a CDN or object store. Run analytics on a beefier box or use a hosted plan (Plausible Cloud starts at €5/month). NixOS trick: set nix.gc.automatic = true in the config, with nix.gc.options = "--delete-older-than 7d" or similar, so old generations are collected on a schedule. Or ditch Nix for Docker if purity isn’t critical.
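The 80% rule is simple enough to sketch without any monitoring stack. The snippet below is a stand-alone illustration using Python's standard library, not a Prometheus or Grafana rule; the function names and the default path are assumptions.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding path that is in use.

    This is used/total, so it can differ slightly from the Use% column
    of df, which excludes blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path: str = "/", threshold_percent: float = 80.0):
    """Return (alarm, percent); alarm is True at or above the threshold."""
    percent = disk_used_percent(path)
    return percent >= threshold_percent, percent


if __name__ == "__main__":
    alarm, percent = check_disk("/")
    status = "ALERT" if alarm else "ok"
    print(f"{status}: / is {percent:.1f}% full")
```

Run from cron or a systemd timer, a check like this would have fired well before email delivery started failing, though a real setup should send the alert somewhere that does not depend on the same disk.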

Skeptical take: Tools like NixOS and Plausible prioritize ideals (reproducibility, privacy) over pragmatism. Great for side projects; production demands ruthless optimization. Monitor everything. Size disks for peaks, not averages. One launch-day full disk, and your MVP is DOA.

April 3, 2026 · 3 min · 6 views · Source: Lobsters
