Migrating 15 Docker Services in 12 Minutes of Downtime (And What Broke Anyway)

My server was dying. Not metaphorically — Ghost was crashing every two to three hours because the OOM killer was going through my processes like a bouncer clearing a bar at closing time. The laptop I'd been running as a home server had 4GB of RAM, 3.4GB actually available to the OS, and every single gigabyte was spoken for.

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          3.4Gi       3.2Gi       100Mi        50Mi       200Mi        50Mi
Swap:         2.0Gi       1.9Gi       100Mi

Swap at 95%. Ghost down again. Me, at 11pm, restarting containers for the third time that week.

The fix was obvious: more RAM. The question was how to get there without losing everything.

What I Was Running

Fifteen-plus Docker containers across several compose stacks:

Ghost + MySQL (the blog you're reading)
Portfolio — Nuxt 4 frontend
Portfolio API — NestJS + MySQL
Mail stack — docker-mailserver + Roundcube webmail
space-server stack — Traefik v3.6, Docker Socket Proxy, Prometheus, Grafana, Dozzle, Uptime Kuma, whoami

All of this on a repurposed laptop sitting in my apartment. No redundancy. No UPS. Just vibes and swap space.

The New Server

I went with a Hetzner Cloud VPS — 8GB RAM, Intel Xeon (Skylake), 80GB SSD, Ubuntu 24.04. This is a temporary stop while I figure out what physical server I actually want to buy. But "temporary" still means I needed to migrate everything cleanly, because I'm not rebuilding from scratch twice.

The upside of doing this migration now: I'd have a tested, automated process ready for when I do the final move to physical hardware.

The Plan: Six Scripts

I could have done this manually. SSH in, stop things, rsync, start things, pray. But I've done enough cowboy migrations to know that "I'll remember all the steps" is a lie you tell yourself at 2pm that haunts you at 2am.

So I wrote six bash scripts:

scripts/
├── 1-backup-cativo.sh          # Full backup of old server
├── 2-download-backup.sh        # Pull it down locally
├── 3-verify-backup.sh          # Check integrity before touching anything
├── 4-setup-polaris2.sh         # Provision the new server from scratch
├── 5-cutover.sh                # The actual migration (this is where downtime happens)
└── 6-verify-migration.sh       # Confirm everything works

Four hours to write them. Thirty minutes to run them. Worth it.

Script 1: Backup Everything

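#!/bin/bash
# Archive every Docker volume into /tmp/backups/<volume>.tar.gz
# using a throwaway Alpine container that mounts the volume at /data.
for volume in $(docker volume ls -q); do
    docker run --rm \
        -v "$volume":/data \
        -v /tmp/backups:/backup \
        alpine tar czf "/backup/${volume}.tar.gz" -C /data .
done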

tar czf /tmp/migration-backup.tar.gz \
    ~/space-server/docker-compose.yml \
    ~/space-server/.env \
    ~/space-server/traefik/ \
    ~/space-server/mail-server/ \
    /tmp/backups/*.tar.gz

One tarball with everything: volumes, configs, SSL certs, .env files. 13MB total.
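Because the tar command names each path by hand, a stack that keeps its own .env in a subdirectory is easy to leave out. A minimal sketch for listing every .env under a stack root before building the archive; the function name and its stack_root parameter are illustrative, not part of the original scripts:

```shell
# List every .env file under a stack root so none is left out of the archive.
# stack_root is a hypothetical parameter; in this post it would be ~/space-server.
list_env_files() {
    local stack_root="$1"
    # -type f skips directories that happen to be named .env;
    # sort keeps the output stable between runs.
    find "$stack_root" -type f -name '.env' | sort
}

# Example (not run here):
#   list_env_files ~/space-server
```

Comparing that list against the paths passed to tar is a cheap way to spot a secondary stack's .env before the backup is taken.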

Script 3: Verify Before You Proceed

This one saved me. Before touching the new server, I verified the backup had what it needed:

test -f acme.json || echo "ERROR: SSL certs missing"
test -f docker-compose.yml || echo "ERROR: Compose file missing"
test -f .env || echo "ERROR: Env vars missing"

Turns out one .env file wasn't in the backup. Found it here, not at 2am after the cutover.
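The three test lines print an error but let the script carry on. A stricter variant, sketched here as an assumption about how one might harden it rather than what the real script does, checks the tarball's own listing and returns non-zero when anything is missing. The function name verify_backup is hypothetical:

```shell
# Check that every required file name appears somewhere in a tarball's listing.
# Returns 0 if all are present, 1 if any is missing, 2 if the archive is unreadable.
verify_backup() {
    local archive="$1"; shift
    local listing entry path found missing=0
    listing=$(tar tzf "$archive") || return 2
    for path in "$@"; do
        found=0
        while IFS= read -r entry; do
            # Match the name exactly, or as the last component of a longer path.
            case "$entry" in
                "$path"|*/"$path") found=1 ;;
            esac
        done <<EOF
$listing
EOF
        if [ "$found" -eq 0 ]; then
            echo "ERROR: $path missing from $archive" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
#   verify_backup migration-backup.tar.gz acme.json docker-compose.yml .env || exit 1
```

Returning a status instead of only printing means a wrapper can refuse to move on to provisioning.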

Script 4: Provision the New Server

#!/bin/bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

sudo ufw allow 22/tcp && sudo ufw allow 80/tcp && sudo ufw allow 443/tcp
sudo ufw allow 25/tcp && sudo ufw allow 465/tcp && sudo ufw allow 587/tcp
sudo ufw allow 993/tcp && sudo ufw allow 143/tcp
sudo ufw --force enable

ssh polaris2 "mkdir -p ~/space-server/traefik/letsencrypt"
scp ~/backups/.../acme.json polaris2:~/space-server/traefik/letsencrypt/
ssh polaris2 "chmod 600 ~/space-server/traefik/letsencrypt/acme.json"
ssh polaris2 "docker network create space-server_web"

# Recreate each volume on polaris2 and unpack its tarball into it
for volume_file in ~/backups/.../volumes/*.tar.gz; do
    volume=$(basename "$volume_file" .tar.gz)
    ssh polaris2 "docker volume create $volume"
    scp "$volume_file" polaris2:/tmp/
    ssh polaris2 "docker run --rm -v $volume:/data -v /tmp:/backup alpine tar xzf /backup/${volume}.tar.gz -C /data"
done

After this runs, polaris2 has all the data. Services aren't started yet — that's intentional.

Script 5: The Cutover

#!/bin/bash
echo "WARNING: This will cause downtime!"
read -p "Are you ready? (yes/no): " confirm
[ "$confirm" != "yes" ] && exit 0

ssh cativo.dev "cd ~/space-server && docker compose down"
ssh cativo.dev "cd ~/space-server/mail-server && docker compose down"

ssh cativo.dev 'bash -s' < scripts/1-backup-cativo.sh
bash scripts/2-download-backup.sh

ssh polaris2 "cd ~/space-server && docker compose up -d"
ssh polaris2 "cd ~/space-server/mail-server && docker compose up -d"

echo "CRITICAL: Update DNS records NOW to 167.235.52.161"

That final backup matters more than it sounds. It is a full re-run of script 1 rather than a true incremental, taken after the old containers have stopped, so it caught the blog comments, emails, and Prometheus metrics that had come in since the first backup. Without it, that data would be gone.
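The final backup only helps if the copy that lands locally is the one that left the old server. A small sketch for comparing SHA-256 checksums after the download; matches_checksum is a hypothetical helper, and the local path in the usage comment is a placeholder:

```shell
# Compare a local file's SHA-256 against an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
matches_checksum() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum < "$file") || return 2
    # sha256sum prints "<hash>  -" when reading stdin; keep only the hash field
    # on both sides so a bare hash or a full sha256sum line both work.
    [ "${actual%% *}" = "${expected%% *}" ]
}

# Example (not run here; the local path is a placeholder):
#   remote=$(ssh cativo.dev 'sha256sum < /tmp/migration-backup.tar.gz')
#   matches_checksum ./migration-backup.tar.gz "$remote" || echo "download is corrupt"
```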

The Timeline

T-24h: Dropped DNS TTL to 300 seconds. This is the move that makes everything else faster.
T-30min: Ran backup script. 13MB, all 15 volumes, done.
T-20min: Downloaded and verified. Found the missing .env here, not later.
T-10min: Provisioned polaris2. Docker, swap, firewall, volumes — all automated.
T-0: Ran cutover. Typed "yes". Held my breath.
T+5min: Updated DNS records to 167.235.52.161.
T+15min: Ran verification script. Everything green.

Total downtime: 12 minutes.

What Broke Anyway

The verification script said success. The containers were all "Up" and "healthy". And then I tried to open webmail.

504 Gateway Timeout.

Three problems surfaced in the two hours after migration:

1. Webmail Gateway Timeout — Docker network mismatch. The webmail container was on a network called web, but Traefik was on space-server_web. Docker Compose adds a project prefix based on the directory name. They couldn't see each other. The containers were healthy; they just weren't on speaking terms.

2. Database Permission Error — Volume permissions from the restore. The SQLite directory for Roundcube was owned by root, but the container runs as www-data (UID 33). It couldn't write to its own database.

3. Missing .env Files — One service had its env vars in a file that wasn't included in the backup path. The verify script caught the main one, but a secondary stack had its own.

All three fixed within two hours. The webmail one took the longest because it looked like a networking problem but was actually two separate problems stacked on top of each other.
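Container health says nothing about whether a request can get through Traefik to the service. A rough sketch of an end-to-end check that could sit alongside the container checks, assuming curl is installed; the function names and the example URL are illustrative:

```shell
# Classify an HTTP status code: 2xx and 3xx count as reachable.
is_ok_status() {
    case "$1" in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# Fetch only the status code for a URL, with a 10 second overall timeout.
# curl reports 000 when it could not get a response at all.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" || true
}

# Print OK or FAIL for one URL and return non-zero on failure.
check_url() {
    local url="$1" code
    code=$(http_status "$url")
    if is_ok_status "$code"; then
        echo "OK   $code $url"
    else
        echo "FAIL $code $url" >&2
        return 1
    fi
}

# Example (not run here; the hostname is a placeholder):
#   check_url https://webmail.example.com/
```

A 504 from the proxy fails this check even while every container reports healthy.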

Before and After

Before:

RAM: 3.4GB total — Used: 3.2GB (94%) — Swap: 1.9GB/2GB (95%)
Ghost: crashing every 2-3 hours

After:

RAM: 7.6GB total — Used: 2.8GB (37%) — Swap: 200MB/4GB (5%)
Ghost: hasn't crashed once

What I Actually Learned

Automate the whole thing, not just the hard parts. The scripts took four hours to write. The migration took thirty minutes. If I'd done it manually, the migration alone would have taken two hours and I'd have made at least one mistake I wouldn't find until the next day.

The final backup at cutover time is not optional. The old server keeps running while you prep the new one. Prometheus keeps scraping metrics. Ghost keeps writing to its database. If you don't do a final sync right before the cutover, you lose whatever changed during prep. And yes, even if nobody reads your blog, Ghost still writes to its database.

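Docker networks don't travel between hosts. When you run docker compose up in a directory called space-server, Compose prefixes the networks it creates with the project name, so web becomes space-server_web. Another stack that just says web won't land on that network. Depending on how it's declared, Compose either creates a separate network for it, in which case the containers start and report healthy but sit on an island, or, if web is marked external and nothing by that exact name exists, refuses to start the stack at all. Use explicit names: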

networks:
  web:
    name: space-server_web
    external: true
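
To confirm the fix took, it helps to ask Docker which networks each container actually joined. A minimal sketch, assuming the docker CLI is available; the function names and the container names in the usage comment are hypothetical:

```shell
# Print the networks a container is attached to, one per line.
container_networks() {
    docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}' "$1"
}

# True if the wanted network appears as a whole line in a newline-separated list.
has_network() {
    local wanted="$1" list="$2"
    printf '%s\n' "$list" | grep -qxF -- "$wanted"
}

# Check that every listed container is attached to the given network.
all_on_network() {
    local net="$1" c
    shift
    for c in "$@"; do
        if ! has_network "$net" "$(container_networks "$c")"; then
            echo "$c is not on $net" >&2
            return 1
        fi
    done
}

# Example (not run here; container names are placeholders):
#   all_on_network space-server_web traefik roundcube
```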

Volume permissions will bite you. When you restore a Docker volume from a tarball, the ownership comes from whoever created the files in the backup. That might be root. Your container might expect UID 33. Check permissions on every critical volume after restore:

docker exec <container> ls -la /path/to/volume
sudo chown -R <uid>:<gid> /var/lib/docker/volumes/<volume>/_data
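
Checking by eye scales badly across fifteen volumes. A sketch that reports anything under a directory not owned by the expected UID, assuming a find that supports -uid (GNU find on Ubuntu does); check_owner is a hypothetical helper:

```shell
# Report every path under a directory that is not owned by the expected UID.
# Returns 0 if ownership is uniform, 1 if mismatches were found, 2 on a find error.
check_owner() {
    local dir="$1" expected_uid="$2" mismatched
    mismatched=$(find "$dir" ! -uid "$expected_uid" -print) || return 2
    if [ -n "$mismatched" ]; then
        echo "Not owned by UID $expected_uid:" >&2
        printf '%s\n' "$mismatched" >&2
        return 1
    fi
    return 0
}

# Example (not run here), from a root shell since the volume directory is root-only:
#   check_owner /var/lib/docker/volumes/<volume>/_data 33
```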

Low DNS TTL before migration is not optional. I dropped to 300 seconds 24 hours before. During the cutover, DNS propagated in under five minutes. If I'd left it at the default 3600 seconds, some users could have been hitting the dead server for up to an hour.
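
It is worth confirming the lower TTL is actually being served before cutover day. A minimal sketch that reads the TTL from dig's answer section, assuming dig is installed; the function names and the domain in the usage comment are illustrative. A caching resolver reports the remaining TTL, so querying the authoritative nameserver gives the configured value.

```shell
# Extract the TTL (second field) from the first line of `dig +noall +answer` output.
ttl_from_answer() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Look up the TTL currently served for a name's A record (needs network access).
current_ttl() {
    ttl_from_answer "$(dig +noall +answer "$1" A)"
}

# True if a TTL is at or below a threshold, both in seconds.
ttl_is_low() {
    [ "$1" -le "$2" ]
}

# Example (not run here; the domain is a placeholder):
#   ttl_is_low "$(current_ttl example.com)" 300 || echo "TTL still too high"
```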

Keep the old server running for a week. I didn't shut down the laptop. No services running, but the data was still there. If something had gone wrong that I didn't catch in the first 48 hours, I had a rollback path. Nothing went wrong, but knowing it was there made the whole thing less stressful.

This migration is temporary — I'm still planning to buy a physical server. But now I have a tested process, six scripts I can run again, and a much clearer picture of what breaks during a Docker migration. The next one will be faster.