The migration was done. Twelve minutes of downtime, all containers reporting healthy, DNS propagated. I opened the browser to check webmail and got a 504 Gateway Timeout.
Everything looked fine. The container was up. The logs weren't screaming. Traefik was running. And yet: nothing.
This is the story of two completely separate bugs that produced one error message, and the debugging process that found both of them.
The Setup
After migrating 15+ Docker services from a dying laptop to a Hetzner VPS, most things came up clean. Ghost was running. Grafana was up. The portfolio was responding. But mail.cativo.dev — Roundcube webmail sitting in front of docker-mailserver — returned a 504 every time.
The symptoms were confusing:
- DNS was pointing to the right IP
- Every other service was working
- docker ps showed webmail as "Up" and "healthy"
- No obvious errors anywhere
When a container is healthy but the service is broken, the problem is almost never in the container itself.
Layer 1: Traefik Logs
First stop: the reverse proxy. If Traefik can't route to a container, it'll tell you why.
docker logs traefik --tail 50
Found something weird:
[ERR] Unable to obtain ACME certificate for domains
error="unable to generate a certificate for the domains [mail]:
acme: error: 400 :: Cannot issue for "mail":
Domain name needs at least one dot"
Traefik was trying to get a certificate for mail — not mail.cativo.dev. Just mail. That's not a valid domain. Something was wrong with how Traefik was reading the container's routing rules.
But when I checked the actual compose config:
docker compose config
The label expanded correctly:
labels:
  traefik.http.routers.mail.rule: Host(`mail.cativo.dev`)
So the config was right. But Traefik was seeing something different.
Layer 2: What the Container Actually Has
There's a difference between what your compose file says and what labels are actually on the running container:
docker inspect webmail \
--format '{{range $key, $value := .Config.Labels}}{{println $key "=" $value}}{{end}}' \
| grep traefik
traefik.enable = true
traefik.http.routers.mail.rule = Host(`mail.cativo.dev`)
traefik.http.routers.mail.tls.certresolver = letsencryptresolver
The labels were correct. So Traefik had the right routing rule on the container. Why wasn't it working?
Layer 3: The Network Problem
Traefik discovers containers by watching the Docker socket, but it can only route to containers that are on the same Docker network. If they're on different networks, Traefik sees the labels but can't reach the container.
Check what network webmail is on:
docker ps --filter name=webmail --format 'table {{.Names}}\t{{.Networks}}'
NAMES NETWORKS
webmail web
Check what network Traefik is on:
docker inspect traefik \
--format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}'
space-server_web
There it is. Webmail was on web. Traefik was on space-server_web. Two different networks. They couldn't talk to each other.
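The comparison above can be turned into a quick check. This is a minimal sketch, not part of the original debugging session: the helper names container_networks and shared_networks are made up for illustration, and it assumes only the docker inspect template already used above.

```shell
# List the Docker networks a container is attached to, one per line, sorted.
container_networks() {
  docker inspect "$1" \
    --format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}' \
    | sed '/^$/d' | sort
}

# Print the networks two containers have in common.
# Returns non-zero (and prints a warning to stderr) if they share none.
shared_networks() {
  common=""
  # Network names contain no whitespace, so word splitting is safe here.
  for net in $(container_networks "$1"); do
    if container_networks "$2" | grep -Fxq -- "$net"; then
      common="${common}${net}
"
    fi
  done
  if [ -z "$common" ]; then
    echo "no shared network between $1 and $2" >&2
    return 1
  fi
  printf '%s' "$common"
}
```

Running shared_networks webmail traefik before the fix would have failed immediately, which is a faster signal than reading two inspect outputs side by side.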
Why Did This Happen?
On the old server, the network was just called web. When I migrated to the new server and ran docker compose up from the ~/space-server/ directory, Docker Compose created the network with a project prefix: space-server_web.
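One way to spot this kind of naming drift is to list every network whose name is either the bare name or a project-prefixed variant of it. A rough sketch, assuming Compose's default `<project>_<network>` naming; networks_matching is a hypothetical helper, and the argument is used as a regular expression, so keep it to simple names.

```shell
# Print Docker networks named exactly "$1" or "<something>_$1".
# Example: networks_matching web
networks_matching() {
  docker network ls --format '{{.Name}}' | grep -E "(^|_)$1\$"
}
```

If this prints more than one line, two stacks may be using what looks like "the same" network while actually sitting on different ones.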
The mail-server compose stack had this:
networks:
  web:
    external: true
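This tells Docker Compose: "attach to an existing network called exactly web." Compose neither creates nor prefixes an external network, and it refuses to start the stack if that network is missing. So a separate network named web existed on the new server, and the container attached to that one instead of space-server_web. The container started fine because it had a network, but it was the wrong one.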
Fix #1: Explicit Network Name
networks:
  web:
    name: space-server_web
    external: true
The name field tells Docker the actual name of the external network to connect to, while still letting you reference it as web internally. Restart the stack:
docker compose down && docker compose up -d
Verify:
docker ps --filter name=webmail --format 'table {{.Names}}\t{{.Networks}}'
NAMES NETWORKS
webmail space-server_web
I opened the browser. The site loaded. And showed me this:
Oops... something went wrong!
An internal error has occurred. Your request cannot be processed at this time.
Progress. A different error is progress.
Layer 4: Container Logs
docker logs webmail --tail 20
ERROR: SQLSTATE[HY000] [14] unable to open database file
ERROR: Failed to connect to database
chown: changing ownership of '/var/roundcube/db': Operation not permitted
Can't open the database file. Can't change ownership. Permissions problem.
What Was Actually Happening
Roundcube uses SQLite and stores the database in a Docker volume. When I restored that volume from the backup tarball, the directory was created with root ownership. Roundcube runs as www-data (UID 33). It tried to fix the permissions on startup, failed because it didn't have the privileges, and then couldn't write to its own database.
The container was healthy because the health check was probably just checking if Apache was running. Apache was running. The database was inaccessible. "Healthy."
Check the volume:
docker exec webmail ls -la /var/roundcube/db/
total 8
drwxr-xr-x 2 root root 4096 Apr 23 22:54 .
drwxr-xr-t 1 www-data www-data 4096 Apr 23 23:02 ..
Directory owned by root. No database file because it couldn't be created.
Fix #2: Fix Permissions from the Host
You can't fix this from inside the container — it doesn't have the privileges. Fix it from the host:
docker volume inspect mail-server_webmail-data --format '{{.Mountpoint}}'
/var/lib/docker/volumes/mail-server_webmail-data/_data
sudo chown -R 33:33 /var/lib/docker/volumes/mail-server_webmail-data/_data
docker restart webmail
Check the logs:
docker logs webmail --tail 10
Running update script at target...
Executing database schema update.
This instance of Roundcube is up-to-date.
Have fun!
"Have fun!" is genuinely one of the better log messages I've seen.
curl -I https://mail.cativo.dev
HTTP/2 200
Webmail is up.
The Debugging Approach That Actually Works
The reason this took two hours instead of four is that I didn't start by randomly restarting containers. I worked from the outside in:
- DNS — Does the domain resolve to the right IP?
- Traefik logs — Is the reverse proxy seeing the container? Can it route?
- Container networks — Are the relevant containers on the same Docker network?
- Container labels — Does the running container have the right Traefik labels?
- Container logs — What is the application actually complaining about?
- Volume permissions — Can the process write to its data directory?
Each layer either rules out a problem or points you to the next one. The network mismatch was invisible until layer 3. The permissions issue was invisible until layer 6. Both containers were "healthy" the entire time.
Useful Commands
# What networks is a container on?
docker inspect <container> \
--format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}'
# What Traefik labels does a container have?
docker inspect <container> \
--format '{{range $key, $value := .Config.Labels}}{{println $key "=" $value}}{{end}}' \
| grep traefik
# Where does a volume live on disk?
docker volume inspect <volume> --format '{{.Mountpoint}}'
# Check permissions inside a container
docker exec <container> ls -la /path/to/data
Bonus: Hetzner Blocks Port 25 Outbound
After fixing both issues, webmail worked perfectly for receiving email. Then I noticed outgoing mail wasn't going anywhere:
postfix/smtp[6805]: to=<someone@gmail.com>, relay=none,
delay=2282, dsn=4.4.1, status=deferred
(connect to gmail-smtp-in.l.google.com:25: Connection timed out)
Hetzner blocks outbound port 25 on new accounts to prevent spam. Inbound port 25 works fine — I can receive email. But I can't send directly to other mail servers.
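A quick way to confirm a block like this without waiting for the mail queue to time out is to attempt the TCP connection directly. This is a sketch under a few assumptions: can_connect is a made-up helper name, bash is installed with /dev/tcp support, and the coreutils timeout command is available.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Usage: can_connect <host> <port> [timeout_seconds]   (default: 5 seconds)
can_connect() {
  # Inside the bash -c script, $0 is the host and $1 is the port.
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `can_connect gmail-smtp-in.l.google.com 25 || echo "outbound port 25 looks blocked"` tests the same host that appears in the Postfix log above. A failure only says the connection did not complete; it does not prove where along the path it was dropped.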
Options:
- SMTP relay (fast): Use something like Brevo (300 emails/day free) and route outbound mail through port 587. Five minutes of config.
- Ask Hetzner to unblock it (slow): After 30 days as a customer and paying your first invoice, you can request port 25 be unblocked. Takes days to weeks, approval not guaranteed.
- Wait for the physical server: No cloud provider restrictions on hardware you own.
For now I'm living without outbound email. Receiving works, which covers 90% of what I actually need webmail for.
What This Taught Me
"Healthy" means nothing. A Docker health check verifies only what its command tests, which is often just that a process is running or a port answers, not that the application works. A container can be healthy and completely broken at the same time. Don't trust the status column.
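A health check can be made to cover more than "is the web server up". The following is only a sketch of the idea: dir_is_writable is a hypothetical helper, and wiring it into the Roundcube container's health check (for example against /var/roundcube/db, the path from the logs above) is an assumption about the setup, not something the image is known to support out of the box.

```shell
# Succeed only if the directory exists and the current user can create a file in it.
dir_is_writable() {
  dir="$1"
  [ -d "$dir" ] || return 1
  probe="$dir/.healthcheck.$$"
  # Actually create a file: a permission-bit test alone can mislead (e.g. for root).
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}
```

Run as the application's user, a check like this would have reported the container as unhealthy the moment the restored volume came up owned by root.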
Docker networks don't survive migrations. When you move compose stacks between hosts, network names can change because Docker Compose prefixes them with the project name, which defaults to the directory name. A stack that references an external network by its old name will either fail to start or attach to whatever network happens to carry that name, which may be isolated from the reverse proxy. Always use explicit name: in your network declarations.
Volume permissions are a restore problem, not a container problem. When you restore a Docker volume from a tarball, the ownership reflects whoever created the files — often root. The container expects a specific UID. Check permissions on every volume that a non-root process needs to write to, right after restore, before you declare the migration done.
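The post-restore check can be scripted. A minimal sketch, assuming you know the UID the container's process runs as (33 for www-data in this case); wrong_owner is a made-up helper name.

```shell
# Print every path under a directory that is NOT owned by the expected user.
# Usage: wrong_owner <directory> <uid-or-username>
wrong_owner() {
  find "$1" ! -user "$2" -print
}
```

From a root shell on the host, `wrong_owner /var/lib/docker/volumes/mail-server_webmail-data/_data 33` should print nothing once the chown from Fix #2 has been applied. Any output is a path the container will probably fail to write to.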
Two bugs. One error message. Both invisible until you looked at the right layer.