504 Gateway Timeout: Two Hidden Bugs That Looked Like One

The migration was done. Twelve minutes of downtime, all containers reporting healthy, DNS propagated. I opened the browser to check webmail and got a 504 Gateway Timeout.

Everything looked fine. The container was up. The logs weren't screaming. Traefik was running. And yet: nothing.

This is the story of two completely separate bugs that produced one error message, and the debugging process that found both of them.

The Setup

After migrating 15+ Docker services from a dying laptop to a Hetzner VPS, most things came up clean. Ghost was running. Grafana was up. The portfolio was responding. But mail.cativo.dev — Roundcube webmail sitting in front of docker-mailserver — returned a 504 every time.

The symptoms were confusing:

DNS was pointing to the right IP
Every other service was working
docker ps showed webmail as "Up" and "healthy"
No obvious errors anywhere

When a container is healthy but the service is broken, the problem is usually around the container rather than in it: the network it sits on, the proxy in front of it, or the data mounted into it.

Layer 1: Traefik Logs

First stop: the reverse proxy. If Traefik can't route to a container, it'll tell you why.

docker logs traefik --tail 50

Found something weird:

[ERR] Unable to obtain ACME certificate for domains
error="unable to generate a certificate for the domains [mail]: 
acme: error: 400 :: Cannot issue for "mail": 
Domain name needs at least one dot"

Traefik was trying to get a certificate for mail — not mail.cativo.dev. Just mail. That's not a valid domain. Something was wrong with how Traefik was reading the container's routing rules.

But when I checked the actual compose config:

docker compose config

The label expanded correctly:

labels:
  traefik.http.routers.mail.rule: Host(`mail.cativo.dev`)

So the config was right. But Traefik was seeing something different.

Layer 2: What the Container Actually Has

There's a difference between what your compose file says and what labels are actually on the running container:

docker inspect webmail \
  --format '{{range $key, $value := .Config.Labels}}{{println $key "=" $value}}{{end}}' \
  | grep traefik

traefik.enable = true
traefik.http.routers.mail.rule = Host(`mail.cativo.dev`)
traefik.http.routers.mail.tls.certresolver = letsencryptresolver

The labels were correct. So Traefik had the right routing rule on the container. Why wasn't it working?

Layer 3: The Network Problem

Traefik discovers containers by watching the Docker socket, but it can only route to containers that are on the same Docker network. If they're on different networks, Traefik sees the labels but can't reach the container.

Check what network webmail is on:

docker ps --filter name=webmail --format 'table {{.Names}}\t{{.Networks}}'

NAMES     NETWORKS
webmail   web

Check what network Traefik is on:

docker inspect traefik \
  --format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}'

space-server_web

There it is. Webmail was on web. Traefik was on space-server_web. Two different networks. They couldn't talk to each other.
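The two checks above can be folded into one helper that answers the actual question: do these two containers share a network? This is a minimal sketch, not something from my setup. The function names container_networks, common_lines, and shares_network are made up for illustration, and it assumes a POSIX shell with the docker CLI on the host.

```shell
# Hypothetical helpers (names are illustrative, not from the original setup).

# Print the Docker networks a container is attached to, one per line.
container_networks() {
  docker inspect "$1" \
    --format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}'
}

# Print every non-empty line of list $1 that also appears verbatim in list $2.
common_lines() {
  printf '%s\n' "$1" | while IFS= read -r line; do
    if [ -n "$line" ] && printf '%s\n' "$2" | grep -Fxq -- "$line"; then
      printf '%s\n' "$line"
    fi
  done
}

# Succeed (and print the shared names) only if two containers share a network.
shares_network() {
  shared=$(common_lines "$(container_networks "$1")" "$(container_networks "$2")")
  if [ -n "$shared" ]; then
    printf 'shared: %s\n' "$shared"
  else
    printf 'no shared network between %s and %s\n' "$1" "$2"
    return 1
  fi
}
```

Called as shares_network traefik webmail, it compares exact network names, so web and space-server_web count as different networks, which is the point.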

Why Did This Happen?

On the old server, the network was just called web. When I migrated to the new server and ran docker compose up from the ~/space-server/ directory, Docker Compose created the network with a project prefix: space-server_web.

The mail-server compose stack had this:

networks:
  web:
    external: true

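This tells Compose to attach to an existing network called exactly web, with no project prefix. That name matched the old server, but on the new one Traefik's network was called space-server_web. With external: true, Compose never creates the network itself and refuses to start the stack if it is missing, so a separate network named web must already have existed on the new host, presumably one created by hand during the migration. The container started fine because it had a network, but it was the wrong one.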
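A quick way to spot this kind of naming drift is to list the networks on the host and look for anything that is either the bare name or the bare name behind a project prefix. The sketch below is illustrative only: match_network_names is a hypothetical helper, and it assumes Compose's project_network naming as seen with space-server_web.

```shell
# Hypothetical helper: from a list of network names on stdin, print those
# that are exactly NAME or end in _NAME (a Compose project prefix).
match_network_names() {
  name=$1
  while IFS= read -r candidate; do
    case "$candidate" in
      "$name"|*_"$name") printf '%s\n' "$candidate" ;;
    esac
  done
}

# Usage on a Docker host (not executed here):
#   docker network ls --format '{{.Name}}' | match_network_names web
```

More than one line of output means there are several candidates, and the stack and the proxy may not agree on which one they mean.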

Fix #1: Explicit Network Name

networks:
  web:
    name: space-server_web
    external: true

The name field tells Docker the actual name of the external network to connect to, while still letting you reference it as web internally. Restart the stack:

docker compose down && docker compose up -d

Verify:

docker ps --filter name=webmail --format 'table {{.Names}}\t{{.Networks}}'

NAMES     NETWORKS
webmail   space-server_web

I opened the browser. The site loaded. And showed me this:

Oops... something went wrong!
An internal error has occurred. Your request cannot be processed at this time.

Progress. A different error is progress.

Layer 4: Container Logs

docker logs webmail --tail 20

ERROR: SQLSTATE[HY000] [14] unable to open database file
ERROR: Failed to connect to database
chown: changing ownership of '/var/roundcube/db': Operation not permitted

Can't open the database file. Can't change ownership. Permissions problem.

What Was Actually Happening

Roundcube uses SQLite and stores the database in a Docker volume. When I restored that volume from the backup tarball, the directory was created with root ownership. Roundcube runs as www-data (UID 33). It tried to fix the permissions on startup, failed because it didn't have the privileges, and then couldn't write to its own database.

The container was healthy because the health check was probably just checking if Apache was running. Apache was running. The database was inaccessible. "Healthy."
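A health check that also tested the data directory would have flagged this. The following is a hypothetical sketch, not the check the Roundcube image ships with: data_dir_ok is an invented name, and it assumes the check runs as the application user inside the container.

```shell
# Hypothetical health check helper (not the image's real check).
# Succeeds only if the data directory exists and is writable by the
# user running the check.
data_dir_ok() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Intended use as the command of a container health check, run as the
# application user (not executed here):
#   data_dir_ok /var/roundcube/db || exit 1
```

With the directory owned by root and mode drwxr-xr-x, this test fails for www-data, so the container would have reported unhealthy instead of hiding the problem.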

Check the volume:

docker exec webmail ls -la /var/roundcube/db/

total 8
drwxr-xr-x 2 root     root     4096 Apr 23 22:54 .
drwxr-xr-t 1 www-data www-data 4096 Apr 23 23:02 ..

Directory owned by root. No database file because it couldn't be created.

Fix #2: Fix Permissions from the Host

The container can't fix this itself: its startup script already tried the chown and was refused. Fix it from the host:

docker volume inspect mail-server_webmail-data --format '{{.Mountpoint}}'

/var/lib/docker/volumes/mail-server_webmail-data/_data

sudo chown -R 33:33 /var/lib/docker/volumes/mail-server_webmail-data/_data
docker restart webmail

Check the logs:

docker logs webmail --tail 10

Running update script at target...
Executing database schema update.
This instance of Roundcube is up-to-date.
Have fun!

"Have fun!" is genuinely one of the better log messages I've seen.

curl -I https://mail.cativo.dev

HTTP/2 200

Webmail is up.

The Debugging Approach That Actually Works

The reason this took two hours instead of four is that I didn't start by randomly restarting containers. I worked from the outside in:

1. DNS: Does the domain resolve to the right IP?
2. Traefik logs: Is the reverse proxy seeing the container? Can it route?
3. Container networks: Are the relevant containers on the same Docker network?
4. Container labels: Does the running container have the right Traefik labels?
5. Container logs: What is the application actually complaining about?
6. Volume permissions: Can the process write to its data directory?

Each layer either rules out a problem or points you to the next one. The network mismatch was invisible until the third check in that list. The permissions issue was invisible until the sixth. The webmail container reported "healthy" through both.
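The outside-in order can be made mechanical with a tiny runner that executes checks in sequence and stops at the first one that fails. This is only a sketch of the idea: run_checks is a made-up name, and each argument is assumed to be a command or shell function that returns non-zero on failure.

```shell
# Hypothetical runner: execute each check in order, stop at the first
# failure, and report which one failed. Each argument is the name of a
# command or shell function that takes no arguments.
run_checks() {
  for check in "$@"; do
    if "$check"; then
      printf 'ok:   %s\n' "$check"
    else
      printf 'FAIL: %s\n' "$check"
      return 1
    fi
  done
}
```

The checks themselves would be small functions, one per layer, passed in the same order as the list above. Stopping at the first failure matters because later layers are rarely worth reading until the earlier ones pass.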

Useful Commands

# What networks is a container on?
docker inspect <container> \
  --format '{{range $key, $value := .NetworkSettings.Networks}}{{println $key}}{{end}}'

# What Traefik labels does a container have?
docker inspect <container> \
  --format '{{range $key, $value := .Config.Labels}}{{println $key "=" $value}}{{end}}' \
  | grep traefik

# Where does a volume live on disk?
docker volume inspect <volume> --format '{{.Mountpoint}}'

# Check permissions inside a container
docker exec <container> ls -la /path/to/data

Bonus: Hetzner Blocks Port 25 Outbound

After fixing both issues, webmail worked perfectly for receiving email. Then I noticed outgoing mail wasn't going anywhere:

postfix/smtp[6805]: to=<someone@gmail.com>, relay=none, 
delay=2282, dsn=4.4.1, status=deferred 
(connect to gmail-smtp-in.l.google.com:25: Connection timed out)

Hetzner blocks outbound port 25 on new accounts to prevent spam. Inbound port 25 works fine — I can receive email. But I can't send directly to other mail servers.
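To see how widespread the problem is, the mail log can be filtered for port 25 timeouts. The helper below is a rough sketch under a few assumptions: port25_timeouts is an invented name, each log entry sits on a single line (the excerpt above is wrapped for readability), and the host may or may not be followed by an address in square brackets.

```shell
# Hypothetical helper: read Postfix log lines on stdin and print each
# distinct remote host whose connection on port 25 timed out.
# Accepts both "host:25" and "host[address]:25" forms.
port25_timeouts() {
  sed -n 's/.*connect to \([^ :[]*\)\(\[[^]]*\]\)\{0,1\}:25: Connection timed out.*/\1/p' \
    | sort -u
}
```

If every external destination shows up in the output, the block is on the provider side rather than a problem with one recipient's server.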

Options:

SMTP relay (fast): Use something like Brevo (300 emails/day free) and route outbound mail through port 587. Five minutes of config.
Ask Hetzner to unblock it (slow): After 30 days as a customer and paying your first invoice, you can request port 25 be unblocked. Takes days to weeks, approval not guaranteed.
Wait for the physical server: No cloud provider restrictions on hardware you own.

For now I'm living without outbound email. Receiving works, which covers 90% of what I actually need webmail for.

What This Taught Me

"Healthy" means less than it looks. A Docker health check verifies only whatever its command tests, which is often just that a process is running or a port answers, not that the application works. A container can be healthy and completely broken at the same time. Don't trust the status column.

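Docker network names don't survive migrations. Compose prefixes the networks it creates with the project name, which defaults to the directory name, so the same stack started from a different directory produces a differently named network. A stack that references an external network by the old name will either refuse to start or attach to an unrelated network that happens to carry that name, and in the second case nothing warns you. Always use an explicit name: in your external network declarations, and confirm it against docker network ls.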

Volume permissions are a restore problem, not a container problem. When you restore a Docker volume from a tarball, the ownership reflects whoever created the files — often root. The container expects a specific UID. Check permissions on every volume that a non-root process needs to write to, right after restore, before you declare the migration done.
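A post-restore ownership check can be as small as the sketch below. The names owner_uid and owned_by are hypothetical, and the expected UID is something you have to know per image (33 for www-data in this case).

```shell
# Print the numeric owner UID of a path, using ls -n so it does not
# depend on GNU stat.
owner_uid() {
  ls -ldn -- "$1" | awk '{print $3}'
}

# Hypothetical check: succeed only if PATH ($1) is owned by UID ($2);
# otherwise print the mismatch and fail.
owned_by() {
  actual=$(owner_uid "$1")
  if [ "$actual" = "$2" ]; then
    return 0
  fi
  printf '%s is owned by UID %s, expected %s\n' "$1" "$actual" "$2"
  return 1
}
```

For the webmail volume, that would be owned_by on the mountpoint printed by docker volume inspect, with 33 as the expected UID. It typically needs root, since the Docker data directory is not readable by ordinary users.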

Two bugs. One error message. Both invisible until you looked at the right layer.