Docker HEALTHCHECK: making your containers self-aware and dependency-ready
I was debugging a race condition where my API started before Postgres finished initializing, causing connection errors on startup. The fix was Docker HEALTHCHECK combined with depends_on conditions — the API only starts after Postgres reports itself healthy. I also discovered that HEALTHCHECK makes container issues immediately visible in docker ps. Here is the complete pattern.
HEALTHCHECK syntax
HEALTHCHECK [OPTIONS] CMD command
# Options:
# --interval=30s How often to run the check (default: 30s)
# --timeout=30s Max time for the check to complete (default: 30s)
# --start-period=5s Grace period before checks matter (default: 0s)
# --start-interval=5s Check frequency during start-period (default: 5s; requires Docker Engine 25.0+)
# --retries=3 Consecutive failures before UNHEALTHY (default: 3)
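A useful rule of thumb when tuning these options: in the worst case, a failing container is flagged UNHEALTHY only after the start period elapses and `retries` consecutive checks fail, each taking up to `timeout` seconds and spaced `interval` apart. A small sketch of that arithmetic (a hypothetical helper for reasoning about the bound, not Docker's exact scheduler):

```python
def worst_case_time_to_unhealthy(interval: int, timeout: int,
                                 retries: int, start_period: int = 0) -> int:
    """Rough upper bound, in seconds, before Docker reports UNHEALTHY.

    Failures during start_period don't count toward retries; after that,
    each of the `retries` consecutive failing checks can take up to
    `timeout` seconds, with `interval` seconds between check starts.
    """
    return start_period + retries * (interval + timeout)

# With the defaults above (30s interval, 30s timeout, 3 retries, no start period):
print(worst_case_time_to_unhealthy(30, 30, 3))  # 180
```

So with defaults, a dead service can sit unnoticed for up to ~3 minutes; tighten `interval` and `timeout` for faster detection, at the cost of more frequent probes.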
Web service health check
FROM node:20-alpine AS production
# BusyBox wget ships with Alpine, so no extra install is needed
# (installing curl instead would add ~4MB to the image)
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/index.js"]
// The health endpoint your service must implement
app.get('/health', async (req, res) => {
  try {
    // Check database connectivity
    await db.raw('SELECT 1');
    // Check Redis connectivity
    await redis.ping();
    res.status(200).json({
      status: 'healthy',
      timestamp: new Date().toISOString(),
      uptime: process.uptime(),
      checks: {
        database: 'ok',
        cache: 'ok',
      },
    });
  } catch (error) {
    // A 503 makes the wget check exit non-zero, which marks the
    // container unhealthy after the configured number of retries
    res.status(503).json({
      status: 'unhealthy',
      error: (error as Error).message,
    });
  }
});
Database health checks
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

  mysql:
    image: mysql:8
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  mongodb:
    image: mongo:7
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s
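These healthcheck blocks replace the retry loop you would otherwise write in every consumer. For comparison, here is a sketch of roughly what that hand-rolled logic looks like (a hypothetical `wait_until_healthy` helper; not what Compose runs internally):

```python
import time

def wait_until_healthy(probe, retries=5, interval=0.1):
    """Poll `probe` (a zero-arg callable returning True when healthy)
    up to `retries` times, sleeping `interval` seconds between attempts."""
    for _ in range(retries):
        if probe():
            return True
        time.sleep(interval)
    return False

# Example: a probe that only succeeds on the third attempt
attempts = iter([False, False, True])
print(wait_until_healthy(lambda: next(attempts)))  # True
```

With healthchecks declared on the services themselves, Compose handles this polling for you via `depends_on` conditions.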
depends_on with health conditions
services:
  api:
    build: .
    depends_on:
      postgres:
        condition: service_healthy # Wait for HEALTHCHECK to pass
      redis:
        condition: service_healthy # Wait for HEALTHCHECK to pass
      migrations:
        condition: service_completed_successfully # Wait for one-time job

  migrations:
    build: .
    command: ["npm", "run", "db:migrate"]
    depends_on:
      postgres:
        condition: service_healthy
    restart: "no" # Run once, don't restart
Worker health check (queue processor)
# Workers often have no HTTP port — use a file-based health check
FROM python:3.12-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY worker.py .
# Write a heartbeat file every N seconds; health check verifies it's recent
# Fail if the heartbeat is older than 2 minutes; a missing heartbeat file
# also fails the check (the unhandled FileNotFoundError exits non-zero)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD python3 -c "import os, sys, time; sys.exit(1 if time.time() - os.path.getmtime('/tmp/worker_heartbeat') > 120 else 0)"
CMD ["python3", "worker.py"]
# In worker.py — write heartbeat file periodically
import time
import threading
from pathlib import Path

def write_heartbeat():
    while True:
        Path('/tmp/worker_heartbeat').touch()
        time.sleep(30)

# Start heartbeat thread
thread = threading.Thread(target=write_heartbeat, daemon=True)
thread.start()

# Main worker loop
while True:
    job = queue.dequeue()
    if job:
        process_job(job)
    else:
        time.sleep(1)  # Avoid busy-waiting when the queue is empty
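The freshness test embedded in the HEALTHCHECK can also live in the codebase as a unit-testable function — a hypothetical refactor that mirrors the same 120-second threshold:

```python
import os
import time

def heartbeat_is_fresh(path: str, max_age: float = 120) -> bool:
    """True if `path` was touched within the last `max_age` seconds.

    A missing heartbeat file counts as stale — the worker either never
    started or crashed before writing its first heartbeat.
    """
    try:
        return (time.time() - os.path.getmtime(path)) <= max_age
    except FileNotFoundError:
        return False
```

The Dockerfile check then shrinks to a one-line call into this function (assuming it lives in an importable module, e.g. a hypothetical `health.py`), keeping the threshold logic in one tested place.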
Health checks change how you think about container dependencies. Instead of adding sleep 10 hacks or retry logic in your startup code, you declare health requirements once in Compose. The service_completed_successfully condition for migration containers is particularly useful: it guarantees schema migrations run before the API starts, every time, with no application-level retry logic needed.