Docker troubleshooting
Cannot connect to the Docker daemon
The error "permission denied while trying to connect to the Docker daemon socket"
means the user is not in the docker group or the daemon is not running. Check
systemctl status docker. Add the user to the group:
sudo usermod -aG docker $USER (log out and back in to apply). As root,
verify that the socket exists: ls -l /var/run/docker.sock.
Container exits immediately
The PID 1 process finished or crashed. Run
docker logs container_name and
docker inspect container_name | jq -r '.[0].State.ExitCode'.
Common causes: wrong CMD, missing config, app error on startup, or no
foreground process (the container has nothing to keep it alive). For debugging,
override the entrypoint: docker run -it --entrypoint sh image. See
exit codes below to interpret the code.
Exit codes for exited containers
Read the code with
docker inspect CONTAINER | jq '.[0].State.ExitCode'.
A process terminated by a signal is reported as
exit code = 128 + signal number (e.g. 137 = 128 + 9, SIGKILL; 143 = 128 + 15, SIGTERM).
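The 128 + signal rule can be sketched as a small shell function. This is a minimal illustration, not part of Docker: describe_exit_code is a hypothetical name, and the wording of the messages is an assumption.

```shell
# Hypothetical helper (not part of Docker): explain a container exit code.
describe_exit_code() {
  code=$1
  case $code in
    0)   echo "0: clean exit" ;;
    126) echo "126: command found but not executable" ;;
    127) echo "127: command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        sig=$((code - 128))
        # kill -l N prints the name of signal number N, without the SIG prefix.
        if name=$(kill -l "$sig" 2>/dev/null) && [ -n "$name" ]; then
          name="SIG$name"
        else
          name="unknown signal"
        fi
        echo "$code: terminated by signal $sig ($name)"
      else
        echo "$code: application-defined error"
      fi
      ;;
  esac
}
```

One way to use it is to feed it the code directly: describe_exit_code "$(docker inspect --format '{{.State.ExitCode}}' CONTAINER)".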
0 — clean exit
Container did its job and exited normally. Expected for short-lived and job containers.
1 — app crashed
docker inspect CONTAINER | jq '.[0].State.ExitCode'
docker logs CONTAINER
Could be anything: bad config, failed DB connection, unhandled exception. Check logs for stack traces or error messages.
126 / 127 — entrypoint problems
# 127 — binary not found
ENTRYPOINT ["myapp"] # myapp not in PATH or not installed
# 126 — not executable
ENTRYPOINT ["./start.sh"] # missing chmod +x
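Fix: check CMD/ENTRYPOINT spelling; verify the binary
exists in the image with docker run --rm --entrypoint sh IMAGE -c 'command -v myapp'
(the entrypoint is overridden because arguments after IMAGE are otherwise passed to the image's ENTRYPOINT).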
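The 126 versus 127 distinction can be checked before the container ever starts. The sketch below assumes the image ships a POSIX sh; check_image_binary is a hypothetical helper name, and the messages are illustrative.

```shell
# Hypothetical helper: check that BIN exists and is executable inside IMAGE.
# Assumes the image contains a POSIX sh. The entrypoint is overridden so the
# check runs instead of the image's own ENTRYPOINT.
check_image_binary() {
  image=$1
  bin=$2
  docker run --rm --entrypoint sh "$image" -c '
    case $1 in
      */*)  # a path such as ./start.sh: test the file directly
        [ -e "$1" ] || { echo "missing: $1"; exit 127; }
        [ -x "$1" ] || { echo "not executable: $1"; exit 126; }
        echo "ok: $1" ;;
      *)    # a bare name such as myapp: must resolve through PATH
        command -v "$1" >/dev/null 2>&1 || { echo "not in PATH: $1"; exit 127; }
        echo "ok: $(command -v "$1")" ;;
    esac
  ' sh "$bin"
}
```

For example, check_image_binary IMAGE myapp or check_image_binary IMAGE ./start.sh. Relative paths are resolved against the image's WORKDIR, and the function's exit status mirrors the code the container would be expected to fail with.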
137 — OOM kill or force kill
docker inspect CONTAINER | jq '.[0].State.OOMKilled'
dmesg | grep -i "oom\|killed"
docker inspect CONTAINER | jq '.[0].HostConfig.Memory'
If OOMKilled: true, raise the memory limit or fix the leak. If
OOMKilled: false but the exit code is still 137, the process received SIGKILL
from elsewhere: someone ran docker kill, or docker stop timed out and force-killed it.
143 — graceful stop
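docker stop CONTAINER # SIGTERM, wait (10 s by default), then SIGKILL (137)
Exit 143 means the process was terminated by SIGTERM (128 + 15) within the grace
period; an app that handles SIGTERM itself may exit 0 instead. Exit 137 after
docker stop means the app did not exit on SIGTERM and was force-killed:
add a signal handler.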
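What a signal handler looks like depends on the application's language. As a minimal sketch in shell, assuming a worker that loops over short units of work, a trap can turn SIGTERM into an orderly exit. The name serve and the sleep placeholder are illustrative, not taken from any real service.

```shell
# Hypothetical worker loop (serve and its work step are illustrative names).
serve() {
  stop=0
  # On SIGTERM or SIGINT, ask the loop to finish instead of dying mid-task.
  trap 'stop=1' TERM INT
  while [ "$stop" -eq 0 ]; do
    # Placeholder for real work. Running it in the background and waiting
    # lets the shell run the trap as soon as the signal arrives.
    sleep 1 &
    wait "$!" || true   # wait returns >128 when interrupted by the trap
  done
  trap - TERM INT
  echo "shutting down cleanly"
}
```

With a handler like this the process finishes well inside the docker stop grace period, so the container is never force-killed. The function returns 0 after a handled signal, which is a design choice of this sketch.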
SIGTERM handling (common bug)
Shell form wraps the process in a shell, so SIGTERM may never reach your app:
# Bad: shell form runs /bin/sh -c "myapp"; sh is PID 1 and may not forward SIGTERM
CMD myapp
# Good: exec form; myapp is PID 1 and receives signals directly
CMD ["myapp"]
Port already in use
The error "bind: address already in use" appears when publishing a port that
another process already holds. Find the conflict:
ss -tlnp | grep :8080. Stop the other service or map a different
host port: -p 8081:80. The format is
host_port:container_port.
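A plain grep :8080 also matches ports such as 18080 or 80800. The sketch below filters ss output for an exact port match instead; listeners_on_port is a hypothetical name, and the column positions assume the usual ss -tlnp layout.

```shell
# Hypothetical filter: read `ss -tlnp` output on stdin and print only the
# listeners bound to exactly the given port.
listeners_on_port() {
  awk -v port="$1" '
    {
      # Local Address:Port is column 4, or column 5 when a Netid column is shown.
      if ($1 == "LISTEN") local_addr = $4
      else if ($2 == "LISTEN") local_addr = $5
      else next
      # The port is the text after the last colon (also works for [::]:8080).
      n = split(local_addr, parts, ":")
      if (parts[n] == port) print
    }
  '
}
```

Usage: ss -tlnp | listeners_on_port 8080. Run ss as root to see process names for sockets owned by other users.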
Cannot access service on published port
Container running but connection refused from host. Verify mapping:
docker port container_name. App may listen on 127.0.0.1
inside the container — it must bind 0.0.0.0 to accept external
traffic. Check firewall on the host (iptables, cloud security groups).
Confirm health: docker exec container curl -s localhost:80.
Volume permission denied
The bind-mounted host directory is owned by root or another UID while the container
runs as non-root. Check with ls -la /host/path and
docker inspect container | jq -r '.[0].Config.User'. Align ownership,
set user: in Compose, or fix permissions in an entrypoint script.
SELinux may block mounts; try the :Z suffix on RHEL systems.
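The ownership comparison can be sketched as a quick check on the host. This assumes GNU stat (Linux); check_mount_owner is a hypothetical helper name.

```shell
# Hypothetical check: does the numeric owner of a host directory match the
# UID the container process runs as? Assumes GNU stat (Linux).
check_mount_owner() {
  dir=$1
  want_uid=$2
  have_uid=$(stat -c '%u' "$dir") || return 2
  if [ "$have_uid" = "$want_uid" ]; then
    echo "ok: $dir is owned by UID $have_uid"
  else
    echo "mismatch: $dir is owned by UID $have_uid, container runs as UID $want_uid"
    return 1
  fi
}
```

The container's UID can be read with docker exec CONTAINER id -u, for example check_mount_owner /host/path "$(docker exec CONTAINER id -u)". Matching on the numeric UID matters because user names inside the container need not exist on the host.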
Image pull fails
Network, auth, or rate limit. Test: docker pull nginx:alpine.
For private registries: docker login. Check DNS and proxy env vars.
toomanyrequests from Docker Hub — authenticate or use a mirror.
Corporate TLS inspection may need custom CA in daemon config.
No space left on device
Images, layers, logs, and volumes fill disk under
/var/lib/docker. Audit:
docker system df and df -h. Prune:
docker container prune, docker image prune -a,
docker volume prune (only if volumes are truly unused). Configure
log rotation via --log-opt max-size or daemon log-opts.
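Daemon-wide log rotation is set through log-opts in daemon.json. The sketch below writes such a file from shell; write_log_rotation_config is a hypothetical helper, and the 10m and 3 values are illustrative assumptions rather than recommendations.

```shell
# Hypothetical helper: write a daemon.json that caps json-file logs.
# The sizes are illustrative assumptions; merge by hand if the file exists.
write_log_rotation_config() {
  target=$1
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

On a host without an existing config this would be run as root with /etc/docker/daemon.json as the target, followed by a daemon restart. The new defaults apply only to containers created afterwards; existing containers keep their old log settings.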
Container OOM killed
Process exceeded memory limit — exit code 137 when
OOMKilled: true. Check
docker inspect CONTAINER | jq '.[0].State.OOMKilled' and host
dmesg | grep -i oom. Increase --memory limit or fix
memory leak. Without limits, a runaway container can take down the whole host.
See exit codes section for 137 vs graceful 143.
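The 137 decision can be sketched with docker inspect --format, which takes a Go template and avoids the jq dependency. explain_stop is a hypothetical helper name and the messages are illustrative.

```shell
# Hypothetical helper: classify why a container stopped, using docker inspect.
explain_stop() {
  # The template prints "<exit code> <OOMKilled>", for example "137 true".
  state=$(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' "$1") || return 2
  code=${state% *}
  oom=${state#* }
  if [ "$oom" = "true" ]; then
    echo "exit $code: OOM-killed; raise --memory or fix the leak"
  elif [ "$code" -eq 137 ]; then
    echo "exit 137: SIGKILL without OOM (docker kill, or docker stop timed out)"
  elif [ "$code" -eq 143 ]; then
    echo "exit 143: terminated by SIGTERM"
  else
    echo "exit $code: see docker logs"
  fi
}
```

Usage: explain_stop CONTAINER. The OOMKilled flag is checked first because it is the only one of these cases that points at memory limits rather than at whoever sent the signal.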
Compose service unhealthy or not starting
docker compose ps shows state; docker compose logs service
for errors. depends_on does not wait for app readiness — only
container start. Use healthcheck with
depends_on: condition: service_healthy (Compose v2). Wrong build
context or env file path are frequent config mistakes.
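The healthcheck plus service_healthy pattern can be sketched as a small Compose file, written here from shell so it can be tried in a scratch directory. The service names db and app, the images, and the password are illustrative assumptions; pg_isready ships with the postgres image.

```shell
# Hypothetical helper: write a minimal Compose file in which app waits for
# db to pass its healthcheck. Names, images, and values are assumptions.
write_compose_example() {
  cat > "$1" <<'EOF'
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
EOF
}
```

After write_compose_example compose.yaml, docker compose up starts app only once db reports healthy, and docker compose ps shows the health state while it waits.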
Container cannot reach host or internet
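To separate a DNS failure from a routing failure, compare docker exec container ping -c1 8.8.8.8
with docker exec container ping -c1 google.com. If the IP answers but the name
does not resolve, check the daemon DNS settings in
/etc/docker/daemon.json. If neither works, custom iptables rules or a VPN
may have broken bridge NAT. On Linux, reach host services via
host.docker.internal (needs extra_hosts: "host.docker.internal:host-gateway") or the host gateway IP.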
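The IP versus name comparison can be sketched as one function. It assumes the image contains ping and that ICMP is not filtered on the network; diagnose_network is a hypothetical helper name.

```shell
# Hypothetical helper: tell DNS failures apart from routing failures.
# Assumes the image contains ping and that ICMP is not filtered.
diagnose_network() {
  c=$1
  if ! docker exec "$c" ping -c1 8.8.8.8 >/dev/null 2>&1; then
    echo "no route: IP unreachable; check bridge NAT, iptables rules, VPN"
    return 1
  fi
  if ! docker exec "$c" ping -c1 google.com >/dev/null 2>&1; then
    echo "DNS failure: IP reachable but name does not resolve; check daemon.json dns"
    return 2
  fi
  echo "ok: IP and DNS both work"
}
```

Usage: diagnose_network CONTAINER. The IP test runs first because a name lookup cannot succeed without a working route, so the order narrows the fault in at most two steps.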
Debugging common Dockerfile issues
Build-time failures differ from runtime container issues. For a built image that exits on start, see exit codes and the debugging workflow below.
Core build debug techniques
# Shell into the last successful layer from failed build output
# (legacy builder output; BuildKit does not print intermediate image IDs)
# e.g. ---> a1b2c3d4e5f6
docker run --rm -it a1b2c3d4e5f6 /bin/sh
# Manually run the failing RUN command to see the real error
docker build --no-cache -t myimage . # rule out stale cache
DOCKER_BUILDKIT=1 docker build --progress=plain -t myimage . 2>&1 | tee build.log
Layer and cache issues
Stale code or dependencies despite source changes usually mean the layer order defeats the cache:
# Bad: RUN before COPY; the install layer does not depend on package*.json,
# so dependency changes never invalidate its cache
RUN npm install
COPY . .
# Good: dependency files first, then full tree
COPY package*.json ./
RUN npm install
COPY . .
Put instructions that change least often at the top. Inspect layers:
docker history --no-trunc myimage
docker image inspect myimage | jq '.[0].RootFS.Layers'
# Interactive layer explorer (dive)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive myimage
Package install failures (apt/yum)
# Bad — stale apt cache in layer; lists left in image
RUN apt-get update
RUN apt-get install -y curl
# Good — single layer, fresh index, cleanup
RUN apt-get update && apt-get install -y --no-install-recommends \
curl wget \
&& rm -rf /var/lib/apt/lists/*
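# Pin a version for reproducible rebuilds (it must still exist in the repository);
# run update in the same layer, since the package lists were removed above
RUN apt-get update && apt-get install -y curl=7.88.1-10 \
    && rm -rf /var/lib/apt/lists/*
Files, permissions, and .dockerignore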
docker run --rm myimage ls -la /app
docker run --rm myimage stat /app/start.sh
COPY start.sh /app/
RUN chmod +x /app/start.sh
COPY --chown=appuser:appuser . /app
.dockerignore can silently exclude files the build needs. Check
cat .dockerignore and watch build context size in plain build output.
ARG, ENV, and build secrets
ARG BUILD_ENV=production # build-time only
ENV APP_ENV=production # also available at runtime
ARG BASE_VERSION=3.11
FROM python:${BASE_VERSION}
docker run --rm myimage env
docker inspect myimage | jq '.[0].Config.Env'
# Bad — secret baked into a layer forever
ENV API_KEY=supersecret
# Good — BuildKit secret mount
RUN --mount=type=secret,id=api_key \
    API_KEY=$(cat /run/secrets/api_key) ./setup.sh
# Pass the secret at build time: docker build --secret id=api_key,src=PATH_TO_SECRET_FILE .
Multi-stage builds
FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm install && npm run build # outputs to /app/dist
FROM nginx:alpine
# Wrong: dist is under /app, not /
COPY --from=builder /dist /usr/share/nginx/html
# Correct
COPY --from=builder /app/dist /usr/share/nginx/html
# Build and enter only the builder stage to inspect its output
docker build --target builder -t debug-builder .
docker run --rm -it debug-builder /bin/sh
ENTRYPOINT and CMD in the image
docker inspect myimage | jq '.[0].Config | {Entrypoint, Cmd}'
ENTRYPOINT ["myapp"]
CMD ["--config", "/etc/default.conf"] # default args, overridable at run
Prefer exec form over shell form for signal handling; see the exit codes section.
Network and DNS during build
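docker run --rm busybox nslookup google.com # does DNS work in containers at all?
docker build --network=host . # use the host network stack for RUN steps
# docker build has no --dns flag; configure DNS for the daemon instead,
# e.g. "dns": ["8.8.8.8"] in /etc/docker/daemon.json, then restart the daemon
docker build \
    --build-arg HTTP_PROXY=http://proxy:3128 \
    --build-arg HTTPS_PROXY=http://proxy:3128 .
Debugging workflow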
1. Container state and inspect
docker ps -a
docker inspect CONTAINER
docker inspect CONTAINER | jq '.[0] | {status: .State.Status, exit: .State.ExitCode, oom: .State.OOMKilled}'
2. docker exec: shell inside the container
General first step for running containers: inspect files, env, DNS, and local listeners from inside the container namespace.
docker exec -ti CONTAINER /bin/bash
docker exec -ti CONTAINER /bin/sh # if bash is not in the image
3. Override entrypoint: isolate startup script issues
If the container exits on start, run the image with a shell instead of the default entrypoint/CMD. This shows whether the failure is in the bundled startup script or in the image itself.
docker run -it --entrypoint sh IMAGE
docker run -it --entrypoint /bin/bash IMAGE
4. Events, logs, and processes
docker events --filter container=CONTAINER --since 1h
docker logs --tail 50 CONTAINER
5. CPU and memory
docker stats # live CPU/RAM for all containers
docker stats CONTAINER # one container
docker top CONTAINER # PIDs and command lines inside
6. Network and mounts
docker network ls
docker port CONTAINER
docker inspect CONTAINER | jq '.[0].NetworkSettings.Networks'
docker inspect CONTAINER | jq '.[0].Mounts'
docker exec CONTAINER sh -c 'wget -qO- localhost:PORT || curl -s localhost:PORT'