Postgres troubleshooting
Connection refused
Server not running or not listening. Check
systemctl status postgresql and
pg_isready -h localhost. Verify listen_addresses
in postgresql.conf and port with
ss -tlnp | grep 5432. Read the PostgreSQL log for startup errors.
Password authentication failed
Wrong password, or pg_hba.conf uses a method the client cannot
satisfy. Check the matching hba line for client IP, database, and user.
For local peer auth, OS user must map to DB role. Reload after hba edits:
systemctl reload postgresql or pg_ctlcluster ... reload.
Too many connections
Hit max_connections. Count:
SELECT count(*) FROM pg_stat_activity;. Terminate idle
sessions: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND ....
Long-term fix: PgBouncer or raise limit (uses more memory per connection).
Queries hanging / lock waits
Find blockers in pg_stat_activity where wait_event_type = 'Lock'.
Use the blocking query from the cheatsheet. Cancel with
pg_cancel_backend(pid) or terminate with
pg_terminate_backend(pid). Long transactions holding locks are a common cause.
Disk full / no space left
Check database sizes and WAL directory. Postgres stops accepting writes
when the disk is full. Find large tables with
pg_total_relation_size. Vacuum won't help if the disk is already
full — free space first. See the
disk volumes lab.
Slow queries
Run EXPLAIN ANALYZE on the slow query. Check for sequential
scans on large tables, missing indexes, or stale stats — run
ANALYZE. Enable pg_stat_statements extension for
historical query stats if available.
Streaming replication lag growing
On the primary, check pg_stat_replication for
replay_lag and byte lag via pg_wal_lsn_diff. Causes:
network issues, standby disk too slow, long-running queries on standby blocking
replay (max_standby_streaming_delay), or primary generating WAL
faster than the standby can apply. On the standby, compare
pg_last_wal_receive_lsn() vs pg_last_wal_replay_lsn().
Standby not connecting / no row in pg_stat_replication
Verify replication hba entry on primary, replication user password, and
primary_conninfo on standby. Test from standby host:
psql "host=primary port=5432 user=replicator replication=database".
Ensure max_wal_senders is not exhausted and firewall allows the connection.
Replication slot WAL buildup on primary
Inactive standby with a replication slot prevents WAL recycling — disk fills on
the primary. List slots:
SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots;.
Drop stale slots only when sure the standby is gone:
SELECT pg_drop_replication_slot('slot_name');.
Logical subscription not replicating
Check pg_stat_subscription for connection errors. Common issues:
wal_level not logical on publisher, table not in
publication, subscriber table missing or different schema, or replica identity
not set (ALTER TABLE t REPLICA IDENTITY FULL; for UPDATE/DELETE
without a primary key). View per-table state in pg_subscription_rel.
Cannot write to standby
Physical standbys are read-only while in recovery
(pg_is_in_recovery() = true). Promote to writable primary with
pg_promote() or by removing standby.signal and
restarting (older method). Logical subscribers can accept writes on
non-replicated tables — but conflicts with incoming changes require care.
Replication debugging workflow
1. Identify role (primary vs standby)
psql -c "SELECT pg_is_in_recovery();"2. Streaming status (on primary)
psql -c "SELECT application_name, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;"3. Logical subscription status (on subscriber)
psql -c "SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt FROM pg_stat_subscription;"
psql -c "SELECT * FROM pg_subscription_rel WHERE srsubstate != 'r';"pg_hba.conf changes not taking effect
Reload the server after editing hba (no full restart needed):
sudo systemctl reload postgresql
# Debian: pg_ctlcluster 16 main reloadServer won't start after config change
Check the log for invalid parameters. On Debian:
pg_ctlcluster 16 main start
tail -50 /var/log/postgresql/postgresql-16-main.logDebugging workflow
1. Can the server accept connections?
pg_isready -h 127.0.0.1 -p 5432
systemctl status postgresql2. What's running right now?
psql -c "SELECT pid, usename, state, wait_event_type, left(query,80) FROM pg_stat_activity"3. Disk and database size
psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database"
df -h /var/lib/postgresqlPractice scenarios
Hands-on Postgres scenarios on live Linux VMs: postgres