SadServers
  • Scenarios
  • Labs
    All Labs Linux & Bash Web Servers Databases Data Processing Docker Kubernetes CI/CD Infrastructure as Code Tooling / Applications
  • Dashboard
  • Solutions
    For Individuals For Businesses
  • Ranking
  • Newsletter
  • Documentation
    FAQ Support Pro Accounts Pro+ Accounts Business Accounts Gift API CLI/TUI Privacy Troubleshooting Interviews
  • Blog
  • Pricing
  • Gift
    Gift Purchase Gift Redeem
  • About
Log In - Sign Up
  1. Labs
  2. Postgres
  3. Troubleshooting

Guide

Concepts and learning path

Troubleshooting

Failure modes and fixes

Cheatsheet

Commands to keep handy

Postgres troubleshooting

Connection refused

Server not running or not listening. Check systemctl status postgresql and pg_isready -h localhost. Verify listen_addresses in postgresql.conf and port with ss -tlnp | grep 5432. Read the PostgreSQL log for startup errors.

Password authentication failed

Wrong password, or pg_hba.conf uses a method the client cannot satisfy. Check the matching hba line for client IP, database, and user. For local peer auth, OS user must map to DB role. Reload after hba edits: systemctl reload postgresql or pg_ctlcluster ... reload.

Too many connections

Hit max_connections. Count: SELECT count(*) FROM pg_stat_activity;. Terminate idle sessions: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND .... Long-term fix: PgBouncer or raise limit (uses more memory per connection).

Queries hanging / lock waits

Find blockers in pg_stat_activity where wait_event_type = 'Lock'. Use the blocking query from the cheatsheet. Cancel with pg_cancel_backend(pid) or terminate with pg_terminate_backend(pid). Long transactions holding locks are a common cause.

Disk full / no space left

Check database sizes and WAL directory. Postgres stops accepting writes when the disk is full. Find large tables with pg_total_relation_size. Vacuum won't help if the disk is already full — free space first. See the disk volumes lab.

Slow queries

Run EXPLAIN ANALYZE on the slow query. Check for sequential scans on large tables, missing indexes, or stale stats — run ANALYZE. Enable pg_stat_statements extension for historical query stats if available.

Streaming replication lag growing

On the primary, check pg_stat_replication for replay_lag and byte lag via pg_wal_lsn_diff. Causes: network issues, standby disk too slow, long-running queries on standby blocking replay (max_standby_streaming_delay), or primary generating WAL faster than the standby can apply. On the standby, compare pg_last_wal_receive_lsn() vs pg_last_wal_replay_lsn().

Standby not connecting / no row in pg_stat_replication

Verify replication hba entry on primary, replication user password, and primary_conninfo on standby. Test from standby host: psql "host=primary port=5432 user=replicator replication=database". Ensure max_wal_senders is not exhausted and firewall allows the connection.

Replication slot WAL buildup on primary

Inactive standby with a replication slot prevents WAL recycling — disk fills on the primary. List slots: SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots;. Drop stale slots only when sure the standby is gone: SELECT pg_drop_replication_slot('slot_name');.

Logical subscription not replicating

Check pg_stat_subscription for connection errors. Common issues: wal_level not logical on publisher, table not in publication, subscriber table missing or different schema, or replica identity not set (ALTER TABLE t REPLICA IDENTITY FULL; for UPDATE/DELETE without a primary key). View per-table state in pg_subscription_rel.

Cannot write to standby

Physical standbys are read-only while in recovery (pg_is_in_recovery() = true). Promote to writable primary with pg_promote() or by removing standby.signal and restarting (older method). Logical subscribers can accept writes on non-replicated tables — but conflicts with incoming changes require care.

Replication debugging workflow

1. Identify role (primary vs standby)

psql -c "SELECT pg_is_in_recovery();"

2. Streaming status (on primary)

psql -c "SELECT application_name, state, sync_state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;"

3. Logical subscription status (on subscriber)

psql -c "SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt FROM pg_stat_subscription;" psql -c "SELECT * FROM pg_subscription_rel WHERE srsubstate != 'r';"

pg_hba.conf changes not taking effect

Reload the server after editing hba (no full restart needed):

sudo systemctl reload postgresql # Debian: pg_ctlcluster 16 main reload

Server won't start after config change

Check the log for invalid parameters. On Debian:

pg_ctlcluster 16 main start tail -50 /var/log/postgresql/postgresql-16-main.log

Debugging workflow

1. Can the server accept connections?

pg_isready -h 127.0.0.1 -p 5432 systemctl status postgresql

2. What's running right now?

psql -c "SELECT pid, usename, state, wait_event_type, left(query,80) FROM pg_stat_activity"

3. Disk and database size

psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database" df -h /var/lib/postgresql

Practice scenarios

Hands-on Postgres scenarios on live Linux VMs: postgres

Cheatsheet →
SadServersSadServers

Real-world Linux and DevOps scenarios for hands-on learning and technical assessment.

Uptime Robot ratio (30 days)
Product
  • Scenarios
  • For Individuals
  • For Businesses
  • Pricing
Resources
  • FAQ
  • Blog
  • Newsletter
Company
  • About Us
  • Support
  • Privacy Policy
  • Terms of Service
  • Contact
Connect With Us
info@sadservers.com

Made in Canada 🇨🇦
Updated: 2026-06-13 16:06 UTC – 2d2950a