Postgres guide

What PostgreSQL does in production

PostgreSQL stores relational data in databases, with schemas, tables, indexes, and constraints. It supports ACID transactions, rich SQL, JSON, extensions (PostGIS, pg_stat_statements), and streaming replication. Application servers connect over TCP (default port 5432) using connection pools or direct psql sessions.

How a client connection works

When an application connects, the typical workflow is:

TCP connect — client reaches host:5432
Authentication — server checks pg_hba.conf for the client IP, database, and user; applies method (scram-sha-256, md5, peer, trust)
Backend fork — postmaster spawns a dedicated backend process for the session
Query execution — SQL is parsed, planned, executed; locks acquired as needed
Result — rows return to the client; connection stays open for more queries or closes

Key files and directories

postgresql.conf — server settings (memory, connections, WAL, logging)
pg_hba.conf — who can connect, from where, and how they authenticate
pg_ident.conf — maps OS users to database users (peer auth)
Data directory — /var/lib/postgresql/<version>/main/ (Debian) or /var/lib/pgsql/data/ (RHEL)
Logs — often /var/log/postgresql/ or configured via log_directory

Roles, databases, and permissions

PostgreSQL uses roles for authentication and authorization. A role can own databases and tables. GRANT controls table-level access. The superuser role (often postgres) bypasses permission checks — use sparingly in production.

System catalogs and monitoring views

pg_stat_activity — current sessions and running queries
pg_locks — lock holders and waiters
pg_stat_database — per-database stats (commits, deadlocks, cache hit)
pg_stat_user_tables — table access and vacuum stats
pg_database_size() / pg_total_relation_size() — disk usage helpers
pg_stat_replication — streaming replication status on the primary
pg_stat_subscription — logical replication subscriber status

Maintenance essentials

VACUUM — reclaims dead tuple space (autovacuum runs this automatically)
ANALYZE — updates planner statistics
WAL — write-ahead log for durability and replication; monitor disk use
max_connections — cap on concurrent backends; connection pooling (PgBouncer) is common

Replication

PostgreSQL offers two main replication models. Both rely on the WAL, but they ship changes differently and suit different goals.

Streaming replication (physical)

A standby server receives WAL records over the network and replays them byte-for-byte, producing an identical copy of the entire primary cluster. Standbys are read-only (hot_standby). This is the standard approach for high availability and failover.

Asynchronous (default) — primary commits without waiting for the standby to confirm; small risk of lost transactions if the primary fails before WAL is received. Synchronous — primary waits for one or more standbys (synchronous_standby_names); stronger durability, higher write latency.

Basic setup (primary):

wal_level = replica (or logical if you also need logical)
max_wal_senders ≥ number of standbys
Replication role with REPLICATION privilege; hba line for replication connections

Basic setup (standby): take a base backup with pg_basebackup -h primary -D /var/lib/postgresql/... -U replicator -P -R (the -R flag writes primary_conninfo and standby.signal). Start PostgreSQL — it enters recovery and streams WAL.

Advantages: simple mental model; full cluster copy; mature failover tooling (Patroni, repmgr); standbys can serve read queries.
Disadvantages: replicates everything (all databases); cannot pick individual tables; major version upgrades require pg_upgrade or logical replication; standby cannot accept writes.

Logical replication

The primary decodes WAL into logical row changes. A publication on the publisher defines which tables to replicate; a subscription on the subscriber pulls changes. Useful for partial replication, migrations, and cross-version upgrades.

Basic setup:

Publisher: wal_level = logical
CREATE PUBLICATION my_pub FOR TABLE t1, t2;
Subscriber: CREATE SUBSCRIPTION my_sub CONNECTION 'host=...' PUBLICATION my_pub;

Advantages: replicate selected tables only; subscriber can be a different Postgres major version; subscriber tables can accept writes (beware conflicts); good for rolling upgrades and feeding data warehouses.
Disadvantages: DDL is not replicated; sequences need separate handling; more operational complexity; not a complete HA solution on its own (use with streaming replication for failover); initial copy can be slow on large tables.

Monitor both types with pg_stat_replication (streaming) and pg_stat_subscription / pg_subscription_rel (logical).

Backups

Replication is not a backup — a mistaken DROP TABLE on the primary replicates to standbys. You need independent backup and restore capability.

Logical backup (pg_dump) — exports SQL or custom-format snapshots of a database. Simple, portable across versions; restore with pg_restore or psql. Best for smaller databases or selective restores. Use pg_dumpall for globals (roles, tablespaces).

Physical backup (pg_basebackup) — copies the data directory at the filesystem level while the server runs (uses replication protocol). Required to seed standbys; also the base for full-cluster restore. Pair with WAL archiving for point-in-time recovery (PITR) — archive wal_level must be replica or higher and archive_mode = on.

Tools — pgBackRest and Barman automate full/incremental backups, WAL management, and PITR for production.

Do not tar or rsync the data directory on a running primary without a backup tool — you risk an inconsistent copy. Test restores regularly; an untested backup is not a backup.

Learning resources

PostgreSQL documentation — postgresql.org/docs (official reference)
Client authentication — postgresql.org/docs — pg_hba.conf
Server configuration — postgresql.org/docs — configuration
Monitoring stats views — postgresql.org/docs — monitoring
Streaming replication — postgresql.org/docs — streaming replication
Logical replication — postgresql.org/docs — logical replication
Backup and restore — postgresql.org/docs — backup

Practice scenarios

Hands-on Postgres scenarios on live Linux VMs: postgres

Troubleshooting →