Postgres guide
What PostgreSQL does in production
PostgreSQL stores relational data in databases, with schemas, tables, indexes,
and constraints. It supports ACID transactions, rich SQL, JSON, extensions
(PostGIS, pg_stat_statements), and streaming replication. Application servers
connect over TCP (default port 5432) using connection pools or
direct psql sessions.
How a client connection works
When an application connects, the typical workflow is:
- TCP connect — client reaches
host:5432 - Authentication — server checks
pg_hba.conffor the client IP, database, and user; applies method (scram-sha-256, md5, peer, trust) - Backend fork — postmaster spawns a dedicated backend process for the session
- Query execution — SQL is parsed, planned, executed; locks acquired as needed
- Result — rows return to the client; connection stays open for more queries or closes
Key files and directories
postgresql.conf— server settings (memory, connections, WAL, logging)pg_hba.conf— who can connect, from where, and how they authenticatepg_ident.conf— maps OS users to database users (peer auth)- Data directory —
/var/lib/postgresql/<version>/main/(Debian) or/var/lib/pgsql/data/(RHEL) - Logs — often
/var/log/postgresql/or configured vialog_directory
Roles, databases, and permissions
PostgreSQL uses roles for authentication and authorization.
A role can own databases and tables. GRANT controls table-level
access. The superuser role (often postgres) bypasses permission
checks — use sparingly in production.
System catalogs and monitoring views
pg_stat_activity— current sessions and running queriespg_locks— lock holders and waiterspg_stat_database— per-database stats (commits, deadlocks, cache hit)pg_stat_user_tables— table access and vacuum statspg_database_size()/pg_total_relation_size()— disk usage helperspg_stat_replication— streaming replication status on the primarypg_stat_subscription— logical replication subscriber status
Maintenance essentials
- VACUUM — reclaims dead tuple space (autovacuum runs this automatically)
- ANALYZE — updates planner statistics
- WAL — write-ahead log for durability and replication; monitor disk use
max_connections— cap on concurrent backends; connection pooling (PgBouncer) is common
Replication
PostgreSQL offers two main replication models. Both rely on the WAL, but they ship changes differently and suit different goals.
Streaming replication (physical)
A standby server receives WAL records over the network and replays
them byte-for-byte, producing an identical copy of the entire primary cluster.
Standbys are read-only (hot_standby). This is the standard approach
for high availability and failover.
Asynchronous (default) — primary commits without waiting for the
standby to confirm; small risk of lost transactions if the primary fails before
WAL is received. Synchronous — primary waits for one or more
standbys (synchronous_standby_names); stronger durability, higher
write latency.
Basic setup (primary):
wal_level = replica(orlogicalif you also need logical)max_wal_senders≥ number of standbys- Replication role with
REPLICATIONprivilege; hba line for replication connections
Basic setup (standby): take a base backup with
pg_basebackup -h primary -D /var/lib/postgresql/... -U replicator -P -R
(the -R flag writes primary_conninfo and
standby.signal). Start PostgreSQL — it enters recovery and streams WAL.
Advantages: simple mental model; full cluster copy; mature failover
tooling (Patroni, repmgr); standbys can serve read queries.
Disadvantages: replicates everything (all databases); cannot pick
individual tables; major version upgrades require pg_upgrade or logical replication;
standby cannot accept writes.
Logical replication
The primary decodes WAL into logical row changes. A publication on the publisher defines which tables to replicate; a subscription on the subscriber pulls changes. Useful for partial replication, migrations, and cross-version upgrades.
Basic setup:
- Publisher:
wal_level = logical CREATE PUBLICATION my_pub FOR TABLE t1, t2;- Subscriber:
CREATE SUBSCRIPTION my_sub CONNECTION 'host=...' PUBLICATION my_pub;
Advantages: replicate selected tables only; subscriber can be a
different Postgres major version; subscriber tables can accept writes (beware
conflicts); good for rolling upgrades and feeding data warehouses.
Disadvantages: DDL is not replicated; sequences need separate
handling; more operational complexity; not a complete HA solution on its own
(use with streaming replication for failover); initial copy can be slow on large tables.
Monitor both types with pg_stat_replication (streaming) and
pg_stat_subscription / pg_subscription_rel (logical).
Backups
Replication is not a backup — a mistaken DROP TABLE on the primary
replicates to standbys. You need independent backup and restore capability.
Logical backup (pg_dump) — exports SQL or custom-format
snapshots of a database. Simple, portable across versions; restore with
pg_restore or psql. Best for smaller databases or
selective restores. Use pg_dumpall for globals (roles, tablespaces).
Physical backup (pg_basebackup) — copies the data
directory at the filesystem level while the server runs (uses replication protocol).
Required to seed standbys; also the base for full-cluster restore. Pair with
WAL archiving for point-in-time recovery (PITR) — archive
wal_level must be replica or higher and
archive_mode = on.
Tools — pgBackRest and Barman automate full/incremental backups, WAL management, and PITR for production.
Do not tar or rsync the data directory
on a running primary without a backup tool — you risk an inconsistent copy.
Test restores regularly; an untested backup is not a backup.
Learning resources
- PostgreSQL documentation — postgresql.org/docs (official reference)
- Client authentication — postgresql.org/docs — pg_hba.conf
- Server configuration — postgresql.org/docs — configuration
- Monitoring stats views — postgresql.org/docs — monitoring
- Streaming replication — postgresql.org/docs — streaming replication
- Logical replication — postgresql.org/docs — logical replication
- Backup and restore — postgresql.org/docs — backup
Practice scenarios
Hands-on Postgres scenarios on live Linux VMs: postgres