ClickHouse guide
What ClickHouse does in production
ClickHouse is optimized for analytical workloads: aggregations, scans over billions of rows, and time-series queries. Unlike row-oriented OLTP databases (Postgres, MySQL), it stores data by column — compression is excellent and queries read only the columns they need. It is not a general-purpose transactional database; use it for append-heavy analytics and reporting.
MergeTree and data on disk
Most production tables use the MergeTree engine family. Inserts create new parts on disk; background merges combine smaller parts into larger ones. Data is partitioned (often by date) and sorted by a primary key within each part. Too many small parts — from bursty inserts or failed merges — degrades performance and can block writes.
How a query is processed
A simplified workflow when you run a SELECT:
- Client sends query — via
clickhouse-client(native port 9000) or HTTP (8123) - Parser and planner — query is parsed; relevant parts are selected using the primary key and partition key
- Execution — data is read column-by-column from disk; aggregation/filtering runs in memory up to limits
- Result — rows stream back to the client; query metrics appear in
system.query_log
Heavy queries hit max_memory_usage or max_execution_time
and are killed — check settings at user, profile, or query level.
Server layout and ports
- HTTP —
8123(curl, JDBC HTTP, monitoring) - Native TCP —
9000(clickhouse-client, inter-server replication) - Config —
/etc/clickhouse-server/config.xml, users inusers.xmlorusers.d/ - Data —
/var/lib/clickhouse/(parts, metadata) - Logs —
/var/log/clickhouse-server/clickhouse-server.log
System tables to know
system.parts— parts per table, size, rows, active statussystem.merges— in-progress background mergessystem.processes— currently running queriessystem.query_log— finished query history and resource usagesystem.replicas— replication queue, lag, readonly status (ReplicatedMergeTree)system.disks— disk space per storage volume
Replication
ReplicatedMergeTree tables use ClickHouse Keeper (or ZooKeeper) to
coordinate replicas. A replica can go readonly if it loses
coordination or falls behind. Monitor system.replicas for
queue_size, absolute_delay, and is_readonly.
Learning resources
- ClickHouse documentation — clickhouse.com/docs (official docs)
- MergeTree engine — clickhouse.com/docs — MergeTree
- System tables — clickhouse.com/docs — system tables
- Replication — clickhouse.com/docs — replication
Practice scenarios
Hands-on ClickHouse scenarios on live Linux VMs: clickhouse