ClickHouse guide

What ClickHouse does in production

ClickHouse is optimized for analytical workloads: aggregations, scans over billions of rows, and time-series queries. Unlike row-oriented OLTP databases (Postgres, MySQL), it stores data by column — compression is excellent and queries read only the columns they need. It is not a general-purpose transactional database; use it for append-heavy analytics and reporting.

MergeTree and data on disk

Most production tables use the MergeTree engine family. Inserts create new parts on disk; background merges combine smaller parts into larger ones. Data is partitioned (often by date) and sorted by a primary key within each part. Too many small parts — from bursty inserts or failed merges — degrades performance and can block writes.

How a query is processed

A simplified workflow when you run a SELECT:

Client sends query — via clickhouse-client (native port 9000) or HTTP (8123)
Parser and planner — query is parsed; relevant parts are selected using the primary key and partition key
Execution — data is read column-by-column from disk; aggregation/filtering runs in memory up to limits
Result — rows stream back to the client; query metrics appear in system.query_log

Heavy queries hit max_memory_usage or max_execution_time and are killed — check settings at user, profile, or query level.

Server layout and ports

HTTP — 8123 (curl, JDBC HTTP, monitoring)
Native TCP — 9000 (clickhouse-client, inter-server replication)
Config — /etc/clickhouse-server/config.xml, users in users.xml or users.d/
Data — /var/lib/clickhouse/ (parts, metadata)
Logs — /var/log/clickhouse-server/clickhouse-server.log

System tables to know

system.parts — parts per table, size, rows, active status
system.merges — in-progress background merges
system.processes — currently running queries
system.query_log — finished query history and resource usage
system.replicas — replication queue, lag, readonly status (ReplicatedMergeTree)
system.disks — disk space per storage volume

Replication

ReplicatedMergeTree tables use ClickHouse Keeper (or ZooKeeper) to coordinate replicas. A replica can go readonly if it loses coordination or falls behind. Monitor system.replicas for queue_size, absolute_delay, and is_readonly.

Learning resources

ClickHouse documentation — clickhouse.com/docs (official docs)
MergeTree engine — clickhouse.com/docs — MergeTree
System tables — clickhouse.com/docs — system tables
Replication — clickhouse.com/docs — replication

Practice scenarios

Hands-on ClickHouse scenarios on live Linux VMs: clickhouse

Troubleshooting →