ClickHouse troubleshooting

Server not accepting connections

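Check the service with systemctl status clickhouse-server and tail /var/log/clickhouse-server/clickhouse-server.log (errors are also written to clickhouse-server.err.log in the same directory). Verify that the server is listening on the HTTP (8123) and native (9000) ports: ss -tlnp | grep -E '8123|9000'. Confirm that listen_host in config.xml covers the address your client connects to; the stock configuration listens on localhost only. Test the HTTP interface: curl http://127.0.0.1:8123/ping, which returns Ok. when the server is up.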
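The HTTP check can be wrapped in a small helper for scripts and monitoring. This is a minimal sketch, not an official tool: ch_ping is a hypothetical function name, and it assumes curl is installed and the HTTP interface is on its default port 8123.

```shell
# Probe the ClickHouse HTTP interface; a healthy server answers /ping with "Ok.".
# $1 is an optional base URL (default: the local server on port 8123).
ch_ping() {
    url="${1:-http://127.0.0.1:8123}/ping"
    # -sS: no progress bar but still print errors; --max-time: do not hang on a dead host
    body=$(curl -sS --max-time 3 "$url") || { echo "unreachable: $url"; return 1; }
    if [ "$body" = "Ok." ]; then
        echo "ok: $url"
    else
        echo "unexpected reply from $url: $body"
        return 1
    fi
}
```

Run ch_ping on the server itself first, then from the client host with the server's base URL as the argument. If the first succeeds and the second fails, listen_host or a firewall is the likely cause.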

Query killed — memory limit exceeded

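The query exceeded max_memory_usage, the per-query memory limit. Find the heaviest running queries: SELECT query_id, user, memory_usage, query FROM system.processes ORDER BY memory_usage DESC. Kill one if needed: KILL QUERY WHERE query_id = '...'. Then either raise the limit for the affected user or settings profile, or rewrite the query to use less memory (narrower scans, pre-aggregation).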
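To list only the queries above a chosen threshold, the lookup can be sketched as a shell function. The names ch_heavy_queries and CH_CLIENT are hypothetical, and the 1 GiB default is an arbitrary illustration, not a recommended value.

```shell
# Client command; override CH_CLIENT to add options such as --host (left unquoted on purpose).
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# List running queries using at least $1 bytes of memory (default 1 GiB), largest first.
ch_heavy_queries() {
    min_bytes="${1:-1073741824}"
    # Accept digits only, since the value is spliced into the SQL text.
    case "$min_bytes" in
        ''|*[!0-9]*) echo "usage: ch_heavy_queries [min_bytes]" >&2; return 2 ;;
    esac
    $CH_CLIENT -q "
        SELECT query_id, user, elapsed,
               formatReadableSize(memory_usage) AS mem,
               substring(query, 1, 80) AS query_head
        FROM system.processes
        WHERE memory_usage >= ${min_bytes}
        ORDER BY memory_usage DESC"
}
```

Copy a query_id from the output into KILL QUERY WHERE query_id = '...' if the query has to be stopped.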

Too many parts / insert blocked

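Bursts of small inserts create parts faster than background merges can combine them; once a partition holds too many active parts, ClickHouse delays and then rejects inserts with a "Too many parts" error. Check the count: SELECT count() FROM system.parts WHERE active AND database='db' AND table='t'. The limit applies per partition, so group by partition to find the hot spot. The durable fix is insert batching: fewer, larger inserts. Consider OPTIMIZE TABLE ... FINAL during a maintenance window only, because it can rewrite large amounts of data and is I/O-heavy. See also the disk volumes lab if disk is full.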
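A per-partition breakdown shows where parts pile up. The following is a sketch under assumptions: ch_parts_per_partition and CH_CLIENT are hypothetical names, and the database and table are passed as clickhouse-client query parameters (--param_name with {name:Type} placeholders) rather than pasted into the SQL.

```shell
# Client command; override CH_CLIENT to add options such as --host (left unquoted on purpose).
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Show the ten partitions of database $1, table $2 with the most active parts.
ch_parts_per_partition() {
    [ $# -eq 2 ] || { echo "usage: ch_parts_per_partition <database> <table>" >&2; return 2; }
    $CH_CLIENT --param_db="$1" --param_tbl="$2" -q "
        SELECT partition, count() AS parts, sum(rows) AS rows
        FROM system.parts
        WHERE active AND database = {db:String} AND table = {tbl:String}
        GROUP BY partition
        ORDER BY parts DESC
        LIMIT 10"
}
```

A partition with hundreds of small parts and few rows per part usually points to unbatched inserts into that partition.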

Replication lag or readonly replica

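Query system.replicas for is_readonly, queue_size, and absolute_delay. Check ClickHouse Keeper / ZooKeeper connectivity first: a replica turns read-only when it loses its Keeper session (network partition, Keeper outage) and recovers once the session is re-established. A full disk on a replica typically shows up as a growing queue_size and absolute_delay, because fetches and merges cannot complete.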
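The replica check can be narrowed to tables that actually need attention. In this sketch, ch_replica_problems and CH_CLIENT are hypothetical names and the 60-second delay threshold is an illustrative default; is_session_expired is another system.replicas column that flags a lost Keeper session.

```shell
# Client command; override CH_CLIENT to add options such as --host (left unquoted on purpose).
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# List replicated tables that are read-only, have an expired Keeper session,
# or lag by more than $1 seconds (default 60).
ch_replica_problems() {
    max_delay="${1:-60}"
    # Accept digits only, since the value is spliced into the SQL text.
    case "$max_delay" in
        ''|*[!0-9]*) echo "usage: ch_replica_problems [max_delay_seconds]" >&2; return 2 ;;
    esac
    $CH_CLIENT -q "
        SELECT database, table, is_readonly, is_session_expired, queue_size, absolute_delay
        FROM system.replicas
        WHERE is_readonly OR is_session_expired OR absolute_delay > ${max_delay}
        ORDER BY absolute_delay DESC"
}
```

An empty result means no replicated table on this server matches any of the three conditions.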

Disk full / no space left

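Inspect free space with SELECT * FROM system.disks and find the largest directories with du -sh /var/lib/clickhouse/*. Merges and inserts fail when the disk is full. Drop old partitions, add a TTL so old data expires, or expand storage. Also check for runaway system log tables: system.query_log can grow large if no retention is configured.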
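The raw byte counts in system.disks are easier to read as a percentage. The filter below is a minimal sketch: disk_usage_report is a hypothetical name, the 90% default threshold is arbitrary, and the input is assumed to be tab-separated name, free_space, total_space rows.

```shell
# Read TSV rows "name<TAB>free_space<TAB>total_space" (bytes) on stdin and print
# the used percentage per disk, marking disks at or above $1 percent (default 90).
disk_usage_report() {
    awk -F '\t' -v warn="${1:-90}" '
        $3 > 0 {
            used = 100 * (1 - $2 / $3)
            printf "%s\t%.1f%%\t%s\n", $1, used, (used >= warn ? "WARN" : "ok")
        }'
}

# Typical use:
#   clickhouse-client -q "SELECT name, free_space, total_space FROM system.disks FORMAT TSV" \
#       | disk_usage_report 85
```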

Slow queries

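Use system.query_log to find the top offenders by query_duration_ms. Check whether the query can use the primary key (EXPLAIN indexes = 1 shows how many parts and granules survive index filtering); full table scans on huge tables are expensive. Also look in system.merges for long-running merges that compete with queries for I/O.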
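The query_log lookup can be sketched as follows. ch_slowest_queries and CH_CLIENT are hypothetical names; the one-hour window and the limit of ten rows are illustrative defaults. Filtering on type = 'QueryFinish' keeps one row per completed query.

```shell
# Client command; override CH_CLIENT to add options such as --host (left unquoted on purpose).
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Show the ten slowest queries that finished in the last $1 hours (default 1).
ch_slowest_queries() {
    hours="${1:-1}"
    # Accept digits only, since the value is spliced into the SQL text.
    case "$hours" in
        ''|*[!0-9]*) echo "usage: ch_slowest_queries [hours]" >&2; return 2 ;;
    esac
    $CH_CLIENT -q "
        SELECT query_duration_ms, read_rows,
               formatReadableSize(memory_usage) AS mem,
               substring(query, 1, 120) AS query_head
        FROM system.query_log
        WHERE type = 'QueryFinish'
          AND event_time > now() - INTERVAL ${hours} HOUR
        ORDER BY query_duration_ms DESC
        LIMIT 10"
}
```

A high read_rows value relative to the rows returned is a hint that the query is scanning rather than using the primary key.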

Authentication failures

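Users and passwords are defined in /etc/clickhouse-server/users.xml or in drop-in files under users.d/. Log in with an explicit user (--password with no value prompts for the password) and check which files define credentials: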

clickhouse-client -u default --password
grep -r password /etc/clickhouse-server/users.d/
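If the grep output shows a password_sha256_hex entry, the stored value is a SHA-256 digest rather than the password itself. To check whether a candidate password matches it, compute the digest locally. This is a sketch that assumes sha256sum (GNU coreutils) is available; sha256_hex is a hypothetical helper name.

```shell
# Print the lowercase SHA-256 hex digest of the string given as $1.
# printf '%s' avoids hashing a trailing newline, which would change the digest.
sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}
```

Compare the output with the value stored in the users file. Note that a password passed as an argument can end up in shell history, so use this on a trusted host only.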

Debugging workflow

1. Server health

systemctl status clickhouse-server
curl http://127.0.0.1:8123/ping

2. Active queries and resources

clickhouse-client -q "SELECT query_id, elapsed, memory_usage, query FROM system.processes"

3. Parts, merges, and disk

clickhouse-client -q "SELECT database, table, count() AS parts FROM system.parts WHERE active GROUP BY database, table ORDER BY parts DESC LIMIT 20"
clickhouse-client -q "SELECT * FROM system.merges"
clickhouse-client -q "SELECT * FROM system.disks"

Server fails to start after config change

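ClickHouse parses its XML configuration (config.xml plus config.d/ drop-ins) on startup, and parse errors appear in the log. To check that the merged configuration still parses without starting the server, read any key back with clickhouse extract-from-config, then inspect the log:

clickhouse extract-from-config --config-file=/etc/clickhouse-server/config.xml --key=path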
tail -50 /var/log/clickhouse-server/clickhouse-server.log
