etcd guide

What etcd does in production

etcd stores small amounts of critical data with strong consistency guarantees. Unlike a general-purpose database, it is optimized for reliable reads/writes of configuration and metadata — not large documents or analytics workloads.

Kubernetes — the API server's only persistent store; cluster state lives here
Service discovery — register and watch endpoints for dynamic services
Distributed locking — coordinate leaders and exclusive work across nodes
Configuration — shared settings that many processes must agree on

etcd and Kubernetes

Every Kubernetes cluster depends on etcd. When you kubectl apply a Deployment, the API server validates the request and writes the object to etcd. Controllers and kubelets watch etcd (via the API server) for changes. If etcd is unavailable or corrupt, the control plane cannot function — existing workloads on nodes may keep running, but scheduling, scaling, and updates stop.

Managed Kubernetes (EKS, GKE, AKS) hides etcd, but self-managed and kubeadm clusters expose it directly. Troubleshooting a broken cluster often means checking etcd health, quorum, and disk. See also the Kubernetes scenarios on SadServers.

How a client connection works

When a client (etcdctl, kube-apiserver, or an app using the etcd client library) connects:

TCP connect — client reaches host:2379 (client port)
TLS / auth — production clusters use mutual TLS and optionally RBAC users
gRPC request — get, put, delete, watch, or transaction against a key or key range (such as a prefix)
Raft commit — writes go to the leader and replicate to a majority before ack
Response — result returns; watches stream subsequent changes

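Reads are linearizable by default: the request is confirmed through the leader and a quorum, so the result reflects every write acknowledged before the read began. Serializable reads are answered from a single member's local state without that check; they are cheaper but may return stale data. Kubernetes requires a healthy, consistent etcd, so treat outages as control-plane emergencies.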
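The connection steps above map onto etcdctl's global flags. The following is a minimal Python sketch that only builds the command line, assuming etcdctl v3; the endpoint address and certificate paths are hypothetical placeholders, not values from this guide.

```python
import shlex

def etcdctl_argv(endpoints, cacert, cert, key, *command):
    """Build an etcdctl v3 command line that uses mutual TLS."""
    return [
        "etcdctl",
        "--endpoints=" + ",".join(endpoints),  # client port, usually 2379
        "--cacert=" + cacert,                  # CA used to verify the server
        "--cert=" + cert,                      # client certificate
        "--key=" + key,                        # client private key
        *command,
    ]

# Hypothetical endpoint and file paths, for illustration only.
argv = etcdctl_argv(
    ["https://10.0.0.1:2379"],
    "/path/to/ca.crt",
    "/path/to/client.crt",
    "/path/to/client.key",
    # Serializable read of key names under a prefix; drop --consistency=s
    # to get the default linearizable read.
    "get", "/registry/", "--prefix", "--keys-only", "--consistency=s",
)
print(shlex.join(argv))
```

Passing the list to subprocess.run would execute it; printing it first makes it easy to review the exact command before it touches a production cluster.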

Key files and configuration

/etc/etcd/etcd.conf.yml — common config path (varies by install)
--data-dir — WAL and snapshot storage (e.g. /var/lib/etcd/)
--listen-client-urls — client API bind (port 2379)
--listen-peer-urls — member-to-member Raft traffic (port 2380)
--initial-cluster — bootstrap member list for new clusters
Certificates — --cert-file, --key-file, --trusted-ca-file for TLS
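To show how these flags fit together, here is a small Python sketch that assembles the flags for one member of a new, statically bootstrapped cluster. The member names and IP addresses are made up for illustration, and the TLS flags are left out for brevity; a real HTTPS setup needs them as well.

```python
def initial_cluster(members):
    """Format {name: peer_url} as the value of --initial-cluster."""
    return ",".join(f"{name}={url}" for name, url in members.items())

def etcd_flags(name, ip, members, data_dir="/var/lib/etcd"):
    """Return etcd flags for one member of a brand-new cluster."""
    return [
        f"--name={name}",
        f"--data-dir={data_dir}",
        f"--listen-client-urls=https://{ip}:2379",
        f"--advertise-client-urls=https://{ip}:2379",
        f"--listen-peer-urls=https://{ip}:2380",
        f"--initial-advertise-peer-urls=https://{ip}:2380",
        f"--initial-cluster={initial_cluster(members)}",
        "--initial-cluster-state=new",
        # --cert-file, --key-file, --trusted-ca-file omitted in this sketch.
    ]

# Hypothetical three-member cluster.
members = {
    "etcd-1": "https://10.0.0.1:2380",
    "etcd-2": "https://10.0.0.2:2380",
    "etcd-3": "https://10.0.0.3:2380",
}
for flag in etcd_flags("etcd-1", "10.0.0.1", members):
    print(flag)
```

Every member receives the same --initial-cluster value; only the name and the member's own URLs differ.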

Core concepts

Keys and values — byte strings; Kubernetes uses hierarchical paths like /registry/pods/...
Revision — cluster-wide monotonic counter; every change increments it
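Leases — TTL-bound grants that keys attach to; when a lease expires or is revoked, its keys are deleted. Kubernetes uses etcd leases to expire Event objects, while node heartbeats use Lease API objects that are stored as ordinary keys
Watches — gRPC streams of change events starting from a given revision; this is how controllers react to changes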
Transactions — compare-and-swap style atomic multi-op updates
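The interplay between revisions and compare-and-swap transactions can be illustrated with a toy in-memory model. This is a teaching sketch only: ToyStore and its methods are invented for this example and are not etcd's implementation or client API.

```python
class ToyStore:
    """In-memory toy that mimics revision bookkeeping; not etcd's real code."""

    def __init__(self):
        self.revision = 0   # starting value is arbitrary in this toy
        self.data = {}      # key -> (value, mod_revision)

    def put(self, key, value):
        """Every write bumps the store-wide revision and stamps the key."""
        self.revision += 1
        self.data[key] = (value, self.revision)
        return self.revision

    def mod_revision(self, key):
        """Revision of the key's last write; 0 means the key is absent."""
        return self.data.get(key, (None, 0))[1]

    def put_if_unchanged(self, key, expected_mod_revision, value):
        """Compare-and-swap: write only if nobody modified the key since we read it."""
        if self.mod_revision(key) != expected_mod_revision:
            return False
        self.put(key, value)
        return True


store = ToyStore()
store.put("/config/flag", "a")
seen = store.mod_revision("/config/flag")   # what this writer observed
store.put("/config/flag", "b")              # a concurrent writer gets in first
stale = store.put_if_unchanged("/config/flag", seen, "c")
fresh = store.put_if_unchanged("/config/flag", store.mod_revision("/config/flag"), "c")
print(stale, fresh, store.revision)
```

The stale write is rejected because the key's modification revision moved on; retrying with the current revision succeeds. Distributed locks and leader election build on the same pattern.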

Cluster and replication (Raft)

etcd clusters should have an odd number of members (typically 3 or 5), because an even-sized cluster raises the quorum size without adding fault tolerance. One member is the leader at a time; followers replicate the Raft log. A write succeeds once it is committed on a majority: a 3-node cluster tolerates 1 failure, and a 5-node cluster tolerates 2.
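The quorum arithmetic is easy to check with a few lines of Python. The function names here are chosen for this example.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1

def fault_tolerance(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)

for n in range(1, 8):
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd sizes are preferred.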

Member ports: clients use 2379; peers communicate on 2380. Each member needs a unique name and reachable peer URL.

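Adding a member: register it with etcdctl member add (or the API), update any member lists kept in configuration files, then start the new node with the --initial-cluster value and --initial-cluster-state=existing that the command prints. Add one member at a time and confirm the cluster is healthy before adding the next. Removing a member requires quorum; never remove members in a way that leaves fewer than a majority running.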

Monitor with etcdctl endpoint health, etcdctl endpoint status, and etcdctl member list. On kubeadm clusters, etcd runs as a static pod on each control-plane node, so these commands are typically run with etcdctl inside that pod, using the certificates the pod mounts.

Maintenance

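etcd retains a revision history, so without compaction disk usage grows without bound. Run compaction to drop old revisions (the Kubernetes API server compacts periodically; standalone clusters need etcdctl compaction with a target revision, or auto-compaction). Compaction only frees space inside the database file; defragmentation (etcdctl defrag) then returns that space to the filesystem from the bbolt (BoltDB) backend. Defragmentation blocks the member while it runs, so defragment one member at a time. Take a snapshot before major maintenance.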
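A manual maintenance pass usually reads the current revision, compacts up to it, and then defragments. The Python sketch below only builds that command sequence; the sample status text is a trimmed, illustrative stand-in for the output of etcdctl endpoint status --write-out=json, and the snapshot path is a placeholder.

```python
import re

def current_revision(status_text):
    """Pull the first "revision" number out of endpoint-status JSON text."""
    match = re.search(r'"revision"\s*:\s*(\d+)', status_text)
    if match is None:
        raise ValueError("no revision field found")
    return int(match.group(1))

def maintenance_commands(revision, snapshot_path):
    """Snapshot first, then compact history, then reclaim disk space."""
    return [
        ["etcdctl", "snapshot", "save", snapshot_path],
        ["etcdctl", "compaction", str(revision)],
        ["etcdctl", "defrag"],
    ]

# Trimmed sample; real output has more fields and may differ by version.
sample = '[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":4312}}}]'
rev = current_revision(sample)
for cmd in maintenance_commands(rev, "/path/to/pre-maintenance.db"):
    print(" ".join(cmd))
```

Compacting to the current revision discards all older history, so any watch that tries to resume from an earlier revision will have to re-list; pick an older target revision if clients need more slack.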

Snapshots — point-in-time backup of cluster state. Schedule regular snapshots; they are the primary disaster-recovery path for etcd and thus for Kubernetes control-plane state.

Backups

Replication is not a backup: a bad write committed to quorum is replicated everywhere. Use etcdctl snapshot save for consistent backups. Restore with etcdctl snapshot restore (etcdutl snapshot restore in etcd 3.5 and later) into a fresh data directory. A restore creates new cluster metadata, so members do not keep their previous identity; plan member names and the initial cluster configuration carefully.
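For a multi-member restore, each member is restored from the same snapshot file with its own name and peer URL but an identical member list. A Python sketch that builds those commands follows; the member names, URLs, paths, and cluster token are hypothetical, and the tool parameter exists because newer etcd releases move restore from etcdctl to etcdutl.

```python
def restore_argv(snapshot, name, members, data_dir,
                 token="restored-cluster", tool="etcdctl"):
    """Build the snapshot-restore command for one member."""
    initial_cluster = ",".join(f"{n}={u}" for n, u in members.items())
    return [
        tool, "snapshot", "restore", snapshot,
        f"--name={name}",
        f"--initial-cluster={initial_cluster}",
        f"--initial-cluster-token={token}",
        f"--initial-advertise-peer-urls={members[name]}",
        f"--data-dir={data_dir}",   # must be a fresh, empty location
    ]

# Hypothetical three-member cluster.
members = {
    "etcd-1": "https://10.0.0.1:2380",
    "etcd-2": "https://10.0.0.2:2380",
    "etcd-3": "https://10.0.0.3:2380",
}
commands = [
    restore_argv("/path/to/snapshot.db", name, members, "/var/lib/etcd-restored")
    for name in members
]
for cmd in commands:
    print(" ".join(cmd))
```

Run each command on the corresponding host, then start etcd there pointing at the restored data directory.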

For Kubernetes: back up etcd before control-plane upgrades. kubeadm documents snapshot procedures for stacked etcd. Test restore on a staging cluster — an untested backup is not a backup.

Learning resources

etcd documentation — etcd.io/docs
Operations guide — etcd.io — operations
Kubernetes etcd — kubernetes.io — configure and upgrade etcd
Raft consensus — raft.github.io
