etcd guide
What etcd does in production
etcd stores small amounts of critical data with strong consistency guarantees. Unlike a general-purpose database, it is optimized for reliable reads/writes of configuration and metadata — not large documents or analytics workloads.
- Kubernetes — the API server's only persistent store; cluster state lives here
- Service discovery — register and watch endpoints for dynamic services
- Distributed locking — coordinate leaders and exclusive work across nodes
- Configuration — shared settings that many processes must agree on
etcd and Kubernetes
A standard Kubernetes cluster depends on etcd. When you run kubectl apply on a Deployment, the API server validates the request and writes the object to etcd. Controllers and kubelets watch for changes through the API server, which in turn watches etcd. If etcd is unavailable or corrupt, the control plane cannot function: existing workloads on nodes may keep running, but scheduling, scaling, and updates stop.
Managed Kubernetes (EKS, GKE, AKS) hides etcd, but self-managed and kubeadm clusters expose it directly. Troubleshooting a broken cluster often means checking etcd health, quorum, and disk. See also the Kubernetes scenarios on SadServers.
How a client connection works
When a client (etcdctl, kube-apiserver, or an app using the etcd client library) connects:
- TCP connect — client reaches host:2379 (client port)
- TLS / auth — production clusters use mutual TLS and optionally etcd's built-in users and roles (RBAC)
- gRPC request — get, put, delete, watch, or transaction against a single key or a key range (such as a prefix)
- Raft commit — writes go to the leader and replicate to a majority before ack
- Response — result returns; watches stream subsequent changes
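Reads can be linearizable (the default: the serving member confirms with the leader that it has applied everything committed up to the moment of the read) or serializable (answered from the serving member's local state without that check, so they are cheaper but may return stale data). Kubernetes requires a healthy, consistent etcd; treat outages as control-plane emergencies.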
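The two read modes can be contrasted with a toy model. This is a simplified sketch, not etcd's implementation: the names Member, serializable_read, and linearizable_read are hypothetical, and the model assumes committed entries have already reached the member's log, so "catching up" only means applying them.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy replica: committed log entries plus how many have been applied."""
    log: list = field(default_factory=list)    # committed (key, value) writes
    applied: int = 0                           # entries applied to `state`
    state: dict = field(default_factory=dict)  # local key -> value view

    def apply_up_to(self, index: int) -> None:
        # Apply committed entries in order until `index` of them are applied.
        # Assumes those entries are already present in this member's log.
        while self.applied < index:
            key, value = self.log[self.applied]
            self.state[key] = value
            self.applied += 1


def serializable_read(member: Member, key: str):
    # Answer from whatever this member has applied so far (may be stale).
    return member.state.get(key)


def linearizable_read(member: Member, leader_commit: int, key: str):
    # ReadIndex-style: take the leader's commit index at read time, wait
    # (here: apply) until local state has caught up to it, then answer.
    member.apply_up_to(leader_commit)
    return member.state.get(key)


if __name__ == "__main__":
    writes = [("/config/mode", "blue"), ("/config/mode", "green")]
    follower = Member(log=writes)
    follower.apply_up_to(1)  # lagging follower: only the first write applied
    print(serializable_read(follower, "/config/mode"))     # blue (stale)
    print(linearizable_read(follower, 2, "/config/mode"))  # green
```

From the command line, etcdctl get --consistency=s requests a serializable read; the default, l, is linearizable.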
Key files and configuration
- /etc/etcd/etcd.conf.yml — common config path (varies by install)
- --data-dir — WAL and snapshot storage (e.g. /var/lib/etcd/)
- --listen-client-urls — client API bind (port 2379)
- --listen-peer-urls — member-to-member Raft traffic (port 2380)
- --initial-cluster — bootstrap member list for new clusters
- Certificates — --cert-file, --key-file, --trusted-ca-file for TLS
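To show how these flags fit together, here is a rough sketch that renders the command-line flags for one member of a new three-member cluster. The member names, IP addresses, cluster token, certificate file names, and helper names are all hypothetical; many installs put the same settings in the YAML config file instead.

```python
from __future__ import annotations

CLIENT_PORT = 2379  # client API
PEER_PORT = 2380    # member-to-member Raft traffic


def initial_cluster(members: dict[str, str], scheme: str = "https") -> str:
    # --initial-cluster value: comma-separated name=peer-URL pairs.
    return ",".join(f"{name}={scheme}://{host}:{PEER_PORT}"
                    for name, host in members.items())


def member_flags(name: str, members: dict[str, str], token: str,
                 data_dir: str = "/var/lib/etcd", scheme: str = "https",
                 pki: str | None = None) -> list[str]:
    """Flags for one member of a brand-new cluster (hypothetical helper)."""
    host = members[name]
    client_url = f"{scheme}://{host}:{CLIENT_PORT}"
    peer_url = f"{scheme}://{host}:{PEER_PORT}"
    flags = [
        f"--name={name}",
        f"--data-dir={data_dir}",
        f"--listen-client-urls={client_url}",
        f"--advertise-client-urls={client_url}",
        f"--listen-peer-urls={peer_url}",
        f"--initial-advertise-peer-urls={peer_url}",
        f"--initial-cluster={initial_cluster(members, scheme)}",
        "--initial-cluster-state=new",       # bootstrapping, not joining
        f"--initial-cluster-token={token}",  # keeps separate clusters apart
    ]
    if pki:  # mutual TLS for clients and peers; file names are assumptions
        flags += [
            f"--cert-file={pki}/server.crt",
            f"--key-file={pki}/server.key",
            f"--trusted-ca-file={pki}/ca.crt",
            "--client-cert-auth",
            f"--peer-cert-file={pki}/peer.crt",
            f"--peer-key-file={pki}/peer.key",
            f"--peer-trusted-ca-file={pki}/ca.crt",
            "--peer-client-cert-auth",
        ]
    return flags


if __name__ == "__main__":
    cluster = {"etcd-1": "10.0.0.11", "etcd-2": "10.0.0.12", "etcd-3": "10.0.0.13"}
    print(" ".join(["etcd", *member_flags("etcd-1", cluster, "demo-token",
                                          pki="/etc/etcd/pki")]))
```

With https URLs the certificate flags are required; plain http is only reasonable for throwaway test clusters.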
Core concepts
- Keys and values — byte strings; Kubernetes uses hierarchical paths like /registry/pods/...
- Revision — cluster-wide monotonic counter; every change increments it
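- Leases — TTL grants that keys can be attached to; when a lease expires or is revoked, its keys are deleted. Kubernetes uses them to expire short-lived objects such as Events (node heartbeats use Kubernetes Lease objects, which are ordinary keys in etcd rather than etcd leases)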
- Watches — gRPC streams of changes to a key or prefix, starting from a given revision; this is how controllers react to changes
- Transactions — compare-and-swap style atomic multi-op updates
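These concepts can be tied together in a small in-memory model. It is a toy written for illustration (ToyKV and its methods are hypothetical, not the etcd client API): one global revision counter, a mod_revision per key, a change history that a watch can replay from a revision, compare-and-swap on mod_revision, and leases whose expiry deletes the attached keys.

```python
from __future__ import annotations


class ToyKV:
    """Toy model of etcd's data model (not etcd's code): a global revision
    counter, per-key mod_revision, history for watches, and leases."""

    def __init__(self) -> None:
        self.revision = 0
        self.data: dict[str, tuple[str, int]] = {}     # key -> (value, mod_revision)
        self.history: list[tuple[int, str, str]] = []  # (revision, op, key)
        self.leases: dict[int, set[str]] = {}          # lease id -> attached keys

    def put(self, key: str, value: str, lease: int | None = None) -> int:
        self.revision += 1  # every committed change bumps the revision
        self.data[key] = (value, self.revision)
        self.history.append((self.revision, "PUT", key))
        if lease is not None:
            self.leases[lease].add(key)  # KeyError if the lease was never granted
        return self.revision

    def grant(self, lease_id: int) -> None:
        self.leases[lease_id] = set()

    def expire(self, lease_id: int) -> None:
        # Lease expiry deletes all attached keys as one change (one revision).
        keys = [k for k in sorted(self.leases.pop(lease_id)) if k in self.data]
        if not keys:
            return
        self.revision += 1
        for key in keys:
            del self.data[key]
            self.history.append((self.revision, "DELETE", key))

    def txn_put_if(self, key: str, expected_mod_rev: int, value: str) -> bool:
        # Compare-and-swap: write only if the key's mod_revision still matches
        # (0 means "key must not exist").
        current = self.data.get(key)
        if (current[1] if current else 0) != expected_mod_rev:
            return False
        self.put(key, value)
        return True

    def watch_from(self, start_rev: int, prefix: str) -> list[tuple[int, str, str]]:
        # Replay changes under `prefix` at or after `start_rev`, like a watch
        # opened at a past revision (before any compaction).
        return [e for e in self.history
                if e[0] >= start_rev and e[2].startswith(prefix)]


if __name__ == "__main__":
    kv = ToyKV()
    r1 = kv.put("/registry/pods/default/web", "v1")
    print(kv.txn_put_if("/registry/pods/default/web", r1, "v2"))  # True
    print(kv.txn_put_if("/registry/pods/default/web", r1, "v3"))  # False (stale)
    print(kv.watch_from(1, "/registry/pods/"))
```

Real etcd adds range reads, create_revision and version fields, compaction of the history, and much more; the model only shows how the pieces relate.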
Cluster and replication (Raft)
etcd clusters normally run an odd number of members (typically 3 or 5) for quorum; an even count raises the majority size without tolerating any extra failures. One member is the leader at a time; followers replicate the Raft log. Writes succeed when committed to a majority, so a 3-node cluster tolerates 1 failure and a 5-node cluster tolerates 2.
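The majority arithmetic behind those numbers is quick to check. A minimal sketch with illustrative function names:

```python
def quorum(members: int) -> int:
    # Majority needed to commit a write: floor(n / 2) + 1.
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    # Members that can fail while a majority is still reachable.
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)}")
```

For 4 members it reports a quorum of 3 and a tolerance of 1, the same tolerance as 3 members, which is why odd sizes are preferred.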
Member ports: clients use 2379; peers communicate on 2380. Each member needs a unique name and reachable peer URL.
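Adding a member: register it with etcdctl member add (or the equivalent API call), then start the new node with --initial-cluster listing every member (existing plus new) and --initial-cluster-state=existing. Running members learn about the new member through Raft and do not need to be restarted.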
Removing a member requires quorum — never force-remove a majority of nodes.
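The two-step add procedure can be sketched as a small helper that produces the registration command and the joining node's flags. The helper, member names, and addresses are hypothetical; etcdctl's --endpoints and TLS flags, plus the new node's data directory, client URLs, and certificates, are left out for brevity.

```python
from __future__ import annotations

PEER_PORT = 2380


def member_add_plan(existing: dict[str, str], new_name: str, new_host: str,
                    scheme: str = "https") -> tuple[list[str], list[str]]:
    """Return (etcdctl command to register the member, flags for the new node).

    Hypothetical helper; etcdctl's --endpoints and TLS flags are omitted.
    """
    peer_url = f"{scheme}://{new_host}:{PEER_PORT}"
    # Step 1: register the member with the running cluster.
    add_cmd = ["etcdctl", "member", "add", new_name, f"--peer-urls={peer_url}"]
    # Step 2: start the new node knowing about every member, old and new.
    everyone = {**existing, new_name: new_host}
    cluster = ",".join(f"{n}={scheme}://{h}:{PEER_PORT}" for n, h in everyone.items())
    new_flags = [
        f"--name={new_name}",
        f"--listen-peer-urls={peer_url}",
        f"--initial-advertise-peer-urls={peer_url}",
        f"--initial-cluster={cluster}",
        "--initial-cluster-state=existing",  # join; do not bootstrap a new cluster
    ]
    return add_cmd, new_flags


if __name__ == "__main__":
    current = {"etcd-1": "10.0.0.11", "etcd-2": "10.0.0.12", "etcd-3": "10.0.0.13"}
    add, flags = member_add_plan(current, "etcd-4", "10.0.0.14")
    print(" ".join(add))
    print(" ".join(["etcd", *flags]))
```

etcdctl member add also prints the ETCD_NAME, ETCD_INITIAL_CLUSTER, and ETCD_INITIAL_CLUSTER_STATE values to use. Quorum is counted over the enlarged membership as soon as the member is registered, so start the new node promptly.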
Monitor with etcdctl endpoint health, etcdctl endpoint status, and etcdctl member list. On kubeadm clusters, etcd runs as a static pod on each control-plane node, and these commands are commonly run with the etcdctl binary inside that pod.
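A small wrapper for the three monitoring commands might look like this. It assumes kubeadm's default certificate paths under /etc/kubernetes/pki/etcd (including the healthcheck-client pair) and a local endpoint; both are assumptions to adjust for other installs, and the helper names are hypothetical. When etcdctl is not on PATH it only prints the commands.

```python
from __future__ import annotations

import shutil
import subprocess

# Assumed kubeadm default certificate location for stacked etcd.
PKI = "/etc/kubernetes/pki/etcd"


def etcdctl(*args: str, endpoints: str = "https://127.0.0.1:2379",
            pki: str = PKI) -> list[str]:
    # Prefix any etcdctl subcommand with endpoint and mutual-TLS flags.
    return ["etcdctl", f"--endpoints={endpoints}",
            f"--cacert={pki}/ca.crt",
            f"--cert={pki}/healthcheck-client.crt",
            f"--key={pki}/healthcheck-client.key", *args]


HEALTH_CHECKS = [
    etcdctl("endpoint", "health", "--cluster"),
    etcdctl("endpoint", "status", "--cluster", "--write-out=table"),
    etcdctl("member", "list", "--write-out=table"),
]


def run_checks() -> None:
    if shutil.which("etcdctl") is None:
        print("etcdctl not found on PATH; commands would be:")
        for cmd in HEALTH_CHECKS:
            print(" ".join(cmd))
        return
    for cmd in HEALTH_CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    run_checks()
```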
Maintenance
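etcd retains a revision history, so without compaction the database keeps growing until it reaches the backend space quota (2 GB by default), at which point etcd raises a NOSPACE alarm and rejects new writes. Run compaction to drop old revisions: the Kubernetes API server compacts periodically, while standalone clusters can set the --auto-compaction-mode and --auto-compaction-retention flags or run etcdctl compact. Compaction frees space inside the BoltDB backend file but does not shrink the file; defragmentation (etcdctl defrag) then returns that space to the filesystem. Take a snapshot before major maintenance.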
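A pre-maintenance sequence (snapshot, compact, defrag) can be sketched as below. It assumes the output of etcdctl endpoint status --write-out=json contains "revision":N fields, as in the trimmed, illustrative sample in the usage block; the helper names are hypothetical and endpoint/TLS flags are omitted. Defragmentation blocks reads and writes on the member while it runs, so the sketch issues it against a single endpoint.

```python
import re


def lowest_revision(status_json: str) -> int:
    # Pull every "revision":N out of `etcdctl endpoint status --write-out=json`
    # output, grep-style, without depending on the rest of the JSON layout.
    revisions = [int(r) for r in re.findall(r'"revision":\s*(\d+)', status_json)]
    if not revisions:
        raise ValueError("no revision found in endpoint status output")
    return min(revisions)  # a revision every reporting member had reached


def maintenance_plan(status_json: str, snapshot_file: str) -> list[list[str]]:
    rev = lowest_revision(status_json)
    return [
        ["etcdctl", "snapshot", "save", snapshot_file],  # backup first
        ["etcdctl", "compact", str(rev)],                # drop history older than rev
        ["etcdctl", "defrag"],                           # return freed space to the filesystem
    ]


if __name__ == "__main__":
    # Trimmed, illustrative status output for two members.
    sample = ('[{"Endpoint":"https://10.0.0.11:2379","Status":{"header":{"revision":41235}}},'
              '{"Endpoint":"https://10.0.0.12:2379","Status":{"header":{"revision":41230}}}]')
    for cmd in maintenance_plan(sample, "/var/backups/etcd/pre-maintenance.db"):
        print(" ".join(cmd))
```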
Snapshots — point-in-time backup of cluster state. Schedule regular snapshots; they are the primary disaster-recovery path for etcd and thus for Kubernetes control-plane state.
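Scheduled snapshots need predictable file names and a retention rule. The sketch below builds the etcdctl snapshot save command and picks old files to delete; the backup directory, the retention count of 7, and the function names are hypothetical, and the actual scheduling would come from cron, a systemd timer, or similar.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Hypothetical backup location and retention; adjust to your environment.
BACKUP_DIR = PurePosixPath("/var/backups/etcd")
KEEP = 7


def snapshot_path(now: datetime) -> str:
    # UTC timestamp in the name so lexical order matches chronological order.
    return str(BACKUP_DIR / f"etcd-snapshot-{now:%Y%m%dT%H%M%SZ}.db")


def snapshot_command(now: datetime) -> list[str]:
    return ["etcdctl", "snapshot", "save", snapshot_path(now)]


def snapshots_to_prune(existing: list[str], keep: int = KEEP) -> list[str]:
    # Oldest first: everything except the newest `keep` files.
    ordered = sorted(existing)
    return ordered[:-keep] if keep > 0 else ordered


if __name__ == "__main__":
    print(" ".join(snapshot_command(datetime.now(timezone.utc))))
```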
Backups
Replication is not a backup — a bad write committed to quorum is replicated everywhere.
Use etcdctl snapshot save for consistent backups. Restore with etcdctl snapshot restore (deprecated in etcd 3.5 in favor of etcdutl snapshot restore) into a fresh data directory. Restore creates a new cluster identity, so plan member names, peer URLs, and the initial cluster config carefully.
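For a multi-member restore, every member restores the same snapshot with the same --initial-cluster and token but its own --name and peer URL. A sketch that generates those commands, assuming etcd 3.5 or later (etcdutl) with an option to fall back to etcdctl; the member names, addresses, token, target directory, and helper name are hypothetical:

```python
from __future__ import annotations


def restore_commands(snapshot: str, members: dict[str, str], token: str,
                     data_dir: str = "/var/lib/etcd-restored",
                     tool: str = "etcdutl") -> dict[str, list[str]]:
    """One restore command per member, each run on that member's own host.

    Every member restores the same snapshot with the same --initial-cluster
    and token, but its own --name and peer URL. `data_dir` should be a
    fresh directory (hypothetical default path).
    """
    cluster = ",".join(f"{n}=https://{h}:2380" for n, h in members.items())
    return {
        name: [tool, "snapshot", "restore", snapshot,
               f"--name={name}",
               f"--initial-cluster={cluster}",
               f"--initial-cluster-token={token}",
               f"--initial-advertise-peer-urls=https://{host}:2380",
               f"--data-dir={data_dir}"]
        for name, host in members.items()
    }


if __name__ == "__main__":
    nodes = {"etcd-1": "10.0.0.11", "etcd-2": "10.0.0.12", "etcd-3": "10.0.0.13"}
    for member, cmd in restore_commands("/var/backups/etcd/snap.db", nodes,
                                        "restored-cluster").items():
        print(member, ":", " ".join(cmd))
```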
For Kubernetes: back up etcd before control-plane upgrades. kubeadm documents snapshot procedures for stacked etcd. Test restore on a staging cluster — an untested backup is not a backup.
Learning resources
- etcd documentation — etcd.io/docs
- Operations guide — etcd.io — operations
- Kubernetes etcd — kubernetes.io — configure and upgrade etcd
- Raft consensus — raft.github.io
Practice scenarios
Hands-on etcd scenarios on live Linux VMs: etcd