ELK stack guide
What the stack does in production
Applications write logs to files or stdout. You need to collect those events, normalize them (timestamps, fields, grok patterns), and store them so operators can search during incidents. The classic Elastic path (without Kibana in this lab):
- Filebeat — lightweight agent on each host; tails log files and forwards events
- Logstash — central pipeline: input → filter → output
- Elasticsearch — distributed search engine and document store (indices)
Typical flow:
app logs → Filebeat → Logstash → Elasticsearch.
Filebeat can also send directly to Elasticsearch for simpler setups, but Logstash
handles heavy parsing and enrichment. Likewise, an application can send its logs directly
to Logstash, but then a Logstash outage or crash can degrade the application's
performance, and events may be dropped entirely once the in-memory buffer holding them overflows.
Filebeat — the shipper
Filebeat watches files listed in filebeat.yml under
filebeat.inputs. It tracks read position in a registry file so restarts
do not re-ship old lines (unless configured otherwise). Outputs are usually
Logstash (output.logstash) or Elasticsearch
(output.elasticsearch).
- Config: /etc/filebeat/filebeat.yml
- Registry: /var/lib/filebeat/registry (state)
- Modules: prebuilt configs for nginx, apache, system logs, etc.
- Service: filebeat (systemd)
Enable a module: filebeat modules enable nginx, then
filebeat setup if writing to Elasticsearch (index templates).
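To make the registry idea concrete, here is a minimal Python sketch of offset tracking. It is not Filebeat's implementation: the JSON registry layout and the function name ship_new_lines are assumptions made for illustration only. It shows why a restart does not re-ship old lines: the byte offset of the last complete line is saved and reused on the next run.

```python
import json
import os


def ship_new_lines(log_path, registry_path):
    """Return lines appended to log_path since the previous call.

    The byte offset is persisted in registry_path as JSON. This layout is a
    hypothetical stand-in, not Filebeat's real registry format.
    """
    # Load saved offsets; a missing registry means "start from the beginning".
    offsets = {}
    if os.path.exists(registry_path):
        with open(registry_path, "r", encoding="utf-8") as fh:
            offsets = json.load(fh)

    start = offsets.get(log_path, 0)
    # If the file is now shorter than the saved offset (truncated or
    # rotated), start again from byte 0.
    if os.path.getsize(log_path) < start:
        start = 0

    with open(log_path, "rb") as fh:
        fh.seek(start)
        data = fh.read()

    # Ship only complete lines; a trailing partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    # Persist the new position so a restart resumes from here.
    offsets[log_path] = start + end
    with open(registry_path, "w", encoding="utf-8") as fh:
        json.dump(offsets, fh)
    return lines
```

Calling the function twice on an unchanged file returns an empty list the second time, which is the behaviour the registry is meant to guarantee.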
Logstash — the pipeline
Logstash runs a pipeline defined in
/etc/logstash/conf.d/*.conf (or a single logstash.conf).
Three stages:
- input — beats (Filebeat), syslog, kafka, file, etc.
- filter — grok, mutate, date, geoip, drop unwanted events
- output — elasticsearch, stdout (debug), kafka, etc.
Default Beats input listens on port 5044. Test pipelines with
stdout { codec => rubydebug } before sending to Elasticsearch.
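Apply changes: systemctl restart logstash (or let Logstash pick up pipeline edits without a restart if automatic reload is enabled, e.g. config.reload.automatic: true in logstash.yml).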
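The three stages can be mimicked in a few lines of Python to show what the filter stage contributes. This is only an illustration of the data flow, not Logstash configuration: the regular expression is a rough stand-in for a grok pattern on nginx "combined" access lines, and the field names (client_ip, status, and so on) are assumptions chosen for this sketch.

```python
import re
from datetime import datetime

# Rough regex counterpart of a grok pattern for nginx "combined" access lines.
ACCESS_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def filter_event(line):
    """Filter stage: parse one raw line into a dict, or return None to drop it."""
    match = ACCESS_RE.match(line)
    if match is None:
        # This sketch drops unparsable lines; a real pipeline may tag them instead.
        return None
    event = match.groupdict()
    # mutate-style type conversion
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    # date-style normalization of the access-log timestamp to ISO 8601
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.isoformat()
    return event


def run_pipeline(lines):
    """input -> filter -> output, with a list standing in for the output plugin."""
    output = []
    for line in lines:                # input
        event = filter_event(line)    # filter
        if event is not None:
            output.append(event)      # output
    return output
```

Feeding the function a list of raw lines returns one structured event per line that matched, which is roughly what stdout { codec => rubydebug } lets you inspect before events reach Elasticsearch.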
Elasticsearch — the store
Elasticsearch stores JSON documents in indices
(e.g. nginx-access-2024.06.24). Shards spread data across nodes;
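replicas provide redundancy. Cluster health:
green (all primary and replica shards assigned), yellow (all primaries assigned but at
least one replica unassigned, which is normal on a single-node cluster), red (at least
one primary shard unassigned, so some data is unavailable and may be lost).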
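A script that gates deployments or alerts on cluster state might interpret the three colours along these lines. This is a sketch under assumptions: the function name and messages are made up, and the input is taken to be the parsed JSON body of the _cluster/health API, of which only the status, unassigned_shards and number_of_nodes fields are used.

```python
def assess_cluster_health(health):
    """Map a parsed _cluster/health response to (ok_to_proceed, message)."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    nodes = health.get("number_of_nodes", 0)

    if status == "green":
        return True, "all primary and replica shards are assigned"
    if status == "yellow":
        # Primaries are fine; only replicas are missing.
        hint = "expected on a single node" if nodes == 1 else "check replica allocation"
        return True, f"{unassigned} replica shard(s) unassigned ({hint})"
    if status == "red":
        return False, "at least one primary shard is unassigned; some data is unavailable"
    return False, f"unknown status: {status!r}"
```

Treating yellow as acceptable is a design choice that fits single-node labs; a multi-node production cluster would usually want to investigate it.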
- HTTP API: port 9200 (REST queries and admin)
- Transport: port 9300 (node-to-node; not for clients)
- Config: /etc/elasticsearch/elasticsearch.yml
- Data: /var/lib/elasticsearch/
- Logs: /var/log/elasticsearch/
Query from the shell with curl against the _search endpoint, either with a Query DSL
body (JSON) or with a simple q= query string for quick full-text searches. Index
Lifecycle Management (ILM) rolls over and deletes old indices to control disk use.
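As an illustration of what a Query DSL request looks like, the sketch below builds (but does not send) a _search request with a match query. The helper name, the default base URL and the example field are assumptions; only the /_search endpoint and the match query shape come from Elasticsearch itself.

```python
import json
import urllib.request


def build_search_request(index_pattern, field, text, size=10,
                         base_url="http://localhost:9200"):
    """Build a POST request for <base_url>/<index_pattern>/_search.

    The body is a minimal Query DSL document: a full-text match on one field.
    """
    body = {
        "size": size,
        "query": {"match": {field: text}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running cluster, the request could be sent with urllib.request.urlopen(request) and the response parsed with json.load. The equivalent curl call passes the same JSON with -d and the same Content-Type header.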
Index naming and daily indices
Log pipelines often use daily indices:
logs-%{+YYYY.MM.dd} in Logstash output. That makes retention easy
(delete indices older than N days) and keeps shard sizes manageable. Filebeat
modules and Elastic integrations ship index templates that map fields correctly
for search and aggregations.
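The retention argument can be sketched in Python: derive the daily index name from a date, then pick out the indices older than N days. The function names and the "logs" prefix are illustrative assumptions. In practice ILM or a cleanup job would do the deletion, and Logstash formats the date from the event's UTC timestamp.

```python
from datetime import date, datetime, timedelta


def daily_index(prefix, day):
    """Name of the daily index for a given date, e.g. logs-2024.06.24."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(index_names, prefix, keep_days, today):
    """Return the names of daily indices dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to another pipeline
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index; leave it alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Because the date is part of the name, retention becomes a comparison on index names and never has to look at individual documents.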
Security basics
Production clusters enable TLS and authentication (native users, API keys, or
LDAP). Filebeat and Logstash need credentials in their output blocks when security
is on. Never expose port 9200 to the public internet without auth. For local labs,
xpack.security.enabled: false is common — enable security before prod.
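When security is on, every HTTP client has to authenticate, including ad hoc scripts. A minimal sketch of the two common header forms follows: HTTP Basic for native users, and the ApiKey scheme, which carries the base64 encoding of id:api_key. The helper names and any credentials passed to them are placeholders, not values from a real cluster.

```python
import base64


def basic_auth_header(username, password):
    """Authorization header for a native user (HTTP Basic)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def api_key_header(key_id, api_key):
    """Authorization header for an Elasticsearch API key (base64 of id:api_key)."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}
```

Base64 is an encoding, not encryption, so these headers are only safe over TLS. Filebeat and Logstash take the equivalent credentials through settings in their output blocks rather than hand-built headers.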
Minimal end-to-end example
1) Elasticsearch running on localhost:9200.
2) Logstash listens on 5044, groks nginx access lines, outputs to ES.
3) Filebeat tails /var/log/nginx/access.log, sends to Logstash.
4) Verify: curl 'localhost:9200/nginx-access-*/_search?size=1' (quoted so the shell does not expand * or ?).
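The verification step can also be scripted. The sketch below assumes the lab setup from the steps above (unsecured Elasticsearch on localhost:9200, indices matching nginx-access-*); the function names are made up for illustration. summarize handles both shapes of hits.total: an object with a value field in recent versions, a plain integer in older ones.

```python
import json
import urllib.request


def fetch_sample(base_url="http://localhost:9200", pattern="nginx-access-*"):
    """Fetch one document from the matching indices (needs a running cluster)."""
    url = f"{base_url}/{pattern}/_search?size=1"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize(response):
    """Return (document_count, first_document_or_None) from a _search response."""
    total = response["hits"]["total"]
    count = total["value"] if isinstance(total, dict) else total
    hits = response["hits"]["hits"]
    sample = hits[0]["_source"] if hits else None
    return count, sample
```

With the stack running, summarize(fetch_sample()) returning a count above zero and a sample document with parsed fields suggests that all three stages are connected. A count of zero points at Filebeat or Logstash rather than Elasticsearch.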
Key paths and services
- Elasticsearch: elasticsearch service, config in /etc/elasticsearch/
- Logstash: logstash service, pipelines in /etc/logstash/conf.d/
- Filebeat: filebeat service, config /etc/filebeat/filebeat.yml
Learning resources
- Elasticsearch guide — elastic.co — Elasticsearch
- Logstash — elastic.co — Logstash
- Filebeat — elastic.co — Filebeat
- Grok patterns — Logstash grok filter