ELK stack guide

What the stack does in production

Applications write logs to files or stdout. You need to collect those events, normalize them (timestamps, fields, grok patterns), and store them so operators can search during incidents. The classic Elastic path (without Kibana in this lab):

Filebeat — lightweight agent on each host; tails log files and forwards events
Logstash — central pipeline: input → filter → output
Elasticsearch — distributed search engine and document store (indices)

Typical flow: app logs → Filebeat → Logstash → Elasticsearch. Filebeat can also send directly to Elasticsearch for simpler setups, but Logstash handles heavy parsing and enrichment. Likewise, an application can send events straight to Logstash, but that couples the two: if Logstash slows down or crashes, the application's performance can degrade, and logs may be dropped entirely once the sender's in-memory buffer overflows. Writing to a file and letting Filebeat ship it avoids this, because the file on disk acts as the buffer.
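The overflow risk can be illustrated with a small sketch. This is not how any Elastic component is implemented; BufferedSender and its capacity parameter are hypothetical names used only to show what happens when a bounded in-memory buffer fills up while the downstream receiver is unavailable.

```python
import queue


class BufferedSender:
    """Hypothetical in-process log sender with a bounded memory buffer."""

    def __init__(self, capacity: int) -> None:
        self._buf = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def emit(self, event: str) -> bool:
        """Queue an event without blocking; count it as dropped if the buffer is full."""
        try:
            self._buf.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self) -> list:
        """Return everything buffered so far (what would be sent once the receiver is back)."""
        events = []
        while True:
            try:
                events.append(self._buf.get_nowait())
            except queue.Empty:
                return events


sender = BufferedSender(capacity=2)
results = [sender.emit(e) for e in ("a", "b", "c")]  # the third event does not fit
```

A blocking put() instead of put_nowait() would trade dropped events for a stalled application, which is the performance degradation mentioned above.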

Filebeat — the shipper

Filebeat watches files listed in filebeat.yml under filebeat.inputs. It tracks read position in a registry file so restarts do not re-ship old lines (unless configured otherwise). Outputs are usually Logstash (output.logstash) or Elasticsearch (output.elasticsearch).

Config — /etc/filebeat/filebeat.yml
Registry — /var/lib/filebeat/registry (state)
Modules — prebuilt configs for nginx, apache, system logs, etc.
Service — filebeat (systemd)
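The idea behind the registry can be sketched in a few lines of Python. This is an illustration of offset tracking in general, assuming a made-up JSON layout; it is not Filebeat's actual registry format, and read_new_lines is a hypothetical helper.

```python
import json
from pathlib import Path


def read_new_lines(log_path: Path, registry_path: Path) -> list:
    """Return complete lines appended since the last call and persist the new offset."""
    # Load saved offsets (hypothetical layout: {"<path>": <byte offset>}).
    state = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    offset = state.get(str(log_path), 0)

    # If the file is shorter than the saved offset it was truncated or rotated,
    # so start again from the beginning.
    if log_path.stat().st_size < offset:
        offset = 0

    with log_path.open("rb") as f:
        f.seek(offset)
        data = f.read()

    # Ship only complete lines; a trailing partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    state[str(log_path)] = offset + end
    registry_path.write_text(json.dumps(state))
    return lines
```

Because the offset survives in the registry file, calling the function again after a restart returns only lines written since the previous call.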

Enable a module with filebeat modules enable nginx, then run filebeat setup to load the index templates if Filebeat writes directly to Elasticsearch.

Logstash — the pipeline

Logstash runs a pipeline defined in /etc/logstash/conf.d/*.conf (or a single logstash.conf). Three stages:

input — beats (Filebeat), syslog, kafka, file, etc.
filter — grok, mutate, date, geoip, drop unwanted events
output — elasticsearch, stdout (debug), kafka, etc.

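By convention the Beats input listens on port 5044 (the port has no built-in default and is set explicitly in the input block). Test pipelines with stdout { codec => rubydebug } before sending to Elasticsearch. Apply config changes with systemctl restart logstash, or set config.reload.automatic: true in logstash.yml so changes are picked up without a restart.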
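To make the filter stage concrete, here is a rough Python approximation of what a grok, mutate, and date chain does to one nginx access line. The regular expression is a simplified stand-in for a grok pattern, and the field names (client_ip, status, and so on) are assumptions chosen for this sketch rather than the names any Filebeat module produces.

```python
import re
from datetime import datetime
from typing import Optional

# Simplified stand-in for a grok pattern matching nginx's "combined" log format.
ACCESS_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Turn one raw access-log line into a structured event, or None if it does not match."""
    m = ACCESS_RE.match(line)
    if m is None:
        # Logstash would keep the event and tag it _grokparsefailure instead.
        return None
    event = m.groupdict()
    # mutate-style step: convert numeric fields from strings.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    # date-style step: parse the log's own timestamp into @timestamp.
    ts = datetime.strptime(event.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = ts.isoformat()
    return event


sample = '203.0.113.7 - - [24/Jun/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.5.0"'
event = parse_access_line(sample)
```

Printing the resulting dictionary plays the same role as the rubydebug stdout output: it lets you check field names and types before anything is indexed.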

Elasticsearch — the store

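Elasticsearch stores JSON documents in indices (e.g. nginx-access-2024.06.24). Each index is split into primary shards that spread data across nodes; replica shards provide redundancy. Cluster health is reported as green (all primary and replica shards assigned), yellow (all primaries assigned but at least one replica unassigned, which is normal on a single-node cluster because a replica cannot live on the same node as its primary), or red (at least one primary shard unassigned, so some data is unavailable and may be lost).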
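The health colors follow a simple rule that can be written down directly. The function below is only a restatement of that rule for illustration; health_color and its parameters are hypothetical names, and a real check would read the status field returned by the _cluster/health API instead.

```python
def health_color(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map counts of unassigned shards to the cluster health color."""
    if unassigned_primaries > 0:
        return "red"      # some data cannot be searched or indexed
    if unassigned_replicas > 0:
        return "yellow"   # data is available but not fully redundant
    return "green"


# Single-node lab with one replica configured: the replica has nowhere to go.
single_node = health_color(unassigned_primaries=0, unassigned_replicas=1)
```

On a single-node lab, setting the index's number_of_replicas to 0 is the usual way to get back to green.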

HTTP API — port 9200 (REST queries and admin)
Transport — port 9300 (node-to-node; not for clients)
Config — /etc/elasticsearch/elasticsearch.yml
Data — /var/lib/elasticsearch/
Logs — /var/log/elasticsearch/

Query from the shell with curl against the _search endpoint: pass a quick query string with the q parameter, or send a Query DSL body in JSON for full-text and structured searches. Index Lifecycle Management (ILM) rolls over and deletes old indices to control disk use.
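As a sketch of what a Query DSL search looks like on the wire, the following builds (but does not send) a _search request using only the standard library. The base URL, index pattern, and the message field are assumptions for a local lab without security; build_search_request is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(base_url: str, index_pattern: str, field: str,
                         text: str, size: int = 5) -> urllib.request.Request:
    """Build a POST <index>/_search request carrying a Query DSL match query."""
    body = {
        "size": size,
        "query": {"match": {field: text}},  # full-text match on one field
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "nginx-access-*", "message", "timeout")

# To run it against a live lab node:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The equivalent curl call sends the same JSON body with -H 'Content-Type: application/json' and -d.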

Index naming and daily indices

Log pipelines often use daily indices: logs-%{+YYYY.MM.dd} in Logstash output. That makes retention easy (delete indices older than N days) and keeps shard sizes manageable. Filebeat modules and Elastic integrations ship index templates that map fields correctly for search and aggregations.
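The retention logic that daily indices make possible can be sketched as follows. This assumes index names of the form prefix-YYYY.MM.dd, matching the Logstash pattern above; daily_index_name and expired_indices are hypothetical helpers, and in practice ILM or a scheduled job would perform the deletion.

```python
from datetime import date, datetime, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build the index name a logs-%{+YYYY.MM.dd}-style pattern yields for a given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(names: list, prefix: str, today: date, keep_days: int) -> list:
    """Return the daily indices whose date is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue  # some other index family
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logs-2024.06.10", "logs-2024.06.17", "logs-2024.06.24", "metrics-2024.06.01"]
to_delete = expired_indices(names, "logs", today=date(2024, 6, 24), keep_days=7)
```

Because the date is part of the name, deciding what to delete needs no query against the documents themselves.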

Security basics

Production clusters enable TLS and authentication (native users, API keys, or LDAP). When security is on, Filebeat and Logstash need credentials in their output blocks. Never expose port 9200 to the public internet without authentication. Local labs commonly set xpack.security.enabled: false; turn security back on before going to production.
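For clients that talk to the HTTP API directly, credentials travel in the Authorization header. The sketch below shows the two common forms, HTTP Basic for native users and the ApiKey scheme for API keys. The helper names and the sample credentials are placeholders invented for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Authorization header for a native user (HTTP Basic)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def api_key_header(key_id: str, api_key: str) -> dict:
    """Authorization header for an Elasticsearch API key: base64 of 'id:api_key'."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


# Placeholder credentials for illustration only; never hard-code real ones.
basic = basic_auth_header("elastic", "changeme")
keyed = api_key_header("example-id", "example-secret")
```

Both headers are only encoded, not encrypted, which is why they should be sent over TLS.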

Minimal end-to-end example

1) Elasticsearch runs on localhost:9200.
2) Logstash listens on 5044, groks nginx access lines, and outputs to Elasticsearch.
3) Filebeat tails /var/log/nginx/access.log and sends to Logstash.
4) Verify: curl 'localhost:9200/nginx-access-*/_search?size=1' (quoted so the shell does not expand * or ?).

Key paths and services

Elasticsearch — elasticsearch service, config in /etc/elasticsearch/
Logstash — logstash service, pipelines in /etc/logstash/conf.d/
Filebeat — filebeat service, config /etc/filebeat/filebeat.yml

Learning resources

Elasticsearch guide — elastic.co — Elasticsearch
Logstash — elastic.co — Logstash
Filebeat — elastic.co — Filebeat
Grok patterns — Logstash grok filter

Practice scenarios

Hands-on ELK Stack scenarios on live Linux VMs: elk
