ELK stack troubleshooting
Filebeat not shipping logs
Run filebeat test config and filebeat test output first.
Common causes are wrong file paths or permissions: Filebeat runs as root or as the
filebeat user, and that account must be able to read the log files. Registry corruption can skip
or repeat lines; inspect /var/lib/filebeat/registry. Verify that Logstash
listens on 5044: ss -tlnp | grep 5044. A firewall between hosts can block
Beats traffic silently; test with telnet logstash-host 5044 or
nc -zv logstash-host 5044.
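A minimal sketch of the permissions check, assuming bash and that it is run as the account Filebeat uses. The helper name unreadable_paths and the nginx path in the usage comment are illustrative and not part of Filebeat.

```shell
# Print every given path that the current user cannot read.
# Hypothetical helper; not a Filebeat command.
unreadable_paths() {
  local p
  for p in "$@"; do
    [ -r "$p" ] || printf '%s\n' "$p"
  done
}

# Usage (assumed log location), after sourcing this file as the filebeat user:
#   unreadable_paths /var/log/nginx/*.log
```

Any path it prints is one Filebeat cannot harvest under that account; no output means every listed file is readable.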
Logstash pipeline not starting
Run /usr/share/logstash/bin/logstash -t to check for syntax errors; on package
installs, add --path.settings /etc/logstash so it reads the same settings as the service.
A grok pattern mismatch does not stop the pipeline but leaves the message
field unparsed; look for the _grokparsefailure tag in events.
A JVM heap that is too small causes OOM errors; tune -Xms/-Xmx in
/etc/logstash/jvm.options. Check
journalctl -u logstash -e for plugin errors or permission errors on
pipeline config files.
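A small sketch for inspecting the heap settings, assuming the -Xms and -Xmx lines sit at the start of a line in jvm.options. The helper name check_heap is hypothetical. It flags a mismatch because Elastic's documentation recommends setting both values to the same size.

```shell
# Report the -Xms/-Xmx values found in a jvm.options file.
# Prints the values, then "ok" when both are set and equal, "mismatch" otherwise.
check_heap() {
  local file=$1 xms xmx
  # Keep the last occurrence of each flag; commented-out lines start with '#'.
  xms=$(sed -n 's/^-Xms//p' "$file" | tail -n 1)
  xmx=$(sed -n 's/^-Xmx//p' "$file" | tail -n 1)
  printf 'Xms=%s Xmx=%s\n' "${xms:-unset}" "${xmx:-unset}"
  if [ -n "$xms" ] && [ "$xms" = "$xmx" ]; then
    echo ok
  else
    echo mismatch
  fi
}

# Usage:
#   check_heap /etc/logstash/jvm.options
```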
Elasticsearch cluster red or yellow
Run curl 'localhost:9200/_cluster/health?pretty'. Red means at least one primary
shard is unassigned (investigate immediately). Yellow on a single node
is expected: replica shards cannot be assigned without a second node. Fix yellow in production
by adding nodes; setting number_of_replicas: 0 on indices is for dev
only. Use _cat/shards?v and
_cluster/allocation/explain for stuck shards. Crossing the flood-stage disk watermark
makes indices read-only; see disk full below.
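The red/yellow triage can be sketched as two small functions. This assumes a node on localhost:9200 without authentication; health_status and health_advice are hypothetical helper names, and the parsing uses grep/cut only so it works where jq is not installed.

```shell
# Extract the "status" field from a _cluster/health JSON response on stdin.
# Handles both compact and ?pretty output.
health_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' | head -n 1 | cut -d'"' -f4
}

# Map a status to the next step for that state.
health_advice() {
  case "$1" in
    green)  echo "all shards assigned" ;;
    yellow) echo "replicas unassigned: add a node, or set number_of_replicas to 0 in dev" ;;
    red)    echo "primary shards missing: run _cluster/allocation/explain" ;;
    *)      echo "no status: is Elasticsearch reachable?" ;;
  esac
}

# Usage:
#   health_advice "$(curl -s 'localhost:9200/_cluster/health' | health_status)"
```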
No documents in Elasticsearch
Trace the pipeline: Filebeat → Logstash → ES. Confirm that the Filebeat registry
is advancing and that Logstash is receiving events (add a temporary stdout output).
Check that the index name in the Logstash output matches the one in your search query. Mapping conflicts
reject documents; check the ES logs for
mapper_parsing_exception. A timestamp in the future puts docs outside
your search time window. Indices are near-real-time: call
/_refresh or wait about 1s after indexing.
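To rule out the refresh delay before digging further, one option is to force a refresh and read the document count. This is a sketch that assumes the logs-* index pattern and an unauthenticated node on localhost:9200; doc_count is a hypothetical helper that parses the _count response without jq.

```shell
# Extract the numeric "count" field from a _count JSON response on stdin.
doc_count() {
  grep -o '"count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Usage:
#   curl -s -X POST 'localhost:9200/logs-*/_refresh' > /dev/null
#   curl -s 'localhost:9200/logs-*/_count' | doc_count
```

A count of 0 after a refresh points upstream (Filebeat, Logstash, or the index name); a non-zero count points at the search query or its time window.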
Disk full / flood stage watermark
Elasticsearch blocks writes when disk usage crosses the flood-stage watermark (95% by default). Delete old
indices, add disk, or adjust the watermarks in elasticsearch.yml (not
a long-term fix). Implement ILM or a cron job to drop indices older than your retention period.
Monitor /_cat/allocation?v and host disk usage with
df -h. See the
disk volumes lab.
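For the cron approach, a rough sketch of selecting indices past retention. It assumes daily indices whose names end in a YYYY.MM.dd date (for example logs-2024.01.15); that naming scheme, the 30-day retention, and the helper name indices_older_than are assumptions, not something this guide prescribes. The date -d form in the usage comment is GNU date.

```shell
# Read index names on stdin (one per line) and print those whose trailing
# YYYY.MM.dd date is older than the cutoff given as $1 (also YYYY.MM.dd).
# Names without a trailing date are ignored.
indices_older_than() {
  awk -v cutoff="$1" '
    match($0, /[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) {
      # Zero-padded dates sort correctly as plain strings.
      if ((substr($0, RSTART) "") < (cutoff "")) print $0
    }'
}

# Usage (review the list before deleting anything):
#   curl -s 'localhost:9200/_cat/indices/logs-*?h=index' \
#     | indices_older_than "$(date -d '30 days ago' +%Y.%m.%d)" \
#     | while read -r idx; do curl -s -X DELETE "localhost:9200/$idx"; done
```

ILM does the same job inside Elasticsearch and is usually the better long-term choice; the script is only a stopgap.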
Authentication / TLS errors
With security enabled, Filebeat and Logstash need a
username/password or an API key in their output blocks, plus the
CA cert for TLS. Typical errors in the Beats and Logstash logs are 401 Unauthorized and
certificate verify failed. Test ES auth:
curl -u elastic:PASS localhost:9200 (use https:// and --cacert when TLS is enabled on the HTTP layer). Align cipher suites and
hostname verification between components.
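A sketch for separating credential problems from TLS problems by looking only at the HTTP status curl reports. The CA path in the usage comment is a placeholder and explain_http_code is a hypothetical helper; curl prints 000 for %{http_code} when it never received an HTTP response.

```shell
# Translate the status code from curl's -w '%{http_code}' into a likely cause.
explain_http_code() {
  case "$1" in
    200) echo "authenticated" ;;
    401) echo "bad or missing credentials" ;;
    403) echo "authenticated but not authorized" ;;
    000) echo "no HTTP response: connection refused or TLS handshake failed" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Usage (placeholder CA path and password):
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     --cacert /path/to/ca.crt -u elastic:PASS https://localhost:9200)
#   explain_http_code "$code"
```

On a 000 result, rerun curl with -v instead of -s to see whether the failure is the TCP connection or certificate verification.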
Grok / parsing wrong fields
When the log format changes (nginx custom format, JSON logs), update the grok pattern or
switch to a json { source => "message" } filter. Use the Grok Debugger
(Elastic docs) or stdout { codec => rubydebug } to inspect raw
events. Multiline stack traces need the multiline.* settings on the Filebeat input or
codec => multiline on a Logstash input; when shipping through Beats, join the lines in Filebeat.
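To gauge how widespread a parsing problem is, a quick count of failed events can help. This sketch assumes events have been captured one JSON document per line (for example from a temporary output using the json_lines codec) into a file; the file name and the helper name count_grok_failures are illustrative.

```shell
# Count captured events (one per line on stdin) that carry the _grokparsefailure tag.
# grep -c exits non-zero when the count is 0, so the exit status is normalised.
count_grok_failures() {
  grep -c '_grokparsefailure' || true
}

# Usage (hypothetical capture file):
#   count_grok_failures < /tmp/logstash-events.jsonl
```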
Elasticsearch won’t start
Common causes: insufficient vm.max_map_count on Linux (set to
262144+), wrong Java version, corrupt data path, or port 9200 already in use.
Logs: /var/log/elasticsearch/ or
journalctl -u elasticsearch -e. Bootstrap checks fail on memory
locking and file descriptors — follow hints in the error message.
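The vm.max_map_count check can be sketched as below, assuming a Linux host. The helper name map_count_status is hypothetical; 262144 is the minimum stated above.

```shell
# Compare a vm.max_map_count value against the 262144 minimum.
map_count_status() {
  if [ "$1" -ge 262144 ]; then
    echo ok
  else
    echo "too low: $1"
  fi
}

# Usage:
#   map_count_status "$(sysctl -n vm.max_map_count)"
# Raise it until the next reboot (persist it in /etc/sysctl.conf or /etc/sysctl.d/):
#   sudo sysctl -w vm.max_map_count=262144
```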
Debugging workflow
1. Cluster health and indices
curl -s localhost:9200/_cluster/health?pretty
curl -s localhost:9200/_cat/indices?v | head
2. Filebeat → Logstash path
filebeat test output
journalctl -u filebeat -n 30 --no-pager
ss -tlnp | grep 5044
3. Logstash pipeline and ES output
/usr/share/logstash/bin/logstash -t
journalctl -u logstash -n 50 --no-pager
curl -s 'localhost:9200/logs-*/_count'
Practice scenarios
Hands-on ELK Stack scenarios on live Linux VMs: elk