SadServers
ELK stack troubleshooting

Filebeat not shipping logs

Run filebeat test config and filebeat test output first. Common causes are wrong file paths or permissions: Filebeat runs as root or as the filebeat user, and that account must be able to read the log files. A corrupt registry can make Filebeat skip or repeat lines; inspect /var/lib/filebeat/registry. Verify that Logstash is listening on 5044 with ss -tlnp | grep 5044. A firewall between hosts can drop Beats traffic silently, so test from the Filebeat host with telnet logstash-host 5044 or nc -zv logstash-host 5044.
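
The read-permission and port checks above can be scripted. This is a minimal sketch, assuming a POSIX shell with bash and coreutils timeout available on the Filebeat host. The function names are hypothetical helpers, not Filebeat commands, and logstash-host is a placeholder.

```shell
# Sketch: verify that log files are readable and the Beats port is reachable.
# Function names are illustrative helpers, not part of Filebeat.

# Print every path the current user cannot read; return 1 if any is unreadable.
# Run this as the account Filebeat uses so the result reflects its permissions.
unreadable_logs() {
    rc=0
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "unreadable: $path"
            rc=1
        fi
    done
    return $rc
}

# Return 0 if a TCP connection to host ($1) and port ($2) opens within 3 seconds.
# Uses bash's /dev/tcp, so it also works where nc and telnet are not installed.
beats_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `unreadable_logs /var/log/nginx/*.log` lists any files that need a permission fix, and `beats_port_open logstash-host 5044 || echo "cannot reach Logstash"` reports a blocked or closed port.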

Logstash pipeline not starting

Run /usr/share/logstash/bin/logstash -t to check for syntax errors; on package installs, add --path.settings /etc/logstash so it finds pipelines.yml. A grok pattern mismatch does not stop the pipeline, but it leaves the message field unparsed; look for the _grokparsefailure tag in events. A JVM heap that is too small causes out-of-memory errors; tune -Xms and -Xmx in /etc/logstash/jvm.options. Check journalctl -u logstash -e for plugin errors or permission errors on the pipeline config files.
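
Before changing the heap, it helps to see what is currently configured. The sketch below reads the active heap flags from a jvm.options file and checks that -Xms and -Xmx match, which is the usual recommendation for Logstash. The function names are hypothetical, and the parsing assumes one flag per line, as in the stock file.

```shell
# Sketch: inspect heap flags in a jvm.options file (path passed as $1).

# Print the active (uncommented) -Xms / -Xmx lines.
heap_flags() {
    grep -E '^-Xm[sx]' "$1"
}

# Return 0 when -Xms and -Xmx are both set and have the same value.
heap_is_pinned() {
    xms=$(grep -E '^-Xms' "$1" | tail -n 1 | sed 's/^-Xms//')
    xmx=$(grep -E '^-Xmx' "$1" | tail -n 1 | sed 's/^-Xmx//')
    [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}
```

Usage: `heap_flags /etc/logstash/jvm.options` followed by `heap_is_pinned /etc/logstash/jvm.options || echo "set -Xms and -Xmx to the same value"`.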

Elasticsearch cluster red or yellow

Check curl 'localhost:9200/_cluster/health?pretty'. Red means at least one primary shard is unassigned, so investigate immediately. Yellow on a single node is expected: replica shards cannot be assigned without a second node. In production, fix yellow by adding nodes; on a dev single-node cluster, set number_of_replicas: 0 on the indices instead. Use _cat/shards?v and _cluster/allocation/explain to find out why shards are stuck. Crossing the flood-stage disk watermark puts indices into read-only mode; see Disk full below.
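
The status-to-action mapping above can be captured in two small helpers. This is a sketch only: the function names are made up for illustration, the JSON is parsed with sed rather than a real JSON parser, and my-index in the comment is a placeholder index name.

```shell
# Sketch: turn _cluster/health output into a suggested next step.

# Read health JSON on stdin and print the value of "status".
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Print a suggested next check for a given status.
next_step() {
    case "$1" in
        green)  echo "ok" ;;
        yellow) echo "replicas unassigned: check _cat/shards?v" ;;
        red)    echo "primary missing: check _cluster/allocation/explain" ;;
        *)      echo "unknown status: is Elasticsearch reachable?" ;;
    esac
}

# Dev-only fix for yellow on a single node (my-index is a placeholder):
#   curl -X PUT 'localhost:9200/my-index/_settings' \
#        -H 'Content-Type: application/json' \
#        -d '{"index":{"number_of_replicas":0}}'
```

Usage: `next_step "$(curl -s 'localhost:9200/_cluster/health' | health_status)"`.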

No documents in Elasticsearch

Trace the pipeline hop by hop: Filebeat → Logstash → ES. Confirm that the Filebeat registry is advancing and that Logstash is receiving events (add a temporary stdout output). Check that the index name in the Logstash output matches the one in your search query. Mapping conflicts reject documents; check the ES logs for mapper_parsing_exception. A timestamp in the future puts documents outside your search time window. Indices are near-real-time, so call /_refresh or wait about 1s after indexing before searching.
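
One way to confirm that Logstash receives events is a throwaway pipeline that prints them. The sketch below writes such a config to a path of your choice. The helper name and the /tmp/debug.conf path are illustrative, and port 5044 is taken from the checks above.

```shell
# Sketch: write a temporary pipeline that prints every Beats event to stdout.
# $1 is the file to create; nothing is sent to Elasticsearch by this config.
write_debug_pipeline() {
    cat > "$1" <<'EOF'
input {
  beats { port => 5044 }
}
output {
  stdout { codec => rubydebug }
}
EOF
}
```

Usage: stop the Logstash service first so port 5044 and the data directory are free, then run `write_debug_pipeline /tmp/debug.conf` and `/usr/share/logstash/bin/logstash -f /tmp/debug.conf`. If events print here but the index stays empty, the problem is on the Logstash → ES side.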

Disk full / flood stage watermark

Elasticsearch blocks writes to affected indices when disk usage crosses the flood-stage watermark. Delete old indices, add disk, or raise the watermarks in elasticsearch.yml (not a long-term fix). Implement ILM or a cron job to drop indices older than your retention period. Monitor /_cat/allocation?v and host disk usage with df -h. See the disk volumes lab.
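
The monitoring and retention ideas above can be sketched as shell helpers. The thresholds are Elasticsearch's default watermarks (low 85%, high 90%, flood stage 95%). The function names and the logs-YYYY.MM.DD index naming are assumptions for illustration, so adjust both to your cluster.

```shell
# Sketch: classify disk usage against default watermarks and pick old indices.

# Print ok, low, high or flood for a used-disk percentage (integer).
watermark_level() {
    pct=$1
    if   [ "$pct" -ge 95 ]; then echo flood
    elif [ "$pct" -ge 90 ]; then echo high
    elif [ "$pct" -ge 85 ]; then echo low
    else echo ok
    fi
}

# Print the used percentage (digits only) of the filesystem holding path $1.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Read index names on stdin; print those whose trailing YYYY.MM.DD date sorts
# before the cutoff in $1. Assumes daily indices named like logs-2024.01.15.
indices_older_than() {
    awk -v cutoff="$1" '{
        n = split($0, parts, "-")
        day = parts[n]
        if (day ~ /^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ && (day "") < (cutoff "")) print
    }'
}
```

Usage: `watermark_level "$(disk_used_pct /var/lib/elasticsearch)"` for a quick check, and `curl -s 'localhost:9200/_cat/indices?h=index' | indices_older_than 2024.02.01` to list deletion candidates before passing each one to `curl -X DELETE`. Review the list before deleting anything.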

Authentication / TLS errors

With security enabled, Filebeat and Logstash need a username and password or an API key in their output blocks, plus the CA certificate for TLS. Failures show up as 401 Unauthorized or certificate verify failed in the Beat or Logstash logs. Test ES auth with curl -u elastic:PASS localhost:9200 (use https:// and --cacert when TLS is enabled on the HTTP layer). Align cipher suites and hostname verification between components.
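
To separate credential problems from TLS problems, it can help to look only at the HTTP status code. This is a minimal sketch: the function names are hypothetical, and the URL, credentials and CA path in the usage line are placeholders for your own values.

```shell
# Sketch: distinguish auth failures from TLS or connectivity failures.

# Print only the HTTP status code of an authenticated request.
# $1 = URL, $2 = user:password, $3 = CA certificate path.
# curl prints 000 when no HTTP response arrives, e.g. a failed TLS handshake.
es_http_code() {
    curl -s -o /dev/null -w '%{http_code}' --cacert "$3" -u "$2" "$1"
}

# Map a status code to the most likely cause.
explain_code() {
    case "$1" in
        200) echo "auth and TLS ok" ;;
        401) echo "credentials rejected: check username/password or API key" ;;
        403) echo "authenticated but missing privileges: check the role" ;;
        000) echo "no HTTP response: check TLS (CA cert, hostname) and connectivity" ;;
        *)   echo "unexpected status $1: check the Elasticsearch logs" ;;
    esac
}
```

Usage: `explain_code "$(es_http_code https://localhost:9200 elastic:PASS /path/to/ca.crt)"`.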

Grok / parsing wrong fields

When the log format changes (nginx custom format, JSON logs), update the grok pattern or switch to the json { source => "message" } filter. Use the Grok Debugger (Elastic docs) or stdout { codec => rubydebug } to inspect raw events. Multiline stack traces need the multiline options on the Filebeat input or, for inputs other than Beats, codec => multiline on the Logstash input.
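
Before committing a multiline pattern to the config, you can check it against a sample log. The sketch below counts how many lines start a new event and how many would be folded into the previous one, which mirrors a Filebeat multiline setup that uses a "first line" pattern with negate: true and match: after. The helper name and the sample regex are illustrative.

```shell
# Sketch: test a "start of event" regex against a log file.
# $1 = log file, $2 = extended regex matching the first line of each event.
multiline_summary() {
    total=$(wc -l < "$1")
    starts=$(grep -c -E "$2" "$1" || true)
    echo "events=$starts continuation_lines=$((total - starts))"
}
```

Usage: `multiline_summary app.log '^[0-9]{4}-[0-9]{2}-[0-9]{2}'`. If continuation_lines is 0 on a log that contains stack traces, the pattern matches too much; if events is 0, it matches nothing.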

Elasticsearch won’t start

Common causes are an insufficient vm.max_map_count on Linux (set it to at least 262144), the wrong Java version, a corrupt data path, or port 9200 already being in use. Logs are in /var/log/elasticsearch/ or journalctl -u elasticsearch -e. Bootstrap checks fail on memory locking and file descriptor limits; follow the hints in the error message.
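
The vm.max_map_count check is easy to script. A minimal sketch for Linux follows; the function names are hypothetical, and the sysctl.d file name in the comment is only an example.

```shell
# Sketch: check vm.max_map_count against the 262144 minimum (Linux only).

# Return 0 when the given value meets the minimum.
map_count_ok() {
    [ "$1" -ge 262144 ]
}

# Print the live kernel value.
current_map_count() {
    cat /proc/sys/vm/max_map_count
}

# To raise it now and keep it after reboot (file name is an example):
#   sudo sysctl -w vm.max_map_count=262144
#   echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf
```

Usage: `map_count_ok "$(current_map_count)" || echo "vm.max_map_count is too low"`.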

Debugging workflow

1. Cluster health and indices

curl -s 'localhost:9200/_cluster/health?pretty'
curl -s 'localhost:9200/_cat/indices?v' | head

2. Filebeat → Logstash path

filebeat test output
journalctl -u filebeat -n 30 --no-pager
ss -tlnp | grep 5044

3. Logstash pipeline and ES output

/usr/share/logstash/bin/logstash -t
journalctl -u logstash -n 50 --no-pager
curl -s 'localhost:9200/logs-*/_count'
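
The three steps can be chained into one pass that reports each result without stopping at the first failure. This is a sketch under assumptions: run_step and elk_triage are made-up names, the hosts and the logs-* pattern come from the commands above, and --path.settings /etc/logstash assumes a package install.

```shell
# Sketch: run the workflow checks in order and summarize pass/fail.

# Run one check quietly, print [ok] or [FAIL] with its label.
# $1 = label, remaining arguments = the command to run.
run_step() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "[ok] $label"
    else
        echo "[FAIL] $label"
        return 1
    fi
}

# Run every check even if an earlier one fails; return 1 if any failed.
elk_triage() {
    failed=0
    run_step "cluster health" curl -sf 'localhost:9200/_cluster/health' || failed=1
    run_step "filebeat output" filebeat test output || failed=1
    run_step "logstash config" /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t || failed=1
    run_step "documents indexed" curl -sf 'localhost:9200/logs-*/_count' || failed=1
    return $failed
}
```

Run `elk_triage` on a host that has all three components, or call run_step with individual commands on each host. A [FAIL] line tells you which section of this page to read next.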

Practice scenarios

Hands-on ELK Stack scenarios on live Linux VMs: elk

Updated: 2026-06-26 23:27 UTC – f0e2403