SadServers

Prometheus troubleshooting

Target down (up == 0)

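Open the /targets page in the Prometheus UI and read the error reported for the target (connection refused, timeout, 404, SSL/TLS error). Verify the host:port, the firewall, and that the exporter listens on 0.0.0.0 (or another reachable address), not only on localhost. A wrong metrics_path or a Kubernetes pod that is not ready are also frequent causes. To test, run curl http://target:9100/metrics from the Prometheus host (9100 is the node_exporter default port; substitute your exporter's port).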
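A minimal sketch for the "listens only on localhost" case, assuming a Linux host with ss from iproute2. The function name check_bind is hypothetical: it reads ss -ltn output on stdin and reports whether anything listens on the given port on a non-loopback address.

```shell
#!/bin/sh
# check_bind PORT (hypothetical helper): read `ss -ltn` output on stdin.
# Prints "ok" if some listener on PORT uses a non-loopback address,
# "loopback-only" if every listener is on 127.x or [::1],
# "not-listening" if nothing listens on PORT.
check_bind() {
  awk -v want="$1" '
    NR == 1 { next }                    # skip the ss header line
    {
      # Column 4 is Local Address:Port; split on the LAST colon so
      # IPv6 forms such as [::]:9100 and [::1]:9100 are handled.
      if (!match($4, /:[^:]*$/)) next
      addr = substr($4, 1, RSTART - 1)
      port = substr($4, RSTART + 1)
      if (port != want) next
      seen = 1
      if (addr !~ /^127\./ && addr != "[::1]") remote = 1
    }
    END {
      if (!seen)       print "not-listening"
      else if (remote) print "ok"
      else             print "loopback-only"
    }'
}
```

On the exporter host, ss -ltn | check_bind 9100 should print ok. If it prints loopback-only, change the exporter's listen address (node_exporter, for example, takes --web.listen-address).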

Grafana shows “No data”

Separate Prometheus health from Grafana configuration. Confirm the series first in the Prometheus UI (/graph), then run the same PromQL in Grafana → Explore against the Prometheus data source. If the result is empty in Prometheus too, the scraping or the metric names are wrong, not the dashboard. If Prometheus has data but Grafana does not, check the data source URL (e.g. http://prometheus:9090), the time range, and the label filters.
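To separate the two, the same PromQL can be sent to the Prometheus HTTP API from a shell on the Grafana host, which also tests whether the data source URL is reachable from there. This is a sketch assuming curl is installed; prom_query and result_empty are hypothetical helper names, and result_empty assumes the response is compact (not pretty-printed) JSON, which is what Prometheus returns.

```shell
#!/bin/sh
# prom_query QUERY [URL] (hypothetical helper): run an instant query.
# URL defaults to http://localhost:9090 (an assumption; use your data source URL).
prom_query() {
  curl -fsSG "${2:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# result_empty (hypothetical helper): read an API response on stdin and
# succeed if the result set is empty. Assumes compact JSON, no spaces.
result_empty() {
  grep -qF '"result":[]'
}
```

For example, prom_query 'up{job="node"}' http://prometheus:9090 | result_empty && echo "empty in Prometheus too" (the job label value is hypothetical). A non-zero exit from prom_query itself points at the URL or network rather than the query.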

Scrape timeout / context deadline exceeded

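The target is too slow or returns a very large metrics payload. Increase scrape_timeout for that job (it cannot exceed the job's scrape_interval) or fix the exporter, and check the per-target scrape_duration_seconds metric. Cardinality explosions (millions of series) slow scrapes down. Drop high-cardinality labels in the application or exporter; metric_relabel_configs can also drop them, but it runs after the scrape, so it reduces stored series without making the scrape itself faster.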
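A rough way to find the metrics behind a slow scrape is to count series per metric name in the raw payload and to time a single fetch. A sketch assuming the Prometheus text exposition format; top_series and scrape_seconds are hypothetical names.

```shell
#!/bin/sh
# top_series [N] (hypothetical helper): read a /metrics payload on stdin and
# print the N metric names with the most series, largest first (default 10).
top_series() {
  grep -v '^#' |                # drop # HELP / # TYPE comment lines
    grep -v '^[[:space:]]*$' |  # drop blank lines
    sed 's/[{ ].*//' |          # keep only the metric name
    sort | uniq -c | sort -rn |
    head -n "${1:-10}"
}

# scrape_seconds URL (hypothetical helper): roughly time one fetch of URL.
scrape_seconds() {
  curl -s -o /dev/null -w '%{time_total}\n' "$1"
}
```

Usage: curl -s http://target:9100/metrics | top_series 5, then compare scrape_seconds http://target:9100/metrics with the job's scrape_timeout.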

PromQL returns empty or unexpected results

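Common causes are a wrong metric name or label, or rate() applied to a gauge. List the metric names Prometheus actually has via the /api/v1/label/__name__/values endpoint or the Grafana metric browser; {__name__=~".+"} also works but selects every series, which is expensive on a large server. Counters need rate(metric[5m]); gauges are queried directly. Label matchers are case-sensitive. A recording rule stores its result under its own name, so query the recorded name, not the raw metric.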
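Whether a metric is a counter or a gauge can be read from the # TYPE line of the exporter's payload. A minimal sketch, assuming the classic Prometheus text format (where the # TYPE line names the metric exactly as it appears in samples; OpenMetrics counters differ); metric_type and suggest_query are hypothetical names.

```shell
#!/bin/sh
# metric_type NAME (hypothetical helper): read a /metrics payload on stdin and
# print the type declared for NAME, or "unknown" if there is no TYPE line.
metric_type() {
  awk -v m="$1" '
    $1 == "#" && $2 == "TYPE" && $3 == m { print $4; found = 1; exit }
    END { if (!found) print "unknown" }'
}

# suggest_query NAME (hypothetical helper): wrap counters in rate();
# use gauges as-is; fail for anything else.
suggest_query() {
  case "$(metric_type "$1")" in
    counter) echo "rate($1[5m])" ;;
    gauge)   echo "$1" ;;
    *)       echo "no counter/gauge TYPE line for $1" >&2; return 1 ;;
  esac
}
```

Usage: curl -s http://target:9100/metrics | suggest_query node_load1.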

Config reload fails / Prometheus won’t start

Run promtool check config /etc/prometheus/prometheus.yml (it also checks the rule files the config references) and promtool check rules /etc/prometheus/rules/*.yml. YAML indentation errors and duplicate rule group names within a file are common. Check the journal with journalctl -u prometheus -e.
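A sketch of a "validate, then reload" step, so a broken config never reaches the running server. The helper name safe_reload and its default path and URL are assumptions; the /-/reload endpoint only works when Prometheus was started with --web.enable-lifecycle.

```shell
#!/bin/sh
# safe_reload [CONFIG] [URL] (hypothetical helper): validate, then reload.
# Defaults are assumptions: /etc/prometheus/prometheus.yml, http://localhost:9090.
safe_reload() {
  cfg="${1:-/etc/prometheus/prometheus.yml}"
  url="${2:-http://localhost:9090}"
  # promtool also checks the rule files referenced by the config.
  if ! promtool check config "$cfg"; then
    echo "config check failed; not reloading" >&2
    return 1
  fi
  # /-/reload requires --web.enable-lifecycle; without that flag,
  # send SIGHUP to the Prometheus process instead.
  curl -fsS -X POST "$url/-/reload"
}
```

After a reload, the prometheus_config_last_reload_successful metric should be 1.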

Disk full / TSDB corruption

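Shorten retention with the --storage.tsdb.retention.time flag (or cap it by size with --storage.tsdb.retention.size), drop noisy metrics, or add disk. Monitor prometheus_tsdb_storage_blocks_bytes. After a crash, Prometheus may need time to replay the WAL (write-ahead log) before it is ready to serve queries; watch the logs. See the disk volumes lab.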
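A minimal disk check for the TSDB directory, assuming POSIX df -P output. The names disk_use_pct and tsdb_check, the 85% default threshold, and the example path /var/lib/prometheus are assumptions; the real data directory is whatever --storage.tsdb.path points to.

```shell
#!/bin/sh
# disk_use_pct (hypothetical helper): read `df -P DIR` output on stdin and
# print the Capacity column (percent used) without the % sign.
disk_use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# tsdb_check DIR [LIMIT] (hypothetical helper): warn and return 1 when the
# filesystem holding DIR is at or above LIMIT percent (default 85).
tsdb_check() {
  pct=$(df -P "$1" | disk_use_pct)
  if [ "$pct" -ge "${2:-85}" ]; then
    echo "WARN: $1 is ${pct}% full; shorten retention or add disk"
    return 1
  fi
  echo "OK: $1 is ${pct}% full"
}
```

Usage: tsdb_check /var/lib/prometheus 90, then du -sh /var/lib/prometheus/* to see whether blocks or the WAL take the space.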

Alerts not firing or spamming

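Give alerting rules a for: duration so the condition must hold that long before the alert fires; this keeps brief spikes from causing flapping alerts. In Prometheus, query the ALERTS metric (alertstate="pending" or "firing") and ALERTS_FOR_STATE to see what the rules are doing. Alertmanager routes may drop or group alerts: check the Alertmanager UI at alertmanager:9093 and amtool alert. During maintenance, silence alerts with Alertmanager silences.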
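An example alerting rule with a for: clause, written from shell so it can be validated right away. The group name, the alert name TargetDown, and the severity label are hypothetical; write_example_rule is a hypothetical helper.

```shell
#!/bin/sh
# write_example_rule FILE (hypothetical helper): write a sample rule file.
# for: 5m keeps the alert in "pending" until up == 0 has held for 5 minutes,
# so a single failed scrape does not fire it.
write_example_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: example-availability    # hypothetical group name
    rules:
      - alert: TargetDown         # hypothetical alert name
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} of job {{ $labels.job }} is down"
EOF
}
```

Usage: write_example_rule /tmp/targetdown.yml && promtool check rules /tmp/targetdown.yml. For a maintenance window, a silence can be added with amtool silence add alertname=TargetDown --duration=2h --comment="maintenance" --alertmanager.url=http://alertmanager:9093.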

Kubernetes pods not scraped

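With the classic annotation pattern, verify that the pod has prometheus.io/scrape: "true" plus the port and path annotations; these annotations only take effect if your scrape config has relabel rules that read them. With the Prometheus Operator, use a ServiceMonitor (or PodMonitor) CRD that selects the Service and is itself selected by the Prometheus resource. RBAC must let Prometheus list and watch pods, services, and endpoints. If relabeling drops a target or strips labels you need, check Status → Service Discovery in the Prometheus UI, which shows each target's discovered labels before relabeling.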
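The RBAC part can be checked without reading ClusterRoles by hand, using kubectl auth can-i while impersonating Prometheus's service account. A sketch assuming kubectl access with impersonation rights and the pods/services/endpoints discovery roles; sa_subject and check_sd_rbac are hypothetical helpers, and the namespace and account names in the usage line are placeholders.

```shell
#!/bin/sh
# sa_subject NAMESPACE NAME (hypothetical helper): the user name Kubernetes
# uses for a service account when impersonating it with --as.
sa_subject() {
  echo "system:serviceaccount:$1:$2"
}

# check_sd_rbac NAMESPACE NAME (hypothetical helper): report whether the
# service account may list/watch what Kubernetes service discovery reads.
check_sd_rbac() {
  rc=0
  for r in pods services endpoints; do
    for verb in list watch; do
      if kubectl auth can-i "$verb" "$r" --all-namespaces \
           --as "$(sa_subject "$1" "$2")" >/dev/null; then
        echo "ok:   $verb $r"
      else
        echo "DENY: $verb $r"; rc=1
      fi
    done
  done
  return $rc
}
```

Usage: check_sd_rbac monitoring prometheus. To see a pod's annotations, kubectl get pod POD -n NS -o jsonpath='{.metadata.annotations}'.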

Debugging workflow

1. Targets and up metric

curl -s 'http://localhost:9090/api/v1/query?query=up' | jq .
# Or open http://localhost:9090/targets

2. Query in Prometheus, then Grafana

# Prometheus UI /graph: confirm the series exists
# Grafana → Explore: run the same query over the same time range

3. Config and logs

promtool check config /etc/prometheus/prometheus.yml
journalctl -u prometheus -n 50 --no-pager

Practice scenarios

Hands-on Prometheus scenarios on live Linux VMs: prometheus

Updated: 2026-06-13 16:06 UTC – 2d2950a