SadServers
Prometheus guide

What Prometheus does in production

Prometheus is a monitoring system and time-series database. It answers questions like “what is the error rate right now?”, “which instances are out of disk?”, and “did latency spike after the deploy?” Applications expose metrics in a text format; Prometheus pulls them periodically and stores samples with labels. Alerts fire from rules; humans explore data in the UI or — most often — in Grafana.

Prometheus and Grafana

Prometheus collects, stores, and queries metrics (PromQL). Grafana connects to Prometheus as a data source and builds dashboards, graphs, and tables on top of it. Typical stack: exporters and apps → Prometheus (scrape + store) → Grafana (dashboards), with Prometheus also sending firing alerts to Alertmanager (notifications). When a Grafana graph is empty, don't stop at the panel config: run the panel's PromQL directly against Prometheus and confirm that the underlying targets are being scraped successfully.
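To tell a Grafana problem from a Prometheus problem, you can run the panel's query against the Prometheus HTTP API yourself. A minimal sketch, assuming Prometheus at http://localhost:9090 and using the /api/v1/query instant-query endpoint; the helper names (build_query_url, summarize, run_query) are hypothetical:

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on localhost:9090 (the default port).
PROM_URL = "http://localhost:9090"


def build_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def summarize(response: dict) -> list[tuple[dict, float]]:
    """Turn an instant-query JSON response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each element looks like {"metric": {...labels...}, "value": [timestamp, "value as string"]}.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(promql: str) -> list[tuple[dict, float]]:
    """Send the query to the live server and summarize the result."""
    with urllib.request.urlopen(build_query_url(PROM_URL, promql), timeout=5) as resp:
        return summarize(json.load(resp))
```

If run_query('up{job="node"}') returns an empty list (the job name node is a placeholder), no series matched, so the problem lies in the query or the scrape rather than in the Grafana panel.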

Pull model and exporters

Prometheus scrapes /metrics (or a custom path set with metrics_path) once every scrape_interval. Targets can be:

  • Instrumented apps — Prometheus client libraries expose metrics
  • Exporters — standalone processes (sometimes run as sidecars) that translate a system's stats into Prometheus format (e.g. node_exporter for Linux host metrics)
  • Pushgateway — for short-lived batch jobs that cannot be scraped (jobs push their metrics to the gateway, and Prometheus scrapes the gateway)
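For the instrumented-app case, all Prometheus needs is an HTTP endpoint that returns the text exposition format. In practice you would use an official client library (for Python, prometheus_client); the stdlib-only sketch below just shows what Prometheus reads on each scrape. The counter, port 8000, and function names are illustrative assumptions:

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative counter: (method, status) -> number of requests handled.
REQUESTS: Counter = Counter()


def render_metrics() -> str:
    """Render REQUESTS in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), count in sorted(REQUESTS.items()):
        # Label values here are fixed strings; real code must escape \, " and newlines.
        lines.append(f'http_requests_total{{method="{method}",status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            body = render_metrics().encode()
            content_type = "text/plain; version=0.0.4"
        else:
            # In this toy app every non-metrics GET counts as a successful request.
            REQUESTS[("GET", "200")] += 1
            body = b"hello\n"
            content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the toy app forever; add host:port as a target in scrape_configs."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```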

Prometheus itself listens on port 9090 by default, serving the UI, the HTTP API, and its own /metrics. Check target health under Status → Targets in the Prometheus UI.
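On a headless VM, the target list is also available from the /api/v1/targets endpoint, where each active target reports health as "up", "down", or "unknown" along with its scrapeUrl and lastError. A sketch assuming the default port; the helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: Prometheus API on the default port 9090; state=active skips dropped targets.
TARGETS_URL = "http://localhost:9090/api/v1/targets?state=active"


def unhealthy_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for each active target whose health is not "up"."""
    bad = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            job = target["labels"].get("job", "?")
            bad.append((job, target["scrapeUrl"], target.get("lastError", "")))
    return bad


def fetch_unhealthy() -> list[tuple[str, str, str]]:
    """Fetch the live target list and filter it down to unhealthy targets."""
    with urllib.request.urlopen(TARGETS_URL, timeout=5) as resp:
        return unhealthy_targets(json.load(resp))
```

A non-empty lastError is usually the fastest clue: a connection error, for example, means nothing answered on that host and port.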

Metrics format and labels

Each metric has a name and a type (counter, gauge, histogram, or summary), plus labels for dimensions: http_requests_total{method="GET",status="200"}. Labels enable aggregation in PromQL, but every unique label combination is a separate time series, so a high-cardinality label (e.g. a user ID) inflates memory and storage quickly. Prefer bounded label sets.

Configuration overview

Main file: prometheus.yml (often /etc/prometheus/prometheus.yml).

  • scrape_configs — jobs, targets, intervals, relabeling
  • rule_files — alerting and recording rules
  • alerting — Alertmanager endpoints
  • remote_write / remote_read — long-term storage integrations (optional)

Reload the config without a restart: curl -X POST localhost:9090/-/reload (requires Prometheus to be started with --web.enable-lifecycle), or systemctl reload prometheus, which works when the systemd unit maps reload to a SIGHUP.
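If you script config changes, the lifecycle reload is a plain HTTP POST. A sketch assuming Prometheus on localhost:9090 started with --web.enable-lifecycle; without that flag the request is rejected and urlopen raises urllib.error.HTTPError. The helper names are hypothetical:

```python
import urllib.request

# Assumption: Prometheus on localhost:9090, started with --web.enable-lifecycle.
RELOAD_URL = "http://localhost:9090/-/reload"


def build_reload_request(url: str = RELOAD_URL) -> urllib.request.Request:
    """An empty-body POST to the lifecycle reload endpoint."""
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config() -> int:
    """Trigger the reload; non-2xx responses raise urllib.error.HTTPError."""
    with urllib.request.urlopen(build_reload_request(), timeout=10) as resp:
        return resp.status
```

Running promtool check config /etc/prometheus/prometheus.yml first catches syntax errors before the reload; afterwards, the prometheus_config_last_reload_successful metric should be 1.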

PromQL essentials

PromQL queries time series. Examples: rate(http_requests_total[5m]) gives a counter's per-second rate averaged over the last 5 minutes; node_memory_MemAvailable_bytes returns a gauge's current value. Use Grafana’s Explore view to prototype PromQL before adding it to dashboards. Recording rules precompute expensive queries and store the results as new series.
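To make rate() less opaque, here is a simplified version of what it computes over one window: the counter's total increase, treating any drop as a counter reset, divided by the time the samples cover. This is a sketch, not Prometheus's exact algorithm: the real rate() also extrapolates towards the window boundaries, so its result can differ slightly. The function name simple_rate is hypothetical:

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter across one window of (unix_seconds, value) samples.

    Samples must be sorted by time. A value drop is treated as a counter reset
    (the process restarted and counted up from 0). No edge extrapolation is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter started again from 0, so the whole new value is increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

With samples every 15 seconds of 100, 130, 160, then a restart to 30, then 60, the increase is 30 per step (120 in total) over 60 seconds, i.e. 2.0 requests per second despite the reset.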

Alerting

Alerting rules in Prometheus evaluate PromQL expressions; firing alerts go to Alertmanager for grouping, silencing, and routing (PagerDuty, Slack, email). Keep alerts actionable — “disk full in 4h” beats “cpu > 50%” with no context.
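A “disk full in 4h” alert is commonly built on predict_linear(), which fits a least-squares line to the samples in a range and extrapolates it, e.g. predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0. A Python sketch of the same idea; one simplification (an assumption of this sketch) is that it extrapolates from the last sample's timestamp rather than from the rule's evaluation time:

```python
def predict_linear(samples: list[tuple[float, float]], seconds_ahead: float) -> float:
    """Least-squares fit of value = intercept + slope * t, evaluated past the last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


def disk_full_within(samples: list[tuple[float, float]], hours: float) -> bool:
    """True if free space is predicted to drop below zero within `hours`."""
    return predict_linear(samples, hours * 3600) < 0
```

Free space falling from 50 to 30 GB over two hours (10 GB per hour) is predicted to hit -10 GB four hours later, so a 4h rule fires while a 2h rule (10 GB left) does not.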

Service discovery

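Static target lists work for small setups. Production setups often use Kubernetes SD, file_sd, or cloud SD so new pods and instances are scraped automatically. Relabeling (relabel_configs) drops or rewrites targets and their labels before the scrape; metric_relabel_configs does the same for scraped series before they are stored. See the Kubernetes lab when pod targets stay down.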
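With file_sd, Prometheus watches the files matched by file_sd_configs (for example a glob like /etc/prometheus/targets/*.json, a hypothetical path) and picks up changes without a reload. A sketch of a generator for such a file: the JSON shape, a list of groups with "targets" and "labels", is the file_sd format, while the env label, hostnames, and helper names are illustrative. Writing to a temp file and renaming it avoids Prometheus reading a half-written file:

```python
import json
import os
import tempfile


def node_group(hosts: list[str], port: int = 9100, env: str = "prod") -> dict:
    """One target group; 9100 is node_exporter's default port, env is an example label."""
    return {"targets": [f"{host}:{port}" for host in hosts], "labels": {"env": env}}


def write_file_sd(path: str, groups: list[dict]) -> None:
    """Atomically replace `path` with a JSON list of target groups."""
    directory = os.path.dirname(path) or "."
    # The .tmp suffix keeps the partial file out of a *.json glob.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(groups, fh, indent=2)
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```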

Storage and retention

Prometheus stores data in blocks on local disk (its TSDB). Retention (15 days by default) is controlled by --storage.tsdb.retention.time. High-cardinality metrics or long retention fill the disk fast, so monitor Prometheus’s own metrics and free disk space.
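For capacity planning, the Prometheus storage documentation gives a rule of thumb: disk ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. A back-of-the-envelope sketch; the inputs are assumptions you would replace with real numbers (e.g. prometheus_tsdb_head_series for active series), and it ignores WAL and compaction overhead, so leave headroom:

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(
    retention_days: float,
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """retention_seconds * samples_per_second * bytes_per_sample (2 B/sample is the pessimistic end)."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    # Each active series produces one sample per scrape interval.
    return retention_seconds * active_series * bytes_per_sample / scrape_interval_s
```

For example, 100,000 series scraped every 15 s (about 6,667 samples/s) kept for the default 15 days at 2 bytes per sample comes to 17.28 GB; doubling retention doubles that.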

Key paths and service

  • Config — /etc/prometheus/prometheus.yml
  • Data — /var/lib/prometheus/
  • Rules — /etc/prometheus/rules/*.yml
  • Service — prometheus (systemd)
  • Grafana — separate service (often on port 3000); add the Prometheus URL as a data source

Learning resources

  • Prometheus documentation — prometheus.io/docs
  • PromQL — Querying basics
  • Grafana — Prometheus data source — grafana.com — Prometheus
  • Node exporter — node_exporter
  • Alertmanager — Alertmanager docs

Practice scenarios

Hands-on Prometheus scenarios on live Linux VMs: prometheus

Updated: 2026-06-13 16:06 UTC – 2d2950a