RabbitMQ troubleshooting
Connection refused (port 5672)
Broker not running or firewall blocks AMQP. Check
systemctl status rabbitmq-server and
ss -tlnp | grep 5672. Startup failures often show in
/var/log/rabbitmq/ — hostname resolution, cookie permissions,
or port in use. Wait for rabbitmq-diagnostics check_running to pass
after boot.
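The wait-for-boot step can be scripted. A minimal sketch, assuming a POSIX shell; wait_until is a hypothetical helper name (not part of RabbitMQ), and the attempt count and delay in the example are illustrative:

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: re-run CMD until it exits 0, at most ATTEMPTS times,
# sleeping DELAY seconds after each failure. Returns 1 if CMD never succeeds.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (needs a local broker):
#   wait_until 30 2 rabbitmq-diagnostics -q check_running || echo "broker not up"
```

Polling the health check rather than sleeping for a fixed time keeps dependent services from connecting before the broker accepts AMQP traffic.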
ACCESS_REFUSED / authentication failed
Wrong user/password or insufficient vhost permissions. Remote clients cannot
use the default guest user (localhost only). Create a user with
rabbitmqctl add_user USER PASSWORD, then grant permissions:
rabbitmqctl set_permissions -p VHOST USER ".*" ".*" ".*". Verify that the
vhost in the connection URL matches (amqp://user:pass@host/vhost); the default
vhost / must be written as %2F.
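The user setup can be wrapped in one function. This is a sketch, not a hardened provisioning script: provision_user and the RABBITMQCTL override are assumptions made for illustration, and the vhost, user, and password in the example are placeholders.

```shell
# provision_user VHOST USER PASSWORD
# Hypothetical helper. Assumption: RABBITMQCTL may be overridden
# (for example RABBITMQCTL=echo) to preview the commands without a broker.
# Stops at the first step that fails.
provision_user() {
  vhost=$1
  user=$2
  password=$3
  ctl=${RABBITMQCTL:-rabbitmqctl}
  # The three ".*" patterns are configure, write, and read permissions.
  "$ctl" add_vhost "$vhost" &&
    "$ctl" add_user "$user" "$password" &&
    "$ctl" set_permissions -p "$vhost" "$user" ".*" ".*" ".*"
}

# Preview only:  RABBITMQCTL=echo provision_user orders app s3cret
# Apply:         provision_user orders app s3cret
```

Passing the password as an argument exposes it in the process list, so a real setup would likely read it from a protected source instead.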
Queue keeps growing (backlog)
Primary production alert. In the management dashboard (Queues tab),
watch Ready count rise over time. CLI:
rabbitmqctl list_queues name messages consumers messages_ready.
Zero consumers → start workers. Consumers present but backlog grows → scale
consumers, speed up processing, or reduce publish rate. Check publish vs deliver
rates on Overview. Poison messages requeued forever → reject them without
requeue, route them to a dead-letter exchange (DLX), and fix the handler.
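The triage rules above (zero consumers versus consumers that cannot keep up) can be applied to the CLI output with a small filter. A sketch, assuming tab-separated list_queues output with the columns name, messages_ready, consumers in that order; classify_queues and the default threshold of 1000 are illustrative choices, not RabbitMQ defaults.

```shell
# classify_queues [THRESHOLD]
# Hypothetical filter: reads "name<TAB>messages_ready<TAB>consumers" rows on
# stdin and prints only the queues that need attention.
classify_queues() {
  threshold=${1:-1000}
  awk -F'\t' -v max="$threshold" '
    # Skip header or banner lines: both counters must be plain integers.
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { next }
    $2 + 0 > 0 && $3 + 0 == 0 { print $1 ": no consumers, " $2 " ready"; next }
    $2 + 0 > max + 0          { print $1 ": backlog, " $2 " ready, " $3 " consumers" }
  '
}

# Example (needs a local broker):
#   rabbitmqctl -q list_queues name messages_ready consumers | classify_queues 5000
```

A single snapshot cannot show growth; running the filter periodically and comparing results is what reveals a backlog that keeps rising.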
High unacknowledged messages
Messages delivered to consumers but not acked — visible as
Unacked in the dashboard. Causes: slow processing, deadlock,
prefetch too high piling work on one consumer, or consumer crash without
connection close. Lower consumer prefetch, fix handler timeouts,
ensure ack/nack in finally blocks. Restart stuck consumers after
investigating logs.
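To see which channels are holding the unacked messages, the channel listing can be filtered. A sketch under these assumptions: input is tab-separated list_channels output with the columns name, prefetch_count, messages_unacknowledged; saturated_channels is a hypothetical name; and treating "unacked has reached prefetch" as suspicious is a heuristic, not a RabbitMQ rule.

```shell
# saturated_channels
# Hypothetical filter: reads "name<TAB>prefetch_count<TAB>messages_unacknowledged"
# rows on stdin and prints channels that hold unacked messages with either
# no prefetch limit (0) or a prefetch window that is completely used up.
saturated_channels() {
  awk -F'\t' '
    # Skip header or banner lines: both counters must be plain integers.
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { next }
    $3 + 0 == 0 { next }
    $2 + 0 == 0 { print $1 ": " $3 " unacked, no prefetch limit"; next }
    $3 + 0 >= $2 + 0 { print $1 ": " $3 " unacked, prefetch " $2 " exhausted" }
  '
}

# Example (needs a local broker):
#   rabbitmqctl -q list_channels name prefetch_count messages_unacknowledged | saturated_channels
```

A channel that stays at its prefetch limit across several runs points to a slow or stuck handler on that consumer.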
Memory or disk alarm (flow control)
Broker blocks publishers when memory use exceeds the high watermark or free
disk space drops below the configured limit. Check with
rabbitmqctl status and rabbitmq-diagnostics check_alarms.
Often caused by huge queues (messages paged to disk). Fix the backlog, add
policies (max-length, TTL), purge test queues if safe, or add
RAM/disk. Dashboard shows red alarms on Overview.
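A max-length and TTL policy can be applied with rabbitmqctl set_policy. The sketch below assumes the hypothetical names cap_queues and backlog-cap, the RABBITMQCTL override for previewing, and placeholder limits that would need sizing for a real workload.

```shell
# cap_queues VHOST PATTERN MAX_LENGTH TTL_MS
# Hypothetical helper: sets one policy (named backlog-cap here) that limits
# queue length and message age for every queue whose name matches PATTERN.
# Assumption: RABBITMQCTL may be overridden (RABBITMQCTL=echo) to preview.
cap_queues() {
  vhost=$1
  pattern=$2
  max_length=$3
  ttl_ms=$4
  ctl=${RABBITMQCTL:-rabbitmqctl}
  definition=$(printf '{"max-length":%d,"message-ttl":%d}' "$max_length" "$ttl_ms")
  "$ctl" set_policy -p "$vhost" --apply-to queues backlog-cap "$pattern" "$definition"
}

# Preview only:  RABBITMQCTL=echo cap_queues orders '^work-' 100000 3600000
```

With max-length alone, the default overflow behavior drops messages from the head of the queue, so this trades data retention for broker stability.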
Messages published but not reaching queue
Routing misconfiguration — wrong exchange, routing key, or missing binding.
In dashboard: Exchanges → bindings. Use tracing or publish a test message.
Publishing with the mandatory flag set and a return handler registered
surfaces unroutable messages. Classic mistake: publishing to the default
exchange with a routing key that is not the exact queue name.
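Binding checks can also be done from the CLI. A sketch that handles exact routing-key matches only (direct exchanges and the default exchange, whose name is the empty string); topic wildcards, fanout, and headers exchanges are not covered. route_targets is a hypothetical name, and the input is assumed to be tab-separated list_bindings output with the columns source_name, destination_name, routing_key.

```shell
# route_targets EXCHANGE ROUTING_KEY
# Hypothetical filter: reads "source_name<TAB>destination_name<TAB>routing_key"
# rows on stdin and prints every destination bound to EXCHANGE with exactly
# ROUTING_KEY. Prints nothing when the message would be unroutable.
route_targets() {
  awk -F'\t' -v ex="$1" -v key="$2" '
    # Appending "" forces string comparison, so numeric-looking keys match exactly.
    ($1 "") == (ex "") && ($3 "") == (key "") { print $2 }
  '
}

# Example (needs a local broker):
#   rabbitmqctl -q list_bindings source_name destination_name routing_key |
#     route_targets orders.direct order.created
```

Empty output for the exchange and key an application publishes with is a strong hint that a binding is missing or the key is misspelled.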
Management UI not loading (15672)
Enable plugin: rabbitmq-plugins enable rabbitmq_management and
restart if needed. Check ss -tlnp | grep 15672 and firewall.
UI uses same user database — fix auth same as AMQP. For production, restrict
15672 to admin networks or put behind reverse proxy with TLS.
Cluster node split / queues unavailable
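A non-mirrored classic queue hosted on a down node is unavailable until that
node recovers. Check rabbitmqctl cluster_status. Quorum queues stay available
through node loss as long as a majority of their members is up. Do not wipe
/var/lib/rabbitmq/mnesia without understanding cluster state: doing so discards
that node's queue data and cluster membership and can cause permanent data loss.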
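The majority rule for quorum queues is simple arithmetic: with n members, a majority is floor(n/2) + 1, so the queue tolerates floor((n - 1)/2) failed members. A small sketch; quorum_tolerated_failures is a hypothetical name.

```shell
# quorum_tolerated_failures MEMBERS
# Prints how many members of a quorum queue can be down while a majority
# of the MEMBERS replicas is still available.
quorum_tolerated_failures() {
  members=$1
  echo $(( (members - 1) / 2 ))
}

# quorum_tolerated_failures 3   prints 1
# quorum_tolerated_failures 5   prints 2
```

Even member counts add no extra tolerance: four members survive one failure, the same as three.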
Too many connections / channels
Connection leaks in apps: each connection should be long-lived; channels are
cheap, but an unbounded number of them still hurts. Dashboard → Connections.
List connections with rabbitmqctl list_connections pid peer_host peer_port channels,
then close stale ones with rabbitmqctl close_connection PID "REASON". Use
connection pooling in client libraries; one connection per process is a common pattern.
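To find the client that leaks connections, the connection list can be grouped by peer host. A sketch, assuming tab-separated list_connections output with the columns peer_host, channels; connections_by_peer is a hypothetical name.

```shell
# connections_by_peer
# Hypothetical filter: reads "peer_host<TAB>channels" rows on stdin and prints
# "peer_host<TAB>connection_count<TAB>total_channels", busiest host first.
connections_by_peer() {
  tab=$(printf '\t')
  awk -F'\t' '
    # Skip header or banner lines: the channel count must be a plain integer.
    $2 !~ /^[0-9]+$/ { next }
    { conns[$1]++; chans[$1] += $2 }
    END { for (h in conns) printf "%s\t%d\t%d\n", h, conns[h], chans[h] }
  ' | sort -t "$tab" -k2,2nr -k1,1
}

# Example (needs a local broker):
#   rabbitmqctl -q list_connections peer_host channels | connections_by_peer
```

A host whose connection count keeps climbing between runs is the likely source of the leak.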
Debugging workflow
1. Dashboard overview
# http://HOST:15672 → Overview (rates, alarms)
# Queues tab → sort by Ready count, check Consumers column
2. CLI snapshot
rabbitmqctl status
rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers
rabbitmqctl list_consumers
3. Logs and resource alarms
rabbitmq-diagnostics check_alarms
rabbitmq-diagnostics memory_breakdown
tail -100 /var/log/rabbitmq/rabbit@*.log