MongoDB troubleshooting

Connection refused

mongod not running or not listening. Check systemctl status mongod and ss -tlnp | grep 27017. Verify net.bindIp in /etc/mongod.conf — binding only to 127.0.0.1 blocks remote clients. Read logs: tail -f /var/log/mongodb/mongod.log or journalctl -u mongod -e.
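
The bindIp check above can be sketched as a small helper. This is only an illustration: acceptsRemoteClients is a hypothetical name, not a MongoDB API, and it assumes you paste in the net.bindIp value from /etc/mongod.conf as a comma-separated string.

```javascript
// Hypothetical helper (not a MongoDB API): decide whether a net.bindIp
// value lets clients on other hosts connect.
function acceptsRemoteClients(bindIp) {
  const isLocalOnly = (addr) =>
    addr === "localhost" ||
    addr === "::1" ||
    addr.startsWith("127.") || // IPv4 loopback range
    addr.startsWith("/");      // Unix domain socket path
  return bindIp
    .split(",")
    .map((addr) => addr.trim())
    .filter((addr) => addr.length > 0)
    .some((addr) => !isLocalOnly(addr)); // any routable entry is enough
}

console.log(acceptsRemoteClients("127.0.0.1"));          // false
console.log(acceptsRemoteClients("127.0.0.1,10.0.0.5")); // true
```

A false result with a running mongod suggests the refusal comes from the bind address rather than from the process being down.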

Authentication failed

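Wrong user, password, or authentication database. Users are scoped to the database where they were created, so authenticate against that database (often admin for admin users) using authSource in the connection string or --authenticationDatabase in mongosh. Check db.getUsers() from an admin session; it lists only the users of the current database. Ensure security.authorization: enabled matches your expectations. The connection string must include credentials when auth is on.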
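
A minimal sketch of assembling a connection string with credentials and an explicit authentication database. buildMongoUri and the sample user, password, and host are made up for illustration; the point is that special characters in credentials are percent-encoded and authSource names the database where the user was created.

```javascript
// Hypothetical helper: build a mongodb:// URI with encoded credentials.
function buildMongoUri({ user, password, hosts, database = "", authSource = "admin" }) {
  // Characters such as @ : / in credentials must be percent-encoded.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const query = `authSource=${encodeURIComponent(authSource)}`;
  return `mongodb://${credentials}@${hosts.join(",")}/${database}?${query}`;
}

// Sample values are placeholders, not real credentials.
console.log(
  buildMongoUri({
    user: "appUser",
    password: "p@ss:w/rd",
    hosts: ["db1.example.com:27017"],
    database: "shop",
  })
);
// mongodb://appUser:p%40ss%3Aw%2Frd@db1.example.com:27017/shop?authSource=admin
```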

not primary / NotWritablePrimary

Client tried to write to a SECONDARY or during failover. Check db.hello().isWritablePrimary or rs.status() for the current PRIMARY. Update the connection string to list all replica set members and the replicaSet name so the driver can discover the new primary. Wait for the election to complete after a primary failure, usually seconds. Do not hardcode the primary hostname in apps.
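
As a rough sketch, the two checks above can be expressed as functions over the document that rs.status() returns. findPrimary and replicaSetUri are hypothetical helpers, and the sample status object is invented and trimmed to the fields used (set, and members[].name / members[].stateStr).

```javascript
// Hypothetical helper: name of the current PRIMARY, or null during an election.
function findPrimary(status) {
  const primary = (status.members || []).find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Hypothetical helper: seed-list URI that names every member plus the set,
// so the driver can follow the primary instead of a hardcoded host.
function replicaSetUri(status) {
  const hosts = status.members.map((m) => m.name).join(",");
  return `mongodb://${hosts}/?replicaSet=${encodeURIComponent(status.set)}`;
}

// Invented sample, shaped like a trimmed rs.status() result.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "db1.example.com:27017", stateStr: "SECONDARY" },
    { name: "db2.example.com:27017", stateStr: "PRIMARY" },
    { name: "db3.example.com:27017", stateStr: "SECONDARY" },
  ],
};

console.log(findPrimary(sampleStatus));   // db2.example.com:27017
console.log(replicaSetUri(sampleStatus));
```

In mongosh the same functions could be called as findPrimary(rs.status()).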

Replica set not initialized / REMOVED state

A mongod that has never had rs.initiate() run is not part of a working replica set and provides no HA. Run rs.initiate() on one member, then rs.add() for the others (all must share the same replSetName). For members in REMOVED or DOWN state, verify that each hostname in rs.conf() is one the other members can resolve and reach (use FQDNs, not localhost, across hosts).
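
One way to avoid the localhost pitfall is to build the configuration document up front and pass it to rs.initiate(). The sketch below assumes a hypothetical helper, buildReplSetConfig, and placeholder hostnames; it only produces the { _id, members } document and refuses loopback names when more than one host is involved.

```javascript
// Hypothetical helper: build a config document for rs.initiate(config).
function buildReplSetConfig(replSetName, hosts) {
  // Loopback names cannot be resolved correctly by members on other hosts.
  const loopback = /^(localhost|127\.\d+\.\d+\.\d+|\[::1\])(:\d+)?$/;
  if (hosts.length > 1) {
    const bad = hosts.filter((h) => loopback.test(h));
    if (bad.length > 0) {
      throw new Error(`use resolvable hostnames instead of: ${bad.join(", ")}`);
    }
  }
  return {
    _id: replSetName, // must match replSetName on every member
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

// Placeholder hostnames for illustration.
const config = buildReplSetConfig("rs0", [
  "db1.example.com:27017",
  "db2.example.com:27017",
  "db3.example.com:27017",
]);
console.log(JSON.stringify(config, null, 2));
```

In mongosh the result could be passed directly as rs.initiate(config).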

Replication lag growing

On the PRIMARY run rs.printSecondaryReplicationInfo() and compare each secondary's syncedTo time with the primary's optime. Causes: secondary disk slower than the primary's, network issues, heavy reads on the secondary competing with replication, or large index builds on the secondary. A secondary in RECOVERING is catching up; monitor until it reports SECONDARY. An oplog that is too small can force a full resync if a secondary falls too far behind.
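
The lag comparison can also be computed from rs.status() directly. The following is a sketch under the assumption that each member carries an optimeDate (a Date) and a stateStr, as rs.status() reports; replicationLagSeconds is a hypothetical helper and the sample timestamps are invented.

```javascript
// Hypothetical helper: seconds each secondary is behind the primary's optime.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary, so lag is undefined
  const lag = {};
  for (const member of status.members) {
    if (member.stateStr === "SECONDARY" || member.stateStr === "RECOVERING") {
      const diffMs = primary.optimeDate.getTime() - member.optimeDate.getTime();
      lag[member.name] = diffMs / 1000;
    }
  }
  return lag;
}

// Invented sample, shaped like a trimmed rs.status() result.
const sample = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T12:00:30Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T12:00:28Z") },
    { name: "db3:27017", stateStr: "RECOVERING", optimeDate: new Date("2024-01-01T11:58:30Z") },
  ],
};
console.log(replicationLagSeconds(sample)); // { 'db1...' is omitted; db2: 2, db3: 120 }
```

Sampling this value a few times shows whether lag is growing or merely fluctuating.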

Election loops / no PRIMARY

No member can win a majority of votes. Common causes: an even number of voting members split in half (add another data-bearing member or an arbiter), a network partition splitting the set, or members that are down. An election needs a strict majority of all voting members to be reachable, for example 2 of 3 or 3 of 4. Check rs.status() for members with health: 0. Fix connectivity, restore down nodes, or reconfigure (carefully) with rs.reconfig(); an improper reconfig can make things worse.
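
The majority rule can be checked with a little arithmetic over rs.conf() and rs.status(). This is an illustrative sketch: quorumReport is a hypothetical helper, it assumes members[].host and members[].votes from rs.conf() and members[].name and members[].health from rs.status(), and health reflects only the view of the member where rs.status() was run.

```javascript
// Hypothetical helper: can the reachable voting members elect a primary?
function quorumReport(conf, status) {
  const healthByHost = new Map(status.members.map((m) => [m.name, m.health]));
  // votes defaults to 1 when not set explicitly.
  const voters = conf.members.filter((m) => (m.votes === undefined ? 1 : m.votes) > 0);
  const reachable = voters.filter((m) => healthByHost.get(m.host) === 1);
  const needed = Math.floor(voters.length / 2) + 1; // strict majority
  return {
    voters: voters.length,
    reachable: reachable.length,
    needed,
    canElect: reachable.length >= needed,
  };
}

// Invented sample: four voters, two unreachable.
const conf = { members: ["a", "b", "c", "d"].map((h) => ({ host: `${h}:27017`, votes: 1 })) };
const status = {
  members: [
    { name: "a:27017", health: 1 },
    { name: "b:27017", health: 1 },
    { name: "c:27017", health: 0 },
    { name: "d:27017", health: 0 },
  ],
};
console.log(quorumReport(conf, status));
// { voters: 4, reachable: 2, needed: 3, canElect: false }
```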

Disk full / WiredTiger errors

mongod cannot complete writes, and may shut down, when the disk is full. Check df -h on the filesystem that holds storage.dbPath. Find large collections with db.stats() and db.collection.stats(). Compact or archive old data; add TTL indexes for expiring documents. Free space before restarting: an unclean shutdown may require a repair (last resort).
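
To find where the space went, the per-collection stats can be ranked by on-disk footprint. A minimal sketch, assuming each input object has the ns, storageSize, and totalIndexSize fields that db.collection.stats() reports; largestCollections is a hypothetical helper and the numbers are invented.

```javascript
// Hypothetical helper: rank collections by data plus index bytes on disk.
function largestCollections(statsList, limit = 5) {
  return statsList
    .map((s) => ({ ns: s.ns, diskBytes: s.storageSize + s.totalIndexSize }))
    .sort((a, b) => b.diskBytes - a.diskBytes) // largest first
    .slice(0, limit);
}

// Invented sample values in bytes.
const stats = [
  { ns: "shop.orders", storageSize: 100, totalIndexSize: 10 },
  { ns: "shop.events", storageSize: 500, totalIndexSize: 50 },
  { ns: "shop.users", storageSize: 20, totalIndexSize: 5 },
];
console.log(largestCollections(stats, 2));
// [ { ns: 'shop.events', diskBytes: 550 }, { ns: 'shop.orders', diskBytes: 110 } ]
```

In mongosh the input list could be gathered with something like db.getCollectionNames().map((name) => db.getCollection(name).stats()).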

Slow queries

Run db.collection.explain("executionStats").find(...). A COLLSCAN on a large collection needs an index. Enable the profiler (db.setProfilingLevel(1, { slowms: 100 })) or check db.currentOp() for long-running ops. A missing index, or the wrong field order in a compound index, is a frequent cause.
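
Reading explain output by eye gets tedious, so here is a sketch that walks the winning plan and flags a COLLSCAN. It assumes the usual shape (queryPlanner.winningPlan with stage, inputStage, inputStages, and optionally a nested queryPlan on newer servers) plus executionStats.totalDocsExamined and nReturned; planStages and summarizeExplain are hypothetical helpers and the sample document is invented.

```javascript
// Hypothetical helper: collect stage names from a plan tree, root first.
function planStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  // Some newer servers nest the readable plan under queryPlan.
  if (plan.queryPlan) return planStages(plan.queryPlan);
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  const own = plan.stage ? [plan.stage] : [];
  return own.concat(...children.map(planStages));
}

// Hypothetical helper: the few numbers that usually matter.
function summarizeExplain(explain) {
  const stages = planStages(explain.queryPlanner.winningPlan);
  const stats = explain.executionStats || {};
  return {
    stages,
    collScan: stages.includes("COLLSCAN"),
    docsExamined: stats.totalDocsExamined,
    returned: stats.nReturned,
  };
}

// Invented sample: a full scan that examines far more documents than it returns.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { totalDocsExamined: 100000, nReturned: 3 },
};
console.log(summarizeExplain(sampleExplain));
```

A large gap between docsExamined and returned is the usual sign that an index is missing or does not match the query.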

Too many connections

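The server hit its connection limit (net.maxIncomingConnections; the default is high, but applications can leak connections). Check db.serverStatus().connections for current and available. Fix connection pooling in applications. Kill stuck or long-running operations with db.killOp(opid) after identifying them in db.currentOp().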
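
The connections document is easier to judge as a ratio. A small sketch, assuming the current and available fields of db.serverStatus().connections; connectionUsage is a hypothetical helper and the sample numbers are invented.

```javascript
// Hypothetical helper: how close the server is to its connection limit.
function connectionUsage(connections) {
  // current + available approximates the effective limit.
  const limit = connections.current + connections.available;
  const usedFraction = limit === 0 ? 0 : connections.current / limit;
  return { limit, usedFraction };
}

// Invented sample values.
console.log(connectionUsage({ current: 750, available: 250 }));
// { limit: 1000, usedFraction: 0.75 }
```

In mongosh this could be called as connectionUsage(db.serverStatus().connections); a fraction that climbs steadily while traffic stays flat points to a leak in the application's pooling.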

Debugging workflow

1. Is mongod up?

systemctl status mongod
ss -tlnp | grep 27017
mongosh --eval "db.adminCommand({ ping: 1 })"

2. Replica set health

mongosh --eval "rs.status()"
mongosh --eval "rs.printSecondaryReplicationInfo()"

3. Active operations and slow queries

db.currentOp({ active: true, secs_running: { $gt: 5 } })
db.system.profile.find().sort({ ts: -1 }).limit(5)
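
For step 3, the db.currentOp() result can be reduced to a short list of candidates for db.killOp(opid). This is a sketch, assuming the inprog array with opid, op, ns, active, and secs_running fields; longRunningOps is a hypothetical helper and the sample data is invented.

```javascript
// Hypothetical helper: active operations running longer than minSeconds, slowest first.
function longRunningOps(currentOp, minSeconds = 5) {
  return (currentOp.inprog || [])
    .filter((op) => op.active && (op.secs_running || 0) > minSeconds)
    .sort((a, b) => b.secs_running - a.secs_running)
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secs: op.secs_running }));
}

// Invented sample, shaped like a trimmed db.currentOp() result.
const sampleOps = {
  inprog: [
    { opid: 101, op: "query", ns: "shop.orders", active: true, secs_running: 42 },
    { opid: 102, op: "update", ns: "shop.users", active: true, secs_running: 1 },
    { opid: 103, op: "none", ns: "", active: false },
    { opid: 104, op: "command", ns: "shop.events", active: true, secs_running: 310 },
  ],
};
console.log(longRunningOps(sampleOps));
// opid 104 (310 s) first, then opid 101 (42 s)
```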

Practice scenarios

Hands-on MongoDB scenarios on live Linux VMs: mongodb

Updated: 2026-06-13 16:06 UTC – 2d2950a