MongoDB guide

What MongoDB does in production

MongoDB stores data as documents (BSON) in collections within databases. Schema is flexible — fields can vary between documents. It suits workloads with evolving structures, high write throughput, and horizontal scaling via sharding. The default storage engine is WiredTiger.

Application data — user profiles, catalogs, content management
Event and log ingestion — time-series and high-volume writes (with TTL indexes)
Mobile and IoT backends — document model maps well to JSON APIs
Analytics pipelines — aggregation framework for in-database transforms

How a client connection works

When an application connects, the typical workflow is:

TCP connect — client reaches host:27017 (or replica set seed list)
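Handshake — driver sends a hello command to negotiate the wire protocol version; with a replicaSet option or a multi-host seed list it also discovers the other members
Authentication — SCRAM-SHA-256 by default, checked against the user's authentication database (authSource, typically admin or the target database)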
Operation — CRUD, aggregations, indexes; writes go to the PRIMARY in a replica set
Response — results return; connection pooling is standard in drivers

Connection strings look like mongodb://user:pass@host1:27017,host2:27017/mydb?replicaSet=rs0. Always specify replicaSet when connecting to a replica set so the driver can fail over after elections.
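As a minimal sketch of how such a connection string can be assembled, the Python below joins a seed list and percent-encodes the credentials, since reserved characters such as @ or / in a password would otherwise break the URI. The function name build_mongo_uri and the sample password are assumptions for illustration; only the URI layout comes from the text above.

```python
from urllib.parse import quote, urlencode


def build_mongo_uri(user, password, hosts, database, replica_set):
    """Assemble a replica set connection string from its parts."""
    # Percent-encode every reserved character in the credentials.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    seed_list = ",".join(hosts)
    options = urlencode({"replicaSet": replica_set})
    return f"mongodb://{credentials}@{seed_list}/{database}?{options}"


uri = build_mongo_uri(
    "user", "p@ss/word", ["host1:27017", "host2:27017"], "mydb", "rs0"
)
print(uri)
# mongodb://user:p%40ss%2Fword@host1:27017,host2:27017/mydb?replicaSet=rs0
```

Listing more than one host in the seed list means the driver can still find the set when one of the seeds is down.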

Key files and configuration

/etc/mongod.conf — main config (YAML on modern installs)
storage.dbPath — data directory (e.g. /var/lib/mongodb)
net.bindIp — interfaces to listen on; 127.0.0.1 accepts only local connections (the default), 0.0.0.0 listens on all IPv4 interfaces
systemLog.path — log file location
replication.replSetName — replica set name (required for HA)
security.authorization — set to enabled to enforce access control
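To show how the settings above fit together in one file, here is a small Python sketch that renders a nested dictionary as the YAML layout mongod.conf uses. The helper render_yaml is hypothetical and handles only nested mappings of scalars. The log path and port are assumed values, and a real replica set with authorization enabled would also need internal authentication (for example security.keyFile), which is omitted here.

```python
def render_yaml(settings, indent=0):
    """Render nested dicts of scalars as block-style YAML lines."""
    lines = []
    pad = "  " * indent
    for key, value in settings.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, indent + 1))
        else:
            # YAML booleans are lowercase; other scalars are written as-is.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            lines.append(f"{pad}{key}: {text}")
    return lines


mongod_conf = {
    "storage": {"dbPath": "/var/lib/mongodb"},
    "net": {"port": 27017, "bindIp": "127.0.0.1"},
    "systemLog": {
        "destination": "file",
        "path": "/var/log/mongodb/mongod.log",  # assumed location
        "logAppend": True,
    },
    "replication": {"replSetName": "rs0"},
    "security": {"authorization": "enabled"},
}

conf_text = "\n".join(render_yaml(mongod_conf)) + "\n"
print(conf_text)
```

The printed text has one top-level key per section (storage, net, systemLog, replication, security) with the options indented two spaces beneath it.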

Users, roles, and permissions

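MongoDB uses role-based access control. Create each user in the database it works with (its authentication database), or in admin for cluster administrators. Apply least privilege: application users should not hold root, or dbOwner on unrelated databases. With authorization enabled, the localhost exception allows creating the first user from a local connection; once that user exists, every client must authenticate before it can run operations.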
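The least-privilege rule can be checked mechanically. The sketch below builds the document for the createUser database command and flags grants that look too broad. The helper names, the set of roles treated as broad, and the sample users and databases are assumptions for illustration, not an official policy.

```python
# Built-in roles that grant far more than a typical application needs.
BROAD_ROLES = {
    "root",
    "dbOwner",
    "userAdminAnyDatabase",
    "readWriteAnyDatabase",
    "dbAdminAnyDatabase",
}


def create_user_command(username, password, roles):
    """Build the document for the createUser database command."""
    return {
        "createUser": username,
        "pwd": password,
        "roles": [{"role": role, "db": db} for role, db in roles],
    }


def least_privilege_violations(command, allowed_dbs):
    """Return human-readable problems with a user's role grants."""
    problems = []
    for grant in command["roles"]:
        if grant["role"] in BROAD_ROLES:
            problems.append(f"{grant['role']} on {grant['db']} is too broad")
        elif grant["db"] not in allowed_dbs:
            problems.append(f"{grant['role']} on unrelated database {grant['db']}")
    return problems


app_user = create_user_command("app", "change-me", [("readWrite", "mydb")])
risky_user = create_user_command(
    "app2", "change-me", [("dbOwner", "billing"), ("read", "hr")]
)
print(least_privilege_violations(app_user, {"mydb"}))    # []
print(least_privilege_violations(risky_user, {"mydb"}))
```

A check like this could run in review before a user definition reaches production; the command document itself is what a driver would send to the server.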

Indexes and performance

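Queries without a supporting index scan the entire collection (a COLLSCAN stage in explain output), which is slow and CPU-heavy. Create indexes with db.collection.createIndex() and verify with explain("executionStats"), comparing totalDocsExamined against nReturned. The _id field is indexed automatically. Compound indexes serve multi-field queries, and field order matters: a query can use the index only through its leading fields, and a common guideline is to place equality fields first, then sort fields, then range fields.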
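Why field order matters in a compound index can be illustrated with a deliberately simplified model: a query benefits from the index only through an unbroken run of its leading fields. The function below counts that run for equality filters. It ignores sorts, range predicates, and every other factor the real query planner weighs, and the field names are hypothetical.

```python
def usable_prefix(index_fields, equality_fields):
    """Count the leading index fields covered by the query's equality filters."""
    count = 0
    for field in index_fields:
        if field not in equality_fields:
            break  # a gap ends the usable prefix
        count += 1
    return count


index = ["status", "customer_id", "created_at"]
print(usable_prefix(index, {"status", "customer_id"}))  # 2
print(usable_prefix(index, {"customer_id"}))            # 0
```

The second query filters on a field that is in the index but not at its start, so in this model the index gives it nothing; an index beginning with customer_id would be needed.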

Replication

MongoDB achieves high availability with replica sets — a group of mongod instances that maintain the same data set. One member is PRIMARY (accepts writes); others are SECONDARY (replicate the oplog, can serve reads). An optional ARBITER votes in elections but holds no data.

Replica set basics

Each replica set member has a unique _id and host entry in rs.conf(). Changes replicate via the oplog (the capped collection local.oplog.rs), an ordered log of operations that secondaries fetch and apply. Replication is asynchronous, so some lag is normal under load.

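Minimum production topology: three voting members, either three data-bearing nodes or two data nodes plus one arbiter (an arbiter provides a vote only, not data redundancy). Keep the number of voting members odd: an even count adds no fault tolerance over the next lower odd count, and a network partition that splits the set evenly leaves neither side with a majority, so no PRIMARY can be elected until connectivity returns. The majority requirement is what prevents split-brain.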
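The quorum arithmetic behind the three-member minimum is short enough to write out. In this sketch the function names are illustrative; the rule itself is that electing a primary needs strictly more than half of the voting members.

```python
def majority(voting_members):
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can be lost while a majority remains."""
    return voting_members - majority(voting_members)


def can_elect_primary(reachable_votes, voting_members):
    """True if one side of a partition holds enough votes to elect."""
    return reachable_votes >= majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2

print(can_elect_primary(2, 4))  # False: a 2/2 split leaves no majority
print(can_elect_primary(2, 3))  # True: 2 of 3 is a majority
```

Going from three to four voting members raises the majority from 2 to 3 but leaves the number of tolerable failures at 1, which is why odd counts are preferred.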

Basic setup:

Same replSetName in mongod.conf on every member
Start all members, connect to one: rs.initiate() then rs.add("host2:27017")
Or define members in the initiate config document upfront
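Create users after initiating; enable authorization once users exist (members of a replica set with authorization enabled also need internal authentication, such as a shared security.keyFile)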
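For the option of defining all members upfront, the sketch below builds a configuration document of the shape rs.initiate() accepts and prints it as JSON that could be pasted into the shell. The builder function is hypothetical, and host3 is an assumed third member alongside the hosts used earlier.

```python
import json


def replica_set_config(name, data_hosts, arbiter_hosts=()):
    """Build a replica set config document with sequential member _id values."""
    members = []
    for host in data_hosts:
        members.append({"_id": len(members), "host": host})
    for host in arbiter_hosts:
        members.append({"_id": len(members), "host": host, "arbiterOnly": True})
    hosts = [member["host"] for member in members]
    if len(set(hosts)) != len(hosts):
        raise ValueError("each member needs a distinct host:port")
    return {"_id": name, "members": members}


config = replica_set_config(
    "rs0", ["host1:27017", "host2:27017", "host3:27017"]
)
print(json.dumps(config, indent=2))
```

The _id at the top level must equal the replSetName in every member's mongod.conf; the per-member _id values only need to be unique integers.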

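Failover: when the PRIMARY is unreachable, the remaining members hold an election, and a new PRIMARY is elected if a majority of voting members can reach each other. Drivers with correct replica set connection strings discover the new PRIMARY and reconnect automatically. Writes are unavailable during the election, typically for some seconds (the default electionTimeoutMillis is 10 seconds).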

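Read preferences: reads go to the PRIMARY by default. Use secondary or secondaryPreferred to offload reads, but beware replication lag (stale reads). Write concern controls durability: { w: "majority" } waits until a majority of data-bearing voting members have the write before acknowledging. It is the implicit default from MongoDB 5.0 for most topologies; earlier versions defaulted to { w: 1 }, which acknowledges after the PRIMARY alone.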
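Read preference and write concern can both be set as connection string options. The following sketch assembles that query string and rejects a few invalid combinations; the function name is an assumption, while readPreference, maxStalenessSeconds (minimum 90), and w are standard URI options.

```python
from urllib.parse import urlencode

VALID_READ_PREFERENCES = {
    "primary",
    "primaryPreferred",
    "secondary",
    "secondaryPreferred",
    "nearest",
}


def read_write_options(read_preference="primary", max_staleness_seconds=None,
                       w="majority"):
    """Build the query-string part of a URI for read preference and write concern."""
    if read_preference not in VALID_READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = {"readPreference": read_preference}
    if max_staleness_seconds is not None:
        if read_preference == "primary":
            raise ValueError("maxStalenessSeconds does not apply to primary reads")
        if max_staleness_seconds < 90:
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = max_staleness_seconds
    options["w"] = w
    return urlencode(options)


print(read_write_options("secondaryPreferred", 120))
# readPreference=secondaryPreferred&maxStalenessSeconds=120&w=majority
```

Setting maxStalenessSeconds tells the driver to skip secondaries that lag further behind than the given bound, which limits how stale a secondary read can be.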

Monitor with rs.status(), rs.printReplicationInfo(), and rs.printSecondaryReplicationInfo().
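Replication lag can be derived from the member list that rs.status() returns by comparing each secondary's optimeDate with the primary's. The sketch below does this over a hand-written sample document; the timestamps are invented, and the at helper exists only to build the sample.

```python
from datetime import datetime, timezone


def replication_lag_seconds(status):
    """Return {secondary name: seconds behind the primary's optimeDate}."""
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in status["members"]
        if m["stateStr"] == "SECONDARY"
    }


def at(second):
    # Helper for the sample data below (not part of any MongoDB API).
    return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)


sample_status = {
    "set": "rs0",
    "members": [
        {"name": "host1:27017", "stateStr": "PRIMARY", "optimeDate": at(30)},
        {"name": "host2:27017", "stateStr": "SECONDARY", "optimeDate": at(28)},
        {"name": "host3:27017", "stateStr": "SECONDARY", "optimeDate": at(30)},
    ],
}

lag = replication_lag_seconds(sample_status)
print(lag)  # {'host2:27017': 2.0, 'host3:27017': 0.0}
```

If no member reports PRIMARY (for example mid-election), the next() call raises StopIteration, so a monitoring script would want to handle that case explicitly.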

Sharding (brief)

When a single replica set exceeds one server's capacity, sharding splits data across shards by shard key. A mongos router directs queries; config servers hold metadata. Each shard is typically its own replica set. Sharding adds operational complexity — only adopt when a single replica set cannot scale.
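The routing idea can be illustrated with a toy model: a sorted list of chunk lower bounds stands in for the config server metadata, and a lookup picks the shard that owns a shard key value. The chunk boundaries, shard names, and the username shard key are all hypothetical, and the real cluster tracks far more state than this.

```python
from bisect import bisect_right

# Hypothetical chunk map: chunk i covers [lower bound i, lower bound i+1).
CHUNK_LOWER_BOUNDS = ["", "h", "p"]
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]


def route(shard_key_value):
    """Find the shard owning the chunk that contains this shard key value."""
    position = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_SHARDS[position]


def target_shards(query, shard_key="username"):
    """Target one shard if the query has the shard key, else all of them."""
    if shard_key in query:
        return [route(query[shard_key])]
    return sorted(set(CHUNK_SHARDS))


print(target_shards({"username": "harry"}))       # ['shardB']
print(target_shards({"email": "h@example.com"}))  # ['shardA', 'shardB', 'shardC']
```

The second call shows the cost of a query that omits the shard key: it has to be sent to every shard, which is one reason shard key choice matters so much.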

Backups

A secondary replica is not a backup — db.dropDatabase() on the PRIMARY replicates everywhere. Maintain independent backups and test restores.

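Logical backup (mongodump) — exports each collection as BSON data plus a JSON metadata file. Simple and portable; use --oplog on a full-instance dump of a replica set member to capture writes made during the dump, giving a consistent point-in-time snapshot. Restore with mongorestore (add --oplogReplay to apply the captured oplog).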
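A backup script usually builds these command lines rather than typing them by hand. The sketch below only constructs and prints the argument lists and does not run anything. The backup directory and the URI are assumed values; the URI names no database because --oplog works only for full-instance dumps, and credentials are left out for brevity.

```python
import shlex


def mongodump_argv(uri, out_dir):
    """Arguments for a compressed full dump that also captures the oplog."""
    return ["mongodump", f"--uri={uri}", "--oplog", "--gzip", f"--out={out_dir}"]


def mongorestore_argv(uri, dump_dir):
    """Arguments for restoring that dump and replaying the captured oplog."""
    return ["mongorestore", f"--uri={uri}", "--oplogReplay", "--gzip", dump_dir]


# Assumed values for illustration only.
source_uri = "mongodb://host1:27017,host2:27017/?replicaSet=rs0&readPreference=secondary"
backup_dir = "/backups/2024-01-01"

dump_cmd = mongodump_argv(source_uri, backup_dir)
restore_cmd = mongorestore_argv("mongodb://localhost:27017/", backup_dir)
print(shlex.join(dump_cmd))
print(shlex.join(restore_cmd))
```

Passing these lists to subprocess.run would avoid shell quoting problems; shlex.join is used here only to show a copy-pasteable form.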

Filesystem snapshot — copy WiredTiger files from a SECONDARY after db.fsyncLock() (which blocks writes until you call db.fsyncUnlock()), or take cloud volume snapshots of a secondary. Faster for large datasets, but the files can only be restored to a compatible MongoDB version.

Cloud / Ops Manager — Atlas and MongoDB Ops Manager provide continuous backup and point-in-time recovery (PITR). Know the retention settings and restore procedures for managed deployments.

Learning resources

MongoDB documentation — mongodb.com/docs
Replica sets — mongodb.com/docs — replication
Security — mongodb.com/docs — security
Backup methods — mongodb.com/docs — backups
Performance — mongodb.com/docs — performance