MongoDB guide
What MongoDB does in production
MongoDB stores data as documents (BSON) in collections within databases. Schema is flexible — fields can vary between documents. It suits workloads with evolving structures, high write throughput, and horizontal scaling via sharding. The default storage engine is WiredTiger.
- Application data — user profiles, catalogs, content management
- Event and log ingestion — time-series and high-volume writes (with TTL indexes)
- Mobile and IoT backends — document model maps well to JSON APIs
- Analytics pipelines — aggregation framework for in-database transforms
How a client connection works
When an application connects, the typical workflow is:
- TCP connect: client reaches host:27017 (or the replica set seed list)
- Handshake: driver negotiates the wire protocol; replica set discovery runs if the connection string includes multiple hosts
- Authentication: SCRAM-SHA-256 against a user defined in admin or the target database
- Operation: CRUD, aggregations, indexes; writes go to the PRIMARY in a replica set
- Response: results return; connection pooling is standard in drivers
Connection strings look like:
mongodb://user:pass@host1:27017,host2:27017/mydb?replicaSet=rs0
Always specify replicaSet when connecting to a replica set so the driver
can fail over after elections.
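As a sketch of assembling such a string safely, the hypothetical helper below (standard library only) percent-encodes the credentials with urllib.parse.quote_plus, as the PyMongo documentation recommends, so characters like @ or : in a password do not break parsing. Host names, database, and the authSource option are illustrative. If authSource is omitted, the database in the path (mydb here) is used as the authentication database.

```python
from urllib.parse import quote_plus


def build_replica_set_uri(user, password, hosts, database, replica_set, **options):
    """Build a mongodb:// URI for a replica set (hypothetical helper, not a driver API)."""
    # Credentials must be percent-encoded; '@', ':' or '/' would otherwise
    # be read as URI delimiters.
    creds = f"{quote_plus(user)}:{quote_plus(password)}@"
    # The seed list is a comma-separated list of host:port pairs.
    seed_list = ",".join(hosts)
    # replicaSet tells the driver to treat the seed list as one replica set.
    params = {"replicaSet": replica_set, **options}
    query = "&".join(f"{k}={quote_plus(str(v))}" for k, v in params.items())
    return f"mongodb://{creds}{seed_list}/{database}?{query}"


if __name__ == "__main__":
    print(build_replica_set_uri(
        "app", "p@ss:word", ["host1:27017", "host2:27017"], "mydb", "rs0",
        authSource="admin",
    ))
```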
Key files and configuration
- /etc/mongod.conf: main config (YAML on modern installs)
- storage.dbPath: data directory (e.g. /var/lib/mongodb)
- net.bindIp: interfaces to listen on; 127.0.0.1 (localhost only, the default) vs 0.0.0.0 (all interfaces)
- systemLog.path: log file location
- replication.replSetName: replica set name (required for HA)
- security.authorization: set to enabled to turn on access control
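These settings are easy to check mechanically. A minimal sketch, assuming mongod.conf has already been parsed into nested Python dicts (for example with a YAML library, not shown). The helper name lint_mongod_config and its warning texts are hypothetical, not a MongoDB tool.

```python
def lint_mongod_config(conf):
    """Return warnings for a parsed mongod.conf (hypothetical helper).

    conf: the YAML file already loaded into nested dicts, e.g.
    {"storage": {"dbPath": "/var/lib/mongodb"}, "net": {"bindIp": "127.0.0.1"}}.
    """
    storage = conf.get("storage", {})
    net = conf.get("net", {})
    security = conf.get("security", {})
    replication = conf.get("replication", {})
    warnings = []

    if not storage.get("dbPath"):
        warnings.append("storage.dbPath is not set; mongod falls back to its built-in default")

    # bindIp may list several addresses separated by commas.
    bind_addrs = [a.strip() for a in str(net.get("bindIp", "127.0.0.1")).split(",")]
    auth_enabled = security.get("authorization") == "enabled"
    if "0.0.0.0" in bind_addrs and not auth_enabled:
        warnings.append("listening on all interfaces without security.authorization: enabled")

    if not replication.get("replSetName"):
        warnings.append("replication.replSetName is not set; this node cannot join a replica set")

    return warnings
```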
Users, roles, and permissions
MongoDB uses role-based access control. Create users in the database they access
(or admin for cluster admin). Use least privilege — application users
should not have root or dbOwner on unrelated databases.
With authorization enabled and no users yet, the localhost exception lets you create the first user from a local connection; once that user exists, all connections require credentials.
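A least-privilege application user can be expressed as a createUser command document whose roles are all pinned to the application's own database. This sketch only builds the document; the helper name and the deny-list of cluster-wide roles are hypothetical choices. The document would then be run against that database, for example with db.runCommand() in mongosh or Database.command() in PyMongo.

```python
# Built-in roles that span every database; an application user should not need them.
CLUSTER_WIDE_ROLES = {
    "root",
    "readWriteAnyDatabase",
    "dbAdminAnyDatabase",
    "userAdminAnyDatabase",
    "clusterAdmin",
}


def app_user_command(name, password, app_db, roles=("readWrite",)):
    """Build a createUser command document scoped to app_db (hypothetical helper)."""
    for role in roles:
        if role in CLUSTER_WIDE_ROLES:
            raise ValueError(f"role {role!r} is broader than an application user needs")
    return {
        "createUser": name,  # the command name must be the first key
        "pwd": password,
        # Every role is bound to app_db, so nothing is granted on unrelated databases.
        "roles": [{"role": role, "db": app_db} for role in roles],
    }
```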
Indexes and performance
Queries without supporting indexes scan entire collections — slow and CPU-heavy.
Use db.collection.createIndex() and verify with
explain("executionStats"). The _id field is indexed
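automatically. Compound indexes matter for multi-field queries, and field order
matters: a query can use a compound index efficiently only through a prefix of
its keys, so an index on { status: 1, createdAt: -1 } supports queries on status,
or on status and createdAt, but does not efficiently support a query on createdAt alone.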
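Field order matters because a query can only use a leading prefix of a compound index's keys as index bounds. A simplified model of that rule follows: given an index's ordered keys and the fields a query constrains, it returns the leading keys the query can use. The function name is hypothetical, and the model deliberately ignores sort order, equality-versus-range ordering, and multikey details; explain("executionStats") remains the real check.

```python
def usable_index_prefix(index_keys, query_fields):
    """Leading index keys a query can bound on (simplified model).

    index_keys: ordered field names of a compound index, e.g. ["status", "createdAt"]
    query_fields: set of fields the query filters on
    """
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break  # the usable prefix stops at the first key the query leaves unconstrained
        prefix.append(key)
    return prefix


if __name__ == "__main__":
    index = ["status", "createdAt"]  # models { status: 1, createdAt: -1 }
    for fields in ({"status"}, {"status", "createdAt"}, {"createdAt"}):
        print(sorted(fields), "->", usable_index_prefix(index, fields))
```

An empty result means the query has no usable prefix and would fall back to a full scan of the index or the collection.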
Replication
MongoDB achieves high availability with replica sets — a group of
mongod instances that maintain the same data set. One member is
PRIMARY (accepts writes); others are SECONDARY
(replicate the oplog, can serve reads). An optional ARBITER
votes in elections but holds no data.
Replica set basics
Each replica set member has a unique _id and hostname in
rs.conf(). Changes replicate via the oplog
(capped collection local.oplog.rs) — an ordered log of operations.
Replication is asynchronous by default; lag is normal under load.
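A toy Python model makes the oplog mechanics concrete. The PRIMARY applies each write and appends it to an ordered log, and a SECONDARY later pulls and applies only the entries newer than its last applied timestamp. Everything here (the Member class, integer timestamps, set/delete operations) is a simplification for illustration; real oplog entries, timestamps, and sync-source selection are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy replica set member: a key-value 'collection' plus its own oplog."""
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)  # ordered (ts, op, key, value) tuples

    @property
    def last_applied(self):
        return self.oplog[-1][0] if self.oplog else 0

    def apply(self, entry):
        ts, op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.oplog.append(entry)  # every member keeps its own copy of the log


def write(primary, ts, op, key, value=None):
    """Only the PRIMARY accepts writes; it applies them and logs them in order."""
    primary.apply((ts, op, key, value))


def sync(secondary, primary):
    """The secondary pulls entries newer than its last applied timestamp, oldest first."""
    for entry in primary.oplog:
        if entry[0] > secondary.last_applied:
            secondary.apply(entry)
```

Between write() and sync() the secondary serves stale data, which is the lag described above. A delete travels through the log like any other write, which is also why a secondary cannot serve as a backup.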
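Minimum production topology: three voting members (three data-bearing nodes, or two data nodes plus one arbiter; arbiters provide quorum only, not data redundancy). Use an odd number of voting members: an even number adds no fault tolerance, and a network partition that splits the voters evenly leaves neither side with the majority needed to elect a PRIMARY. The majority rule prevents split-brain, but at the cost of availability.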
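The majority arithmetic behind these topology rules fits in a few lines. This sketch assumes every member keeps the default single vote and uses hypothetical helper names. It shows that four voters tolerate no more failures than three, and that an even two-two partition leaves neither side able to elect a PRIMARY.

```python
def majority(voters):
    """Votes needed to elect a PRIMARY: a strict majority of voting members."""
    return voters // 2 + 1


def fault_tolerance(voters):
    """Voting members that can be lost while a majority is still reachable."""
    return voters - majority(voters)


def can_elect(votes_on_side, total_voters):
    """Whether one side of a network partition holds enough votes to elect a PRIMARY."""
    return votes_on_side >= majority(total_voters)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} voters: majority {majority(n)}, tolerates {fault_tolerance(n)} down")
    print("2-2 split of 4 voters, either side can elect:", can_elect(2, 4))
```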
Basic setup:
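- Same replSetName in mongod.conf on every member
- Start all members, connect to one, then run rs.initiate() followed by rs.add("host2:27017")
- Or define all members in the initiate config document upfront
- Create users after initiating; enable authorization once users exist (replica set members also need internal authentication, e.g. security.keyFile, before access control is enforced)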
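Defining members upfront means passing a configuration document to rs.initiate(). The sketch below builds and sanity-checks such a document in Python. The helper name is hypothetical and the host names are placeholders; _id, members, host, and arbiterOnly are the field names a replica set configuration uses. The resulting dict mirrors what you would type in mongosh as rs.initiate({ _id: "rs0", members: [...] }).

```python
def initiate_config(repl_set_name, hosts, arbiter=None):
    """Build a replica set config document for rs.initiate() (hypothetical helper).

    repl_set_name must match replication.replSetName on every member.
    Assumes every member keeps the default single vote.
    """
    all_hosts = list(hosts) + ([arbiter] if arbiter else [])
    if len(set(all_hosts)) != len(all_hosts):
        raise ValueError("every member needs a unique host:port")
    if len(all_hosts) % 2 == 0:
        raise ValueError("use an odd number of voting members")
    # Member _id values must be unique within the set; 0, 1, 2, ... is conventional.
    members = [{"_id": i, "host": h} for i, h in enumerate(hosts)]
    if arbiter:
        members.append({"_id": len(members), "host": arbiter, "arbiterOnly": True})
    return {"_id": repl_set_name, "members": members}
```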
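Failover: when the PRIMARY is unreachable, members hold an election. A new PRIMARY is elected (majority quorum required). Drivers with correct replica set connection strings reconnect automatically. Downtime is typically seconds: secondaries call an election after not hearing from the PRIMARY for electionTimeoutMillis (10 seconds by default).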
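Current drivers already retry many single writes once on their own (retryable writes, enabled by default), but an application may still want a bounded retry around an operation that can span an election. A minimal sketch with exponential backoff follows. TransientError is a hypothetical stand-in for whatever "no primary" or network error your driver raises, and the sleep parameter is injectable so the logic can be tested without waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's 'no primary' or network error."""


def with_failover_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff while an election settles."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, 4 s with the defaults
```

With the defaults the total wait is 7.5 seconds, in the same range as a typical election. Only wrap idempotent operations this way (or ones protected by a unique key); retrying a non-idempotent write whose acknowledgement was lost can apply it twice.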
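Read preferences: reads go to the PRIMARY by default. Use
secondary or secondaryPreferred to offload reads, but
beware replication lag (stale reads). Write concern controls
durability: { w: "majority" } waits until a majority of data-bearing voting
members acknowledge the write. It is stronger than w: 1 and, since MongoDB 5.0,
is the implicit default for most replica set configurations (some deployments
with arbiters still default to w: 1, as did earlier releases).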
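A read preference is ultimately a rule for picking a member. This simplified model uses a hypothetical function and member dicts with a precomputed lag in seconds, and it shows the trade-off. primary always reads current data, while secondaryPreferred takes a secondary when one is acceptable and falls back to the PRIMARY otherwise. Real drivers pick among eligible members by network latency and estimate staleness themselves (see the maxStalenessSeconds read preference option). Choosing the least-lagged secondary here just keeps the example deterministic.

```python
def select_for_read(members, mode="primary", max_lag=None):
    """Pick a member to read from under a simplified read-preference model.

    members: dicts with "name", "state" ("PRIMARY" or "SECONDARY") and "lag" (seconds).
    mode: "primary", "secondary" or "secondaryPreferred".
    max_lag: optional bound on acceptable secondary lag (in the spirit of maxStalenessSeconds).
    """
    primary = next((m for m in members if m["state"] == "PRIMARY"), None)
    if mode == "primary":
        return primary
    eligible = [
        m for m in members
        if m["state"] == "SECONDARY" and (max_lag is None or m["lag"] <= max_lag)
    ]
    if eligible:
        # Least lag keeps the sketch deterministic; drivers actually pick by latency.
        return min(eligible, key=lambda m: m["lag"])
    if mode == "secondaryPreferred":
        return primary  # fall back to the PRIMARY when no secondary qualifies
    return None  # mode "secondary" with no eligible member: the read fails
```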
Monitor with rs.status(), rs.printReplicationInfo(), and
rs.printSecondaryReplicationInfo().
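rs.printSecondaryReplicationInfo() reports how far each secondary trails the PRIMARY. The same figure can be computed from the document rs.status() returns (the replSetGetStatus command). The sketch below assumes the members[].name, members[].stateStr, and members[].optimeDate fields, with optimeDate already decoded as datetime values as PyMongo does. The function name is hypothetical and the sample status is made up for illustration.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Map each SECONDARY's name to how many seconds it trails the PRIMARY.

    status: a dict shaped like rs.status() output, with members[].name,
    members[].stateStr and members[].optimeDate (datetime values).
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY in this status document")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)  # made-up sample data
    sample = {
        "members": [
            {"name": "host1:27017", "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "host2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=4)},
            {"name": "host3:27017", "stateStr": "ARBITER"},  # arbiters hold no data
        ]
    }
    print(replication_lag_seconds(sample))  # {'host2:27017': 4.0}
```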
Sharding (brief)
When a single replica set exceeds one server's capacity, sharding
splits data across shards by shard key. A mongos router directs queries;
config servers hold metadata. Each shard is typically its own replica set. Sharding
adds operational complexity — only adopt when a single replica set cannot scale.
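To illustrate how a shard key drives routing, the toy router below models ranged sharding: each chunk covers a key range and lives on one shard. A query that includes the shard key goes to a single shard (a targeted query), while one without it must go to every shard (scatter-gather). The class and shard names are hypothetical, only equality queries are modeled, and this is not how mongos is implemented internally; hashed shard keys are not covered.

```python
import bisect


class ToyRouter:
    """Simplified model of range-based routing by shard key (not mongos internals)."""

    def __init__(self, shard_key, chunks):
        # chunks: (lower_bound, shard_name) pairs sorted by lower_bound. Each chunk
        # covers [its lower bound, the next chunk's lower bound). The first lower
        # bound should be float("-inf") so every key falls into some chunk.
        self.shard_key = shard_key
        self.lower_bounds = [lower for lower, _ in chunks]
        self.chunk_shards = [shard for _, shard in chunks]

    def shard_for(self, key_value):
        # Index of the last chunk whose lower bound is <= key_value.
        i = bisect.bisect_right(self.lower_bounds, key_value) - 1
        return self.chunk_shards[i]

    def targets(self, query):
        """Shards a simple equality query must be sent to."""
        if self.shard_key in query:
            return {self.shard_for(query[self.shard_key])}  # targeted query
        return set(self.chunk_shards)  # no shard key: scatter-gather to every shard
```

The design point it shows is why shard key choice matters: queries that rarely include the key pay the scatter-gather cost on every shard.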
Backups
A secondary replica is not a backup — db.dropDatabase() on the PRIMARY
replicates everywhere. Maintain independent backups and test restores.
Logical backup (mongodump) — exports BSON/JSON per
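collection. Simple and portable; use --oplog on a replica set member
for point-in-time consistency (full-instance dumps only: --oplog cannot be
combined with --db or --collection). Restore with mongorestore, adding
--oplogReplay to apply the captured oplog entries.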
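The sketch below expresses the backup and restore invocations as argument lists, for example to pass to Python's subprocess.run(cmd, check=True) from a scheduled job. The flags used (--uri, --oplog, --out, --gzip, --oplogReplay, --dir) are standard mongodump/mongorestore options. The helper names, URIs, and paths are placeholders. If the dump was written with --gzip, the restore needs --gzip too.

```python
def dump_command(uri, out_dir, gzip=True):
    """argv for a point-in-time logical backup of a whole replica set member.

    --oplog only works for full-instance dumps, so no --db/--collection here.
    """
    cmd = ["mongodump", f"--uri={uri}", "--oplog", f"--out={out_dir}"]
    if gzip:
        cmd.append("--gzip")
    return cmd


def restore_command(uri, dump_dir, gzip=True):
    """argv to restore that dump and replay the oplog entries captured with it."""
    cmd = ["mongorestore", f"--uri={uri}", "--oplogReplay", f"--dir={dump_dir}"]
    if gzip:
        cmd.append("--gzip")  # must match how the dump was written
    return cmd


if __name__ == "__main__":
    print(dump_command("mongodb://host2:27017", "/backups/nightly"))
    print(restore_command("mongodb://restore-host:27017", "/backups/nightly"))
```

Running the restore against a scratch host on a schedule is one way to follow the "test restores" advice above.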
Filesystem snapshot — copy WiredTiger files from a SECONDARY after
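db.fsyncLock() (blocks writes on that member until db.fsyncUnlock(), while
the rest of the set keeps accepting writes) or use cloud volume snapshots on
a secondary. Faster for large datasets; restores need a compatible MongoDB version.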
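The important property of a lock-based snapshot is that the member is always unlocked, even if the snapshot fails; while locked, the SECONDARY stops applying replicated writes and falls further behind. The sketch below enforces that ordering with try/finally. The three callables are hypothetical hooks: in mongosh the first and last correspond to db.fsyncLock() and db.fsyncUnlock(), and the middle one would trigger your volume snapshot tool.

```python
def snapshot_secondary(fsync_lock, take_snapshot, fsync_unlock):
    """Take a filesystem snapshot between fsyncLock and fsyncUnlock on a SECONDARY.

    All three arguments are hypothetical hooks supplied by the caller.
    """
    fsync_lock()  # flush to disk and block writes on this member
    try:
        return take_snapshot()
    finally:
        # Always unlock: a locked member stops applying replicated writes
        # and falls further behind the PRIMARY.
        fsync_unlock()
```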
Cloud / Ops Manager — Atlas and MongoDB Ops Manager provide continuous backup and PITR. Know retention and restore procedures for managed deployments.
Learning resources
- MongoDB documentation — mongodb.com/docs
- Replica sets — mongodb.com/docs — replication
- Security — mongodb.com/docs — security
- Backup methods — mongodb.com/docs — backups
- Performance — mongodb.com/docs — performance