MongoDB Failover and Effects

These scenarios highlight the different ways a MongoDB replica set can be stressed or reconfigured to demonstrate its high-availability behavior. You can see how the cluster responds when the primary is killed abruptly, how election rules allow a higher-priority node to regain leadership, and how using the primaryPreferred read preference keeps queries flowing during failover. Network isolation and container termination tests simulate real outages, while adding an analytics node or removing a member shows how the replica set can adapt to changing workloads. Together, they provide a practical tour of resilience, election dynamics, and scaling options.

1. Prerequisites

  • Install Docker Desktop. For MongoDB employees, request a Docker license from the Lumos app via corp.mongodb.com.

2. Docker Compose (Recommended)

2a. Setup Processes

In one terminal, start up all the services and initiate the replica set:

# (terminal 1) Start all containers
$ docker compose up -d
[+] Running 5/5
 ✔ Container mongo1     Healthy
 ✔ Container mongo2     Healthy
 ✔ Container analytics  Healthy
 ✔ Container mongo0     Healthy
 ✔ Container app0       Started

# (terminal 1) Initiate the replica set
$ docker compose exec mongo0 bash
$ mongosh --file /scripts/init-rs.js
Initiating replica set...
Replica set initiated successfully
$ mongosh --file /scripts/summary.js
exit
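
The summary.js script isn't reproduced in this README; assuming it mirrors the rsSummary() helper used in the Docker (Alternative) section below, it prints one line per replica set member, roughly like this:

// Presumed shape of /scripts/summary.js (an assumption, based on the rsSummary() helper shown later):
// list each member's name, state, health, and configured priority.
const config = rs.conf();
const summary = rs.status().members.map((m, i) => ({
  name: m.name,
  stateStr: m.stateStr,
  health: m.health,
  priority: config.members[i].priority,
}));
printjson(summary);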

Open and review the application code in the app.js file. Explain the nature of the app, making sure to cover the MongoClient and the read and write queries.
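
As a rough sketch of what app.js likely contains (the database name, intervals, and exact structure here are assumptions; the file in the repo is the source of truth), the app connects a single MongoClient, reads the counter document on one interval, and increments it on another:

// Hypothetical sketch of app.js: one MongoClient, a read loop, and a write loop.
const { MongoClient } = require("mongodb");

// MONGODB_URI is provided to the app0 container; the fallback here is an assumption.
const uri = process.env.MONGODB_URI ||
  "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/?replicaSet=mongodb-repl-set";
const client = new MongoClient(uri);

async function main() {
  await client.connect();
  const col = client.db("test").collection("counter"); // database name assumed

  // Read loop: log the current counter value twice per second.
  setInterval(async () => {
    try {
      const doc = await col.findOne({ _id: "counter" });
      console.log(`[${new Date().toISOString()}] Current value: ${doc?.value}`);
    } catch (err) {
      console.error("Read error:", err.message);
    }
  }, 500);

  // Write loop: increment the counter once per second. Retryable writes
  // (on by default in recent drivers) help this continue across a failover.
  setInterval(async () => {
    try {
      await col.updateOne({ _id: "counter" }, { $inc: { value: 1 } }, { upsert: true });
      console.log(`[${new Date().toISOString()}] Incremented`);
    } catch (err) {
      console.error("Increment error:", err.message);
    }
  }, 1000);
}

main().catch(console.error);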

In another terminal, start the application process:

# (terminal 2) Start the app
$ docker compose exec app0 bash -c "npm start"
> replica-set-tester@1.0.0 start
> node app.js

[2025-08-27T14:51:23.515Z] Current value: 603
[2025-08-27T14:51:24.011Z] Incremented
[2025-08-27T14:51:24.018Z] Current value: 604
[2025-08-27T14:51:24.515Z] Current value: 604

Note: you'll come back to the app terminal and monitor it as you complete the various demos.

2b. Run each demo

DEMO 1: Failover when NICELY killing primary process

Gracefully stopping the primary allows the replica set to detect the shutdown and quickly elect a new primary. Client applications continue with minimal or no interruption.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the primary:

    # (terminal 1)
    KILL_NODE=mongo0 # update this variable with the *current* primary
    $ docker compose exec ${KILL_NODE} bash
    $ mongosh --file scripts/summary.js # optional, re-confirm this is the primary
    $ ps -ef  # optional
    $ pidof mongod # optional
    $ kill $(pidof mongod)
    $ ps -ef  # optional, check is dead
    $ exit
  • Observe the app wasn't interrupted in terminal 2.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo1  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${KILL_NODE} bash
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    # observe the node has rejoined the cluster
    $ mongosh --file scripts/summary.js
    $ exit

DEMO 2: Failover when HARD killing primary process

Using kill -9 abruptly terminates the primary without cleanup, causing a slightly longer election. Applications pause briefly for writes but resume once a new primary is chosen.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the primary:

    # (terminal 1)
    KILL_NODE=mongo1 # update this variable with the *current* primary
    $ docker compose exec ${KILL_NODE} bash
    $ mongosh --file scripts/summary.js # optional, re-confirm this is the primary
    $ ps -ef  # optional
    $ pidof mongod # optional
    $ kill -9 $(pidof mongod)
    $ ps -ef  # optional, check is dead
    $ exit
  • In terminal 2, observe that the output from the app halts for a few seconds and then continues from where it left off; no errors are reported by the application. Note that the reads were also paused. This will be addressed in an upcoming demo.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo2  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${KILL_NODE} bash
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    # observe the node has rejoined the cluster
    $ mongosh --file scripts/summary.js
    $ exit

DEMO 3: Higher Priority Node becomes primary

By setting replica set priorities, a designated node can reclaim the primary role after it restarts. This ensures leadership is assigned to preferred infrastructure.

🎬🎬🎬 Click to expand the section and see the commands.
  • Set mongo2 to have a higher priority. Note that we also reduce electionTimeoutMillis in these demos to show a new primary being elected quickly. electionTimeoutMillis determines how long a secondary waits before triggering an election when it believes the primary is unreachable; in production, a very low value can trigger spurious elections more often, which reduces stability.

    # (terminal 1)
    $ P10_NODE=mongo2
    $ docker compose exec ${P10_NODE} bash
    $ mongosh ${MONGODB_URI} --quiet
    $   config = rs.conf();
    $   config.members[2].host; // optional, confirm this is mongo2
    $   config.members[2].priority = 10;
    $   config.settings.electionTimeoutMillis = 1000;  // Lower to 1 second
    $   rs.reconfig(config);
    $   exit;
  • Observe that your P10_NODE has been elected to primary.

    # (terminal 1)
    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js
  • Kill the primary:

    # (terminal 1)
    $ docker compose exec ${P10_NODE} pkill -9 mongod
  • In terminal 2, observe that the output from the app halts for a much shorter time than in the preceding demo and then continues from where it left off; no errors are reported by the application.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo0  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${P10_NODE} bash
    
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    
    # observe the node has rejoined the cluster as `primary`
  • Set electionTimeoutMillis to 5 seconds:

    $ mongosh ${MONGODB_URI} --quiet
    $  config = rs.conf();
    $  config.settings.electionTimeoutMillis = 5000;
    $  rs.reconfig(config);
    $  exit
    $ exit

DEMO 4: Change the query options so that reads aren't delayed when the primary fails

In this demo we'll use a primaryPreferred Read Preference. With primaryPreferred, reads can fall back to secondaries during primary downtime. This keeps queries flowing even if writes are briefly blocked.

Note: The primaryPreferred option can be set at the connection string/driver level or on a per-query basis.
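
For reference, here is a minimal sketch of the driver-level alternative (this is not the change this demo makes; the demo below sets the option per collection). The URI and client options shown are assumptions:

// Hypothetical: apply primaryPreferred to every read made through this client,
// either via the connection string option or the MongoClient options object.
const { MongoClient } = require("mongodb");

const uri = "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/" +
  "?replicaSet=mongodb-repl-set&readPreference=primaryPreferred";
const client = new MongoClient(uri);

// ...or, equivalently, via the options object:
// const client = new MongoClient(uri, { readPreference: "primaryPreferred" });

Setting it per query or per collection (as in the demo) keeps primary reads as the default everywhere else.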

🎬🎬🎬 Click to expand the section and see the commands.
  • In terminal 2, stop the application (ctrl-c)

  • Edit app.js to include the primaryPreferred read preference:

    const col = db.collection("counter", { readPreference: ReadPreference.primaryPreferred });
  • Restart the application docker compose exec app0 bash -c "npm start"

  • Kill the primary docker compose exec ${P10_NODE} pkill -9 mongod

  • Observe that the reads continue, but the counter is not incremented for a few seconds.

  • Verify P10_NODE is unreachable.

     $ RUNNING_NODE=mongo0
     $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Restart mongod docker compose exec ${P10_NODE} mongod --config /etc/mongod.conf --fork

DEMO 5: Isolate the primary node from the network

Network isolation simulates a partition, triggering the remaining members to elect a new primary. Once reconnected, the isolated node rejoins as a secondary.

🎬🎬🎬 Click to expand the section and see the commands.
  • P10_NODE should still be the primary as it has the highest priority; isolate it from the Docker network

    docker network disconnect mongo-net ${P10_NODE}

    You may spot an Increment error: connect ECONNREFUSED 127.0.0.1:27017 error in the app output; that's expected.

  • Confirm P10_NODE is not a functioning member of the replica set. It may take up to electionTimeoutMillis to see this change.

    $ RUNNING_NODE=mongo0 # update accordingly
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js
  • Confirm writes are rejected on P10_NODE

    $ docker compose exec ${P10_NODE} mongosh --eval "db.fluff.insertOne({})"
    MongoServerError: not primary
  • Add P10_NODE back to the network docker network connect mongo-net ${P10_NODE}

  • Confirm P10_NODE is reelected to be primary

    docker compose exec ${P10_NODE} mongosh --file scripts/summary.js

DEMO 6: Kill the docker container

Force-killing the primary MongoDB container simulates a crash, causing failover and re-election. The application may see a short write pause before resuming.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the P10_NODE container docker compose kill ${P10_NODE}

  • Note from the app output that writes are paused during the failover/election.

  • Restart the container docker compose restart ${P10_NODE}

  • Restart mongod docker compose exec ${P10_NODE} mongod --config /etc/mongod.conf --fork

  • Observe that P10_NODE rejoins the replica set and is reelected primary

    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js

DEMO 7: Add an analytics node (if not using Atlas)

Adding an analytics node with priority 0 and role tags routes reporting or BI workloads to it, offloading queries from the main replica set. This improves performance for operational traffic.
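
For context, here is a minimal sketch of how a tag-aware read can be expressed with the Node.js driver (the actual code to enable lives in app.js under DEMO 7; the database name and helper function here are hypothetical):

// Hypothetical helper: read the counter from the member tagged { role: "analytics" }.
const { MongoClient, ReadPreference } = require("mongodb");

async function readFromAnalytics(uri) {
  const client = new MongoClient(uri);
  await client.connect();

  // Secondary reads, restricted to members carrying the analytics tag set by rs.add().
  const col = client.db("test").collection("counter", {
    readPreference: new ReadPreference("secondary", [{ role: "analytics" }]),
  });

  const doc = await col.findOne({ _id: "counter" });
  console.log("Analytics read:", doc?.value);
  await client.close();
}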

🎬🎬🎬 Click to expand the section and see the commands.
  • Add the analytics node to the replica set

    $ docker compose exec ${P10_NODE} mongosh ${MONGODB_URI} --quiet
      rs.add({
        host: "analytics:27017",
        priority: 0, // can never be primary
        tags: { role: "analytics" }
      });
    $ exit
  • Uncomment the analytics thread code (DEMO 7) in app.js

  • Restart the app

DEMO 8: Remove a node

Removing a member reduces fault tolerance but keeps the replica set functional if a majority remains. It’s useful for scaling down or node maintenance.

🎬🎬🎬 Click to expand the section and see the commands.
$ docker compose exec mongo0 bash
$ mongosh ${MONGODB_URI}
  rs.remove("analytics:27017");
$ exit

2. Docker (Alternative)

2a. Create a Docker network:

docker network create mongo-net

To be done the first time, or whenever there's a new version of the Docker image:

  1. Delete any existing containers and images for this demo
  2. Fetch the latest Docker image:
docker pull andrewmorgan818/mongodb-replication-demo:latest
  3. Create the containers and connect them to our Docker network
docker run -dit \
  --name mongo0 \
  --hostname mongo0 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name mongo1 \
  --hostname mongo1 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name mongo2 \
  --hostname mongo2 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name analytics \
  --hostname analytics \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name app0 \
  --hostname app0 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash

On-site, before the demo

  1. Start the containers from Docker Desktop
  2. Connect a separate terminal tab to each of the nodes:
docker exec -it mongo0 bash
docker exec -it mongo1 bash
docker exec -it mongo2 bash
docker exec -it analytics bash
docker exec -it app0 bash
  3. Start the mongod process on mongo0, mongo1, mongo2, and analytics:
mongod --config /etc/mongod.conf&
  4. Connect VS Code to app0:
  • Execute Dev Containers: Attach to Running Container from the Command Palette (cmd/ctrl-shift-P):

  • Connect to app0:

Running the HA demo

Setting the scene

  1. On mongo0, initiate the replica set:
mongosh
rs.initiate(
  {
     _id: "mongodb-repl-set",
     version: 1,
     members: [
        { _id: 0, host : "mongo0" },
        { _id: 1, host : "mongo1" },
        { _id: 2, host : "mongo2" }
     ]
  }
)

quit
  2. Confirm that the replica set is up and running:
mongosh "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/?authSource=admin&replicaSet=mongodb-repl-set"
function rsSummary() {
  const config = rs.config();
  return rs.status().members.map((m, i) => ({
    name: m.name,
    stateStr: m.stateStr,
    health: m.health,
    priority: config.members[i].priority
  }));
}

rsSummary()
  3. Show the demo app code in /home/src/mongo-repl-test/app.js
  4. Run the demo app:
  • From the VS Code terminal:
cd /home/src/mongo-repl-test
git pull # optional
npm install # optional
npm start
  5. Observe the output from the app:
npm start
> replica-set-tester@1.0.0 start
> node app.js

[2025-08-12T09:16:59.576Z] Current value: 1
[2025-08-12T09:17:00.068Z] Incremented
[2025-08-12T09:17:00.079Z] Current value: 2
[2025-08-12T09:17:00.576Z] Current value: 2
[2025-08-12T09:17:01.079Z] Current value: 2
[2025-08-12T09:17:01.074Z] Incremented
[2025-08-12T09:17:01.581Z] Current value: 3
[2025-08-12T09:17:02.083Z] Current value: 4
[2025-08-12T09:17:02.077Z] Incremented
...

Failover when NICELY killing primary process

  1. Make the app output visible, and observe the incrementing count
  2. From the terminal for the node that's currently PRIMARY:
root@mongo1:/# ps -ef | grep mongod
root        18     1  1 09:22 ?        00:01:57 mongod --config /etc/mongod.conf
root       317   152  0 11:22 pts/2    00:00:00 grep mongod
root@mongo1:/# kill 18
  3. Observe that the output from the app wasn't interrupted
  4. From any other node, run rsSummary():
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  5. Note that a new node has taken over as primary
  6. Start mongod on that node again:
mongod --config /etc/mongod.conf&
  7. Observe that the app output wasn't interrupted
  8. Observe that the node has rejoined the cluster:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]

Failover when HARD killing primary process

  1. Make the app output visible, and observe the incrementing count
  2. From the terminal for the node that's currently primary:
ps -ef | grep mongod
root        19    17  1 09:22 pts/1    00:02:14 mongod --config /etc/mongod.conf
root       348    17  0 11:35 pts/1    00:00:00 grep mongod
kill -9 19
  3. Observe that the output from the app halts for a few seconds and then continues from where it left off; no errors are reported by the application
  4. From any node, run rsSummary():
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  }
]
  5. Note that a new node has taken over as primary
  6. Start mongod on that node again:
mongod --config /etc/mongod.conf&
  7. Observe that the app output wasn't interrupted
  8. Observe that the node has rejoined the cluster:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  }
]

Show that the original primary, with higher priority, is elected back to primary after failing and restarting

  1. Set mongo1 to have a higher priority than the other nodes (from mongosh), and also reduce the timeout for failover:
config = rs.conf()
config.members[1].priority = 10 // mongo1
config.settings.electionTimeoutMillis = 1000;  // Lower to 1 second
rs.reconfig(config)
  2. Observe that mongo1 has been elected primary and now has a higher priority than the other nodes:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  3. kill -9 mongod on mongo1 and notice that the app doesn't pause for as long as before
  4. Restart mongod on mongo1
  5. From mongosh, confirm that mongo1 has rejoined the replica set and been reelected primary
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  6. Set the timeout to 5 seconds:
config = rs.conf()
config.settings.electionTimeoutMillis = 5000;  // Increase to 5 seconds
rs.reconfig(config)

Change the connection string so that reads aren't delayed when the primary fails

  1. Stop the application (ctrl-c)
  2. Edit app.js to include the primaryPreferred read preference:
const readCol = db.collection("counter", { readPreference: ReadPreference.primaryPreferred });
  3. Restart the application (npm start)
  4. kill -9 the primary mongod
  5. Observe that the reads continue, but the counter is not incremented for a few seconds:
Click to expand example output
[2025-08-12T11:57:34.583Z] Current value: 2408
[2025-08-12T11:57:34.867Z] Incremented
[2025-08-12T11:57:35.083Z] Current value: 2409
[2025-08-12T11:57:35.587Z] Current value: 2409
[2025-08-12T11:57:36.095Z] Current value: 2409
[2025-08-12T11:57:36.601Z] Current value: 2409
[2025-08-12T11:57:37.104Z] Current value: 2409
[2025-08-12T11:57:37.604Z] Current value: 2409
[2025-08-12T11:57:38.107Z] Current value: 2409
[2025-08-12T11:57:38.610Z] Current value: 2409
[2025-08-12T11:57:39.115Z] Current value: 2409
[2025-08-12T11:57:39.615Z] Current value: 2409
[2025-08-12T11:57:40.116Z] Current value: 2409
[2025-08-12T11:57:40.619Z] Current value: 2409
[2025-08-12T11:57:35.869Z] Incremented
[2025-08-12T11:57:37.877Z] Incremented
[2025-08-12T11:57:36.872Z] Incremented
[2025-08-12T11:57:38.881Z] Incremented
[2025-08-12T11:57:39.882Z] Incremented
[2025-08-12T11:57:40.886Z] Incremented
[2025-08-12T11:57:41.120Z] Current value: 2415
  6. Restart mongod on mongo1

Isolate the primary node from the network

  1. mongo1 should still be the primary as it has the highest priority; isolate it from the Docker network:
docker network disconnect mongo-net mongo1
  2. Confirm that mongo1 is not a functioning member of the replica set:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  3. Try connecting mongosh to the replica set with only mongo1 in the connection string:
mongosh "mongodb://mongo1:27017/?authSource=admin&replicaSet=mongodb-repl-set"
Current Mongosh Log ID:	689b30d9f67c0664e8d2950c
Connecting to:		mongodb://mongo1:27017/?authSource=admin&replicaSet=mongodb-repl-set&appName=mongosh+2.5.1
MongoServerSelectionError: getaddrinfo EAI_AGAIN mongo1
  4. Check if the process has been stopped on mongo1:
ps -ef | grep mongod
root      1726  1636  0 11:21 pts/2    00:00:00 grep mongod
  5. If the mongod process is still running, connect to mongo1 using mongosh and confirm that it rejects writes:
root@mongo1:/# mongosh
db.fluff.insertOne({})
MongoServerError[NotWritablePrimary]: not primary
  6. Add mongo1 back to the network:
docker network connect mongo-net mongo1
  7. Confirm that mongo1 is reelected to be primary

Kill (rather than gracefully stopping) the Docker container

  1. Kill the mongo1 container:
docker kill mongo1
  2. Note from the app output that writes are paused during the failover/election
  3. Restart the container
  4. Restart mongod
  5. Observe from mongosh that mongo1 rejoins the replica set and is reelected primary

Add an analytics node (if not using Atlas)

  1. If not already running, start mongod on analytics
  2. Add the node to the replica set (from mongosh):
rs.add({
  host: "analytics:27017",
  priority: 0,
  tags: { role: "analytics" }
});
  3. Uncomment the analytics thread in app.js and restart the app:
const analyticsCol = db.collection("counter", {
  readPreference: { mode: "secondary", tags: [{ role: "analytics" }] } });

// Analytics thread
setInterval(async () => {
  try {
    const doc = await analyticsCol.findOne({ _id: "counter" });
    const now = new Date().toISOString();
    console.log(`ANALYTICS: [${now}] Current value: ${doc?.value}`);
  } catch (err) {
    console.error("Read error:", err.message);
  }
}, 5000);

Remove a node

  1. Remove the analytics node and show that the application continues running:
rs.remove("analytics:27017");

(Optional) Save and publish the image based on one of these containers

docker commit app0 andrewmorgan818/mongodb-replication-demo
docker push andrewmorgan818/mongodb-replication-demo:latest
