MongoDB Failover and Effects

These scenarios highlight the different ways a MongoDB replica set can be stressed or reconfigured to demonstrate its high-availability behavior. You can see how the cluster responds when the primary is killed abruptly, how election rules allow a higher-priority node to regain leadership, and how using the primaryPreferred read preference keeps queries flowing during failover. Network isolation and container termination tests simulate real outages, while adding an analytics node or removing a member shows how the replica set can adapt to changing workloads. Together, they provide a practical tour of resilience, election dynamics, and scaling options.

1. Prerequisites

  • Install Docker Desktop. For MongoDB employees, request a Docker license from the Lumos app via corp.mongodb.com.

2. Docker Compose (Recommended)

2a. Setup Processes

In one terminal, start up all the services and initiate the replica set:

# (terminal 1) Start all containers
$ docker compose up -d
[+] Running 5/5
 ✔ Container mongo1     Healthy
 ✔ Container mongo2     Healthy
 ✔ Container analytics  Healthy
 ✔ Container mongo0     Healthy
 ✔ Container app0       Started

# (terminal 1) Initiate the replica set
$ docker compose exec mongo0 bash
$ mongosh --file /scripts/init-rs.js
Initiating replica set...
Replica set initiated successfully
$ mongosh --file /scripts/summary.js
exit
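
The summary.js script isn't reproduced in this README; assuming it mirrors the rsSummary() helper used in the Docker (Alternative) section below, it prints one line per replica set member, roughly like this:

// Presumed shape of /scripts/summary.js (an assumption, based on the rsSummary() helper shown later):
// list each member's name, state, health, and configured priority.
const config = rs.conf();
const summary = rs.status().members.map((m, i) => ({
  name: m.name,
  stateStr: m.stateStr,
  health: m.health,
  priority: config.members[i].priority,
}));
printjson(summary);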

Open and review the application code in the app.js file. Explain the nature of the app, making sure to cover the MongoClient and the read and write queries.
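
As a rough sketch of what app.js likely contains (the database name, intervals, and exact structure here are assumptions; the file in the repo is the source of truth), the app connects a single MongoClient, reads the counter document on one interval, and increments it on another:

// Hypothetical sketch of app.js: one MongoClient, a read loop, and a write loop.
const { MongoClient } = require("mongodb");

// MONGODB_URI is provided to the app0 container; the fallback here is an assumption.
const uri = process.env.MONGODB_URI ||
  "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/?replicaSet=mongodb-repl-set";
const client = new MongoClient(uri);

async function main() {
  await client.connect();
  const col = client.db("test").collection("counter"); // database name assumed

  // Read loop: log the current counter value twice per second.
  setInterval(async () => {
    try {
      const doc = await col.findOne({ _id: "counter" });
      console.log(`[${new Date().toISOString()}] Current value: ${doc?.value}`);
    } catch (err) {
      console.error("Read error:", err.message);
    }
  }, 500);

  // Write loop: increment the counter once per second. Retryable writes
  // (on by default in recent drivers) help this continue across a failover.
  setInterval(async () => {
    try {
      await col.updateOne({ _id: "counter" }, { $inc: { value: 1 } }, { upsert: true });
      console.log(`[${new Date().toISOString()}] Incremented`);
    } catch (err) {
      console.error("Increment error:", err.message);
    }
  }, 1000);
}

main().catch(console.error);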

In another terminal, start the application process:

# (terminal 2) Start the app
$ docker compose exec app0 bash -c "npm start"
> replica-set-tester@1.0.0 start
> node app.js

[2025-08-27T14:51:23.515Z] Current value: 603
[2025-08-27T14:51:24.011Z] Incremented
[2025-08-27T14:51:24.018Z] Current value: 604
[2025-08-27T14:51:24.515Z] Current value: 604

Note: you'll come back to the app terminal and monitor it as you complete the various demos.

2b. Run each demo

DEMO 1: Failover when NICELY killing primary process

Gracefully stopping the primary allows the replica set to detect the shutdown and quickly elect a new primary. Client applications continue with minimal or no interruption.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the primary:

    # (terminal 1)
    KILL_NODE=mongo0 # update this variable with the *current* primary
    $ docker compose exec ${KILL_NODE} bash
    $ mongosh --file scripts/summary.js # optional, re-confirm this is the primary
    $ ps -ef  # optional
    $ pidof mongod # optional
    $ kill $(pidof mongod)
    $ ps -ef  # optional, check is dead
    $ exit
  • Observe the app wasn't interrupted in terminal 2.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo1  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${KILL_NODE} bash
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    # observe the node has rejoined the cluster
    $ mongosh --file scripts/summary.js
    $ exit

DEMO 2: Failover when HARD killing primary process

Using kill -9 abruptly terminates the primary without cleanup, causing a slightly longer election. Applications pause briefly for writes but resume once a new primary is chosen.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the primary:

    # (terminal 1)
    KILL_NODE=mongo1 # update this variable with the *current* primary
    $ docker compose exec ${KILL_NODE} bash
    $ mongosh --file scripts/summary.js # optional, re-confirm this is the primary
    $ ps -ef  # optional
    $ pidof mongod # optional
    $ kill -9 $(pidof mongod)
    $ ps -ef  # optional, check is dead
    $ exit
  • In terminal 2, observe that the output from the app halts for a few seconds and then continues from where it left off; no errors are reported by the application. Note that the reads were also paused. This will be addressed in an upcoming demo.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo2  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${KILL_NODE} bash
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    # observe the node has rejoined the cluster
    $ mongosh --file scripts/summary.js
    $ exit

DEMO 3: Higher Priority Node becomes primary

By setting replica set priorities, a designated node can reclaim the primary role after it restarts. This ensures leadership is assigned to preferred infrastructure.

🎬🎬🎬 Click to expand the section and see the commands.
  • Set mongo2 to have a higher priority. Note that we also reduce electionTimeoutMillis in these demos to show a new primary being elected quickly. electionTimeoutMillis determines how long a secondary waits before triggering an election when it believes the primary is unreachable; in production, a very low value can trigger spurious elections more often, which reduces stability.

    # (terminal 1)
    $ P10_NODE=mongo2
    $ docker compose exec ${P10_NODE} bash
    $ mongosh ${MONGODB_URI} --quiet
    $   config = rs.conf();
    $   config.members[2].host; // optional, confirm this is mongo2
    $   config.members[2].priority = 10;
    $   config.settings.electionTimeoutMillis = 1000;  // Lower to 1 second
    $   rs.reconfig(config);
    $   exit;
  • Observe that your P10_NODE has been elected to primary.

    # (terminal 1)
    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js
  • Kill the primary:

    # (terminal 1)
    $ docker compose exec ${P10_NODE} pkill -9 mongod
  • In terminal 2, observe that the output from the app halts for a much shorter time than in the preceding demo and then continues from where it left off; no errors are reported by the application.

  • Check the replica set from a running node:

    # (terminal 1)
    RUNNING_NODE=mongo0  # update this variable with a running node
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Start the previously killed mongod:

    # (terminal 1)
    $ docker compose exec ${P10_NODE} bash
    
    $ mongod --config /etc/mongod.conf --replSet mongodb-repl-set --fork
    
    
    # observe the node has rejoined the cluster as `primary`
  • Set electionTimeoutMillis to 5 seconds:

    $ mongosh ${MONGODB_URI} --quiet
    $  config = rs.conf();
    $  config.settings.electionTimeoutMillis = 5000;
    $  rs.reconfig(config);
    $  exit
    $ exit

DEMO 4: Change the query options so that reads aren't delayed when the primary fails

In this demo we'll use a primaryPreferred Read Preference. With primaryPreferred, reads can fall back to secondaries during primary downtime. This keeps queries flowing even if writes are briefly blocked.

Note: The primaryPreferred option can be set at the connection string/driver level or on a per-query basis.
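
For reference, here is a minimal sketch of the driver-level alternative (this is not the change this demo makes; the demo below sets the option per collection). The URI and client options shown are assumptions:

// Hypothetical: apply primaryPreferred to every read made through this client,
// either via the connection string option or the MongoClient options object.
const { MongoClient } = require("mongodb");

const uri = "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/" +
  "?replicaSet=mongodb-repl-set&readPreference=primaryPreferred";
const client = new MongoClient(uri);

// ...or, equivalently, via the options object:
// const client = new MongoClient(uri, { readPreference: "primaryPreferred" });

Setting it per query or per collection (as in the demo) keeps primary reads as the default everywhere else.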

🎬🎬🎬 Click to expand the section and see the commands.
  • In terminal 2, stop the application (ctrl-c)

  • Edit app.js to include the primaryPreferred read preference:

    const col = db.collection("counter", { readPreference: ReadPreference.primaryPreferred });
  • Restart the application docker compose exec app0 bash -c "npm start"

  • Kill the primary docker compose exec ${P10_NODE} pkill -9 mongod

  • Observe that the reads continue, but the counter is not incremented for a few seconds.

  • Verify P10_NODE is unreachable.

     $ RUNNING_NODE=mongo0
     $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
  • Restart mongod docker compose exec ${P10_NODE} mongod --config /etc/mongod.conf --fork

DEMO 5: Isolate the primary node from the network

Network isolation simulates a partition, triggering the remaining members to elect a new primary. Once reconnected, the isolated node rejoins as a secondary.

🎬🎬🎬 Click to expand the section and see the commands.
  • P10_NODE should still be the primary as it has the highest priority; isolate it from the Docker network

    docker network disconnect mongo-net ${P10_NODE}

    You may spot an Increment error: connect ECONNREFUSED 127.0.0.1:27017 error in the app output; that's expected.

  • Confirm P10_NODE is not a functioning member of the replica set. It may take up to electionTimeoutMillis to see this change.

    $ RUNNING_NODE=mongo0 # update accordingly
    $ docker compose exec ${RUNNING_NODE} mongosh --file scripts/summary.js
    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js
  • Confirm writes are rejected on P10_NODE

    $ docker compose exec ${P10_NODE} mongosh --eval "db.fluff.insertOne({})"
    MongoServerError: not primary
  • Add P10_NODE back to the network docker network connect mongo-net ${P10_NODE}

  • Confirm P10_NODE is reelected to be primary

    docker compose exec ${P10_NODE} mongosh --file scripts/summary.js

DEMO 6: Kill the docker container

Force-killing the primary MongoDB container simulates a crash, causing failover and re-election. The application may see a short write pause before resuming.

🎬🎬🎬 Click to expand the section and see the commands.
  • Kill the P10_NODE container docker compose kill ${P10_NODE}

  • Note from the app output that writes are paused during the failover/election.

  • Restart the container docker compose restart ${P10_NODE}

  • Restart mongod docker compose exec ${P10_NODE} mongod --config /etc/mongod.conf --fork

  • Observe that P10_NODE rejoins the replica set and is reelected primary

    $ docker compose exec ${P10_NODE} mongosh --file scripts/summary.js

DEMO 7: Add an analytics node (if not using Atlas)

Adding an analytics node with priority 0 and role tags routes reporting or BI workloads to it, offloading queries from the main replica set. This improves performance for operational traffic.
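
For context, here is a minimal sketch of how a tag-aware read can be expressed with the Node.js driver (the actual code to enable lives in app.js under DEMO 7; the database name and helper function here are hypothetical):

// Hypothetical helper: read the counter from the member tagged { role: "analytics" }.
const { MongoClient, ReadPreference } = require("mongodb");

async function readFromAnalytics(uri) {
  const client = new MongoClient(uri);
  await client.connect();

  // Secondary reads, restricted to members carrying the analytics tag set by rs.add().
  const col = client.db("test").collection("counter", {
    readPreference: new ReadPreference("secondary", [{ role: "analytics" }]),
  });

  const doc = await col.findOne({ _id: "counter" });
  console.log("Analytics read:", doc?.value);
  await client.close();
}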

🎬🎬🎬 Click to expand the section and see the commands.
  • Add the analytics node to the replica set

    $ docker compose exec ${P10_NODE} mongosh ${MONGODB_URI} --quiet
      rs.add({
        host: "analytics:27017",
        priority: 0, // can never be primary
        tags: { role: "analytics" }
      });
    $ exit
  • Uncomment the analytics thread code (DEMO 7) in app.js

  • Restart the app

DEMO 8: Remove a node

Removing a member reduces fault tolerance but keeps the replica set functional if a majority remains. It’s useful for scaling down or node maintenance.

🎬🎬🎬 Click to expand the section and see the commands.
$ docker compose exec mongo0 bash
$ mongosh ${MONGODB_URI}
  rs.remove("analytics:27017");
$ exit

2. Docker (Alternative)

2a. Create a Docker network:

docker network create mongo-net

To be done the first time, or whenever there's a new version of the Docker image:

  1. Delete any existing containers and images for this demo
  2. Fetch the latest Docker image:
docker pull andrewmorgan818/mongodb-replication-demo:latest
  3. Create the containers and connect them to our Docker network
docker run -dit \
  --name mongo0 \
  --hostname mongo0 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name mongo1 \
  --hostname mongo1 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name mongo2 \
  --hostname mongo2 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name analytics \
  --hostname analytics \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash
docker run -dit \
  --name app0 \
  --hostname app0 \
  --network mongo-net \
  andrewmorgan818/mongodb-replication-demo bash

On-site, before the demo

  1. Start the containers from Docker Desktop
  2. Connect a separate terminal tab to each of the nodes:
docker exec -it mongo0 bash
docker exec -it mongo1 bash
docker exec -it mongo2 bash
docker exec -it analytics bash
docker exec -it app0 bash
  3. Start the mongod process on mongo0, mongo1, mongo2, and analytics:
mongod --config /etc/mongod.conf&
  4. Connect VS Code to app0:
  • Execute Dev Containers: Attach to Running Container from the Command Palette (cmd/ctrl-shift-P):

  • Connect to app0:

Running the HA demo

Setting the scene

  1. On mongo0, initiate the replica set:
mongosh
rs.initiate(
  {
     _id: "mongodb-repl-set",
     version: 1,
     members: [
        { _id: 0, host : "mongo0" },
        { _id: 1, host : "mongo1" },
        { _id: 2, host : "mongo2" }
     ]
  }
)

quit
  2. Confirm that the replica set is up and running:
mongosh "mongodb://mongo0:27017,mongo1:27017,mongo2:27017/?authSource=admin&replicaSet=mongodb-repl-set"
function rsSummary() {
  const config = rs.config();
  return rs.status().members.map((m, i) => ({
    name: m.name,
    stateStr: m.stateStr,
    health: m.health,
    priority: config.members[i].priority
  }));
}

rsSummary()
  3. Show the demo app code in /home/src/mongo-repl-test/app.js
  4. Run the demo app:
  • From the VS Code terminal:
cd /home/src/mongo-repl-test
git pull # optional
npm install # optional
npm start
  5. Observe the output from the app:
npm start
> replica-set-tester@1.0.0 start
> node app.js

[2025-08-12T09:16:59.576Z] Current value: 1
[2025-08-12T09:17:00.068Z] Incremented
[2025-08-12T09:17:00.079Z] Current value: 2
[2025-08-12T09:17:00.576Z] Current value: 2
[2025-08-12T09:17:01.079Z] Current value: 2
[2025-08-12T09:17:01.074Z] Incremented
[2025-08-12T09:17:01.581Z] Current value: 3
[2025-08-12T09:17:02.083Z] Current value: 4
[2025-08-12T09:17:02.077Z] Incremented
...

Failover when NICELY killing primary process

  1. Make the app output visible, and observe the incrementing count
  2. From the terminal for the node that's currently PRIMARY:
root@mongo1:/# ps -ef | grep mongod
root        18     1  1 09:22 ?        00:01:57 mongod --config /etc/mongod.conf
root       317   152  0 11:22 pts/2    00:00:00 grep mongod
root@mongo1:/# kill 18
  3. Observe that the output from the app wasn't interrupted
  4. From any other node, run rsSummary():
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  5. Note that a new node has taken over as primary
  6. Start mongod on that node again:
mongod --config /etc/mongod.conf&
  7. Observe that the app output wasn't interrupted
  8. Observe that the node has rejoined the cluster:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]

Failover when HARD killing primary process

  1. Make the app output visible, and observe the incrementing count
  2. From the terminal for the node that's currently primary:
ps -ef | grep mongod
root        19    17  1 09:22 pts/1    00:02:14 mongod --config /etc/mongod.conf
root       348    17  0 11:35 pts/1    00:00:00 grep mongod
kill -9 19
  3. Observe that the output from the app halts for a few seconds and then continues from where it left off; no errors are reported by the application
  4. From any node, run rsSummary():
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  }
]
  5. Note that a new node has taken over as primary
  6. Start mongod on that node again:
mongod --config /etc/mongod.conf&
  7. Observe that the app output wasn't interrupted
  8. Observe that the node has rejoined the cluster:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo2:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  }
]

Show that the original primary, with higher priority, is elected back to primary after failing and restarting

  1. Set mongo1 to have a higher priority than the other nodes (from mongosh), and also reduce the timeout for failover:
config = rs.conf()
config.members[1].priority = 10 // mongo1
config.settings.electionTimeoutMillis = 1000;  // Lower to 1 second
rs.reconfig(config)
  2. Observe that mongo1 has been elected primary and now has a higher priority than the other nodes:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  3. kill -9 mongod on mongo1 and notice that the app doesn't pause for as long as before
  4. Restart mongod on mongo1
  5. From mongosh, confirm that mongo1 has rejoined the replica set and been reelected primary
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  6. Set the timeout to 5 seconds:
config = rs.conf()
config.settings.electionTimeoutMillis = 5000;  // Increase to 5 seconds
rs.reconfig(config)

Change the connection string so that reads aren't delayed when the primary fails

  1. Stop the application (ctrl-c)
  2. Edit app.js to include the primaryPreferred read preference:
const readCol = db.collection("counter", { readPreference: ReadPreference.primaryPreferred });
  3. Restart the application (npm start)
  4. kill -9 the primary mongod
  5. Observe that the reads continue, but the counter is not incremented for a few seconds:
Click to expand example output
[2025-08-12T11:57:34.583Z] Current value: 2408
[2025-08-12T11:57:34.867Z] Incremented
[2025-08-12T11:57:35.083Z] Current value: 2409
[2025-08-12T11:57:35.587Z] Current value: 2409
[2025-08-12T11:57:36.095Z] Current value: 2409
[2025-08-12T11:57:36.601Z] Current value: 2409
[2025-08-12T11:57:37.104Z] Current value: 2409
[2025-08-12T11:57:37.604Z] Current value: 2409
[2025-08-12T11:57:38.107Z] Current value: 2409
[2025-08-12T11:57:38.610Z] Current value: 2409
[2025-08-12T11:57:39.115Z] Current value: 2409
[2025-08-12T11:57:39.615Z] Current value: 2409
[2025-08-12T11:57:40.116Z] Current value: 2409
[2025-08-12T11:57:40.619Z] Current value: 2409
[2025-08-12T11:57:35.869Z] Incremented
[2025-08-12T11:57:37.877Z] Incremented
[2025-08-12T11:57:36.872Z] Incremented
[2025-08-12T11:57:38.881Z] Incremented
[2025-08-12T11:57:39.882Z] Incremented
[2025-08-12T11:57:40.886Z] Incremented
[2025-08-12T11:57:41.120Z] Current value: 2415
  6. Restart mongod on mongo1

Isolate the primary node from the network

  1. mongo1 should still be the primary as it has the highest priority; isolate it from the Docker network:
docker network disconnect mongo-net mongo1
  2. Confirm that mongo1 is not a functioning member of the replica set:
rsSummary()
Click to expand example output
[
  {
    name: 'mongo0:27017',
    stateStr: 'PRIMARY',
    health: 1,
    priority: 1
  },
  {
    name: 'mongo1:27017',
    stateStr: '(not reachable/healthy)',
    health: 0,
    priority: 10
  },
  {
    name: 'mongo2:27017',
    stateStr: 'SECONDARY',
    health: 1,
    priority: 1
  }
]
  3. Try connecting mongosh to the replica set with only mongo1 in the connection string:
mongosh "mongodb://mongo1:27017/?authSource=admin&replicaSet=mongodb-repl-set"
Current Mongosh Log ID:	689b30d9f67c0664e8d2950c
Connecting to:		mongodb://mongo1:27017/?authSource=admin&replicaSet=mongodb-repl-set&appName=mongosh+2.5.1
MongoServerSelectionError: getaddrinfo EAI_AGAIN mongo1
  4. Check if the process has been stopped on mongo1:
ps -ef | grep mongod
root      1726  1636  0 11:21 pts/2    00:00:00 grep mongod
  5. If the mongod process is still running, connect to mongo1 using mongosh and confirm that it rejects writes:
root@mongo1:/# mongosh
db.fluff.insertOne({})
MongoServerError[NotWritablePrimary]: not primary
  6. Add mongo1 back to the network:
docker network connect mongo-net mongo1
  7. Confirm that mongo1 is reelected to be primary

Kill (rather than gracefully stopping) the Docker container

  1. Kill the mongo1 container:
docker kill mongo1
  2. Note from the app output that writes are paused during the failover/election
  3. Restart the container
  4. Restart mongod
  5. Observe from mongosh that mongo1 rejoins the replica set and is reelected primary

Add an analytics node (if not using Atlas)

  1. If not already running, start mongod on analytics
  2. Add the node to the replica set (from mongosh):
rs.add({
  host: "analytics:27017",
  priority: 0,
  tags: { role: "analytics" }
});
  3. Uncomment the analytics thread in app.js and restart the app:
const analyticsCol = db.collection("counter", {
  readPreference: { mode: "secondary", tags: [{ role: "analytics" }] } });

// Analytics thread
setInterval(async () => {
  try {
    const doc = await analyticsCol.findOne({ _id: "counter" });
    const now = new Date().toISOString();
    console.log(`ANALYTICS: [${now}] Current value: ${doc?.value}`);
  } catch (err) {
    console.error("Read error:", err.message);
  }
}, 5000);

Remove a node

  1. Remove the analytics node and show that the application continues running:
rs.remove("analytics:27017");

(Optional) Save and publish the image based on one of these containers

docker commit app0 andrewmorgan818/mongodb-replication-demo
docker push andrewmorgan818/mongodb-replication-demo:latest
