Exploring the operations log in MongoDB
Recently we had a client who was having network issues on their production MongoDB deployment. They experienced a network partition on their database, which is configured as a three-node replica set running version 2.6. The network issue was resolved after six hours, but by then the node that had been cut off from the replica set had become stale and had to be resynced. The reason was that the replication window covered less than three hours of replication events. The question was: why is the replication window so short?
What is the MongoDB Replication Window?
If you come from the MySQL world, you are probably already familiar with the basics of master-slave replication. The master writes replication events to its binary logs. Each slave connected to the master copies those logs as relay logs using the Slave_IO thread, and executes the events from the relay logs using the Slave_SQL thread. Replication is asynchronous: a slave can stop replicating at any time and will continue from the last position where it stopped. The expire_logs_days variable on the master can be changed dynamically if we want to keep more binary logs and extend the replication window. A slave server does not need binary logs enabled for replication to work.

Similar to binary logs, MongoDB has an operations log, called the oplog, that is used for replication. The MongoDB docs describe the oplog as a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records them in the primary's oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.

This differs from MySQL replication in that the oplog collection must exist on every member of the replica set. Additionally, there is no option for adding filters to the replication: all nodes in the replica set must hold the same dataset. Any member can import oplog entries from any other member. This is a great MongoDB feature for automatic failover, where any member can become primary (priority 0 nodes and hidden nodes cannot become primary). The time it takes for the capped oplog collection to fill up with changes is the replication window.
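On a running replica set you can see the actual window with db.printReplicationInfo() in the mongo shell, which reports the oplog size and the time span between its first and last entries. To make the relationship concrete, here is a minimal back-of-the-envelope sketch (the function name and numbers are my own illustration, not part of MongoDB): the window is simply how long it takes your write workload to fill the capped oplog.

```javascript
// Rough estimate of the replication window, in hours, from the oplog
// size and the average rate at which the workload generates oplog data.
// If a secondary is disconnected for longer than this window, the
// entries it still needs are overwritten and it must be fully resynced.
function replicationWindowHours(oplogSizeMB, oplogWriteRateMBPerHour) {
  if (oplogWriteRateMBPerHour <= 0) {
    throw new Error("oplog write rate must be positive");
  }
  return oplogSizeMB / oplogWriteRateMBPerHour;
}

// Example: a 1024 MB oplog filling at 400 MB/hour covers only 2.56 hours,
// so a six-hour network partition forces a full resync of the secondary.
```

This is why a write-heavy deployment can end up with a surprisingly short window even though the oplog size looks generous on paper.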
The oplog collection is stored in the local database. By default, its initial size is calculated as follows:
- For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the available free disk space, but will always allocate at least 1 gigabyte and never more than 50 gigabytes.
- For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
- For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
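The 64-bit Linux/Solaris/FreeBSD/Windows rule above can be sketched as a small function (the function name is my own; the 5% figure and the 1 GB / 50 GB bounds come from the rule stated above):

```javascript
// Default oplog sizing rule for 64-bit Linux, Solaris, FreeBSD and
// Windows: 5% of available free disk space, clamped to [1 GB, 50 GB].
function defaultOplogSizeGB(freeDiskGB) {
  const fivePercent = freeDiskGB * 0.05;
  return Math.min(50, Math.max(1, fivePercent));
}

// A small disk hits the 1 GB floor, a huge disk hits the 50 GB ceiling:
// defaultOplogSizeGB(10)   -> 1   (5% would be only 0.5 GB)
// defaultOplogSizeGB(400)  -> 20
// defaultOplogSizeGB(2000) -> 50  (5% would be 100 GB)
```

Note that this only sets the initial size: once created, the capped collection does not grow with the disk, which is exactly how a deployment ends up with a replication window that no longer matches its write load.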