How to back up a MongoDB database using LVM snapshots – Part 1


In this series of blog posts for MongoDB replica sets, I will show you how to run full backups using LVM snapshots, followed by incremental backups using the oplog. I will also cover restores with point-in-time recovery from those backups. Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to “hard links.” As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result, the snapshot only grows as data is modified.

What is LVM

The Logical Volume Manager (LVM) is a program that abstracts disk images from physical devices. It provides a number of raw disk manipulation and snapshot capabilities that are useful for system management. A logical volume snapshot is a copy-on-write technology that monitors changes to an existing volume’s data blocks. When a write is made to one of the blocks, the block’s value at snapshot time is copied to a snapshot volume.

To use LVM snapshots, your server must be using logical volume management, in particular for the partition where your MongoDB data directory is mounted. Let’s see this with an actual example:

I’m not going into detail on how to configure LVM and create your volumes, as that is already well covered in other blog posts. I’ll quickly share sample output of pvdisplay, vgdisplay and lvdisplay from my test system; we will use this output in our backup commands later. Additionally, try to keep your MongoDB directories on a volume separate from the rest of your file system so the backup contains only MongoDB-related data.

 

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               vgdiskdata
  PV Size               <50.00 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              12799
  Free PE               7679
  Allocated PE          5120
  PV UUID               d1xGEN-mwW0-edRe-l1dF-SAjp-qmMA-1RdE9a

The above output from pvdisplay shows that there is a physical volume /dev/sdc1 and it’s used in a volume group vgdiskdata. Please note in this case we only have a single physical volume, but there could be more in your case.

 

# vgdisplay
  --- Volume group ---
  VG Name               vgdiskdata
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  341
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <50.00 GiB
  PE Size               4.00 MiB
  Total PE              12799
  Alloc PE / Size       5120 / 20.00 GiB
  Free  PE / Size       7679 / <30.00 GiB
  VG UUID               d99fRZ-Naup-ixHj-GRYl-XIBp-tJgv-T958AO

The vgdisplay output above shows the volume group name, its format, its size and other parameters such as the free space. Of a total of just under 50 GiB, just under 30 GiB is free. We can extend the volume group by adding more physical volumes to it.
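The sizes that vgdisplay reports follow directly from the extent counts: each physical extent (PE) is 4 MiB here, so the free space is the number of free extents times 4 MiB. A minimal sketch of that arithmetic, where the function name pe_to_mib is my own and not part of LVM:

```sh
# Convert a number of physical extents (PE) to MiB.
# $1 = extent count, $2 = extent size in MiB (4 in the output above).
pe_to_mib() {
    echo $(( $1 * $2 ))
}

pe_to_mib 7679 4    # free extents in vgdiskdata, prints 30716
```

30716 MiB is 4 MiB short of 30 GiB (30720 MiB), which is why vgdisplay prints the free size as <30.00 GiB.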

 

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vgdiskdata/lvmongo
  LV Name                lvmongo
  VG Name                vgdiskdata
  LV UUID                kJ2e5W-DVWM-rXwC-xEwn-H5kP-Ws0w-TtuI2L
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-09-27 10:59:30 +0000
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

Finally, the lvdisplay output shows the logical volume path, its name and the volume group it belongs to. The VG Name matches our volume group vgdiskdata, as that is the volume group in which the logical volume was created. Each snapshot we create from the logical volume lvmongo will be created in the same volume group, vgdiskdata, which is why the free space in the volume group matters.

Now, if we want to see where the logical volume lvmongo is mounted, we can run lsblk or df. In my case, the MongoDB data directory is set to /mnt/mongo, so that is the file system path I want to back up.

 

# lsblk /dev/sdc1
NAME                 MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdc1                   8:33   0  50G  0 part
└─vgdiskdata-lvmongo 253:1    0  20G  0 lvm  /mnt/mongo
# df -hTP /mnt/mongo/
Filesystem                     Type  Size  Used Avail Use% Mounted on
/dev/mapper/vgdiskdata-lvmongo xfs    20G  2.2G   18G  11% /mnt/mongo

Taking the snapshot

From the above output we can see that the logical volume is mounted on the /mnt/mongo mount point and that its type is lvm. So this is the partition on the file system that belongs to the logical volume /dev/vgdiskdata/lvmongo.
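The /dev/mapper/vgdiskdata-lvmongo name printed by df and the /dev/vgdiskdata/lvmongo path printed by lvdisplay refer to the same volume. If a script needs to go from one to the other, something like the sketch below could do it. The function name mapper_to_lv_path is hypothetical; the logic relies on the device-mapper convention that a hyphen inside a volume group or logical volume name is doubled, so a single hyphen separates the two names.

```sh
# Translate a device-mapper name into the /dev/<vg>/<lv> path.
# A hyphen inside a VG or LV name appears doubled in the mapper name,
# so the first single hyphen marks the boundary between VG and LV.
mapper_to_lv_path() {
    printf '%s\n' "${1#/dev/mapper/}" | awk '{
        gsub("--", "\001")           # protect escaped hyphens
        i = index($0, "-")           # boundary between VG and LV
        if (i == 0) exit 1           # not an LVM-style name
        vg = substr($0, 1, i - 1)
        lv = substr($0, i + 1)
        gsub("\001", "-", vg)        # restore the protected hyphens
        gsub("\001", "-", lv)
        print "/dev/" vg "/" lv
    }'
}

mapper_to_lv_path /dev/mapper/vgdiskdata-lvmongo    # prints /dev/vgdiskdata/lvmongo
```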

Now that we have all of the information about the file system partitions and logical volumes on our system, let’s create an LVM snapshot backup. We can use this backup to restore our database in case of a failure.

Before we create the snapshot, let’s follow MongoDB best practice and lock the database against writes. Even though this is not strictly required for an LVM snapshot, it helps ensure that the backup and the eventual restore succeed by guaranteeing that no writes happen while the snapshot is created. As soon as the snapshot is created and mounted read-only on the file system, we will unlock the database. The lock also gives us a stable point at which to record the oplog position, which we need for the incremental backups that follow each full backup.

Typically we run backups on MongoDB secondary nodes, so the lock should not affect writes on the primary. In particular, you could dedicate a hidden or delayed secondary in your replica set as a backup node.

If you are running the command fsyncLock() in a mongo shell, it will look like this:

mongohc:SECONDARY> db.fsyncLock()
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "lockCount" : NumberLong(1),
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1666351737, 1),
        "signature" : {
            "hash" : BinData(0,"5cqLjc3Rr7I9/Y+8D+dqeGrUBCY="),
            "keyId" : NumberLong("7148394471967686661")
        }
    },
    "operationTime" : Timestamp(1666351737, 1)
}

 

You will most likely be using a script to perform this sequence of steps. The lock call can be placed in an fsynclock.js file and run with the mongo client like this:

mongo -u<username> -p<password> --port <port> --quiet fsynclock.js

 

One additional note on incremental backups using the oplog: if you want to take oplog backups between the daily full backups, you should record the oplog position now, while the database is locked and before you create the snapshot. You can do this with a script like the one below:

# cat backup_oplog_ts.js
// Read the most recent oplog entry (reverse natural order, newest first).
var local = db.getSiblingDB('local');
var last = local['oplog.rs'].find().sort({'$natural': -1}).limit(1)[0];
var result = {};
if (last != null) {
    // Keep its timestamp as the starting point for incremental backups.
    result = {position : last['ts']};
}
print(JSON.stringify(result));
# mongo -u<username> -p<password> --port <port> --quiet backup_oplog_ts.js > oplog_position

 

The oplog_position file will have information like below:

{"position":{"$timestamp":{"t":1666355398,"i":1}}}
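A later script will need the two numbers inside that document: the seconds since the epoch (t) and the ordinal within that second (i). As a rough sketch, assuming the file keeps the compact single-line layout shown above, they can be pulled out with sed. The function name oplog_position_fields is my own:

```sh
# Print the "t" (seconds) and "i" (ordinal) fields of the saved oplog
# position, separated by a space. Reads the JSON document from stdin and
# assumes the compact layout shown above (no spaces inside the document).
oplog_position_fields() {
    sed -n 's/.*"t":\([0-9][0-9]*\),"i":\([0-9][0-9]*\).*/\1 \2/p'
}

echo '{"position":{"$timestamp":{"t":1666355398,"i":1}}}' | oplog_position_fields
# prints: 1666355398 1
```

In a real run the input would come from the file instead, for example oplog_position_fields < oplog_position.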

 

The full backup covers everything up to this point in time, so our incremental backups will use this timestamp as their starting point. How to run the incremental backups is covered in part two of this blog series.

Our next command will create a snapshot using lvmongo as a source volume.

# lvcreate -L500M -s -n mongosnap_21oct2022 /dev/vgdiskdata/lvmongo
  Logical volume "mongosnap_21oct2022" created.

A few notes on the above command:

-L specifies the size of the snapshot. The volume group that holds the original volume must have enough free space for the size we specify here. If we ask for more than a snapshot can ever use, lvcreate reduces the size to the maximum usable size, which is slightly larger than the origin volume, as the example below shows. The size matters if you plan to keep your snapshot for a longer time on a system with heavy writes, because a snapshot that fills up becomes invalid.

# lvcreate -L40G -s -n largesnap /dev/vgdiskdata/lvmongo
  Reducing COW size 40.00 GiB down to maximum usable size 20.08 GiB.
  Logical volume "largesnap" created.

-n specifies the snapshot name. Please note that a name like “snapshot” is reserved and cannot be used.

-s specifies that the new volume is a snapshot.
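Before calling lvcreate from a script, it can help to check that the requested snapshot size fits into the free space of the volume group. The sketch below is a deliberately conservative pre-check using the extent numbers from vgdisplay; it ignores the size reduction lvcreate applies to oversized requests. The function name snapshot_fits is hypothetical, and on a live system I would expect the two extent values to come from something like vgs -o vg_free_count,vg_extent_size rather than being typed in.

```sh
# Succeed (exit status 0) if a snapshot of $1 MiB fits into the free space
# of the volume group, given $2 free extents of $3 MiB each.
snapshot_fits() {
    [ "$1" -le $(( $2 * $3 )) ]
}

# Values from the vgdisplay output: 7679 free extents of 4 MiB.
if snapshot_fits 500 7679 4; then
    echo "500 MiB snapshot fits"
fi
```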

 

Let’s check how the lvdisplay output looks now:

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vgdiskdata/lvmongo
  LV Name                lvmongo
  VG Name                vgdiskdata
  LV UUID                kJ2e5W-DVWM-rXwC-xEwn-H5kP-Ws0w-TtuI2L
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-09-27 10:59:30 +0000
  LV snapshot status     source of
                         mongosnap_21oct2022 [active]
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vgdiskdata/mongosnap_21oct2022
  LV Name                mongosnap_21oct2022
  VG Name                vgdiskdata
  LV UUID                fv7BHZ-JClM-axTQ-AJUU-XTkP-DezD-c8e8Q2
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-10-21 10:57:57 +0000
  LV snapshot status     active destination for lvmongo
  LV Status              available
  # open                 0
  LV Size                20.00 GiB
  Current LE             5120
  COW-table size         500.00 MiB
  COW-table LE           125
  Allocated to snapshot  0.31%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

As we can see, there is another logical volume named mongosnap_21oct2022 that is part of the same volume group, vgdiskdata. For lvmongo the snapshot status reads “source of mongosnap_21oct2022”, while for mongosnap_21oct2022 it reads “active destination for lvmongo”.

Additionally, we can use the lvs command, which produces the output below. There are two logical volumes belonging to the same volume group, and for the snapshot we can see that its origin is lvmongo.

# lvs
  LV                  VG         Attr       LSize   Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  lvmongo             vgdiskdata owi-aos---  20.00g
  mongosnap_21oct2022 vgdiskdata swi-a-s--- 500.00m      lvmongo 3.25
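The Data% column shows how much of the snapshot's copy-on-write space is already used. Since a snapshot that reaches 100% becomes invalid, a backup script may want to check this value before it starts copying. Here is a small sketch of such a check; the function name snapshot_usage_ok and the 80% threshold are my own choices, and on a live system the percentage could be read with something like lvs --noheadings -o data_percent vgdiskdata/mongosnap_21oct2022.

```sh
# Succeed if the snapshot usage ($1, in percent) is below the threshold ($2).
# awk does the comparison because the shell cannot compare decimal numbers.
snapshot_usage_ok() {
    awk -v used="$1" -v max="$2" 'BEGIN { exit !(used + 0 < max + 0) }'
}

# 3.25 is the Data% value from the lvs output above.
snapshot_usage_ok 3.25 80 && echo "snapshot has room"
```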

 

The new snapshot can be used as a logical volume and mounted on the file system. From there we can copy the files to offsite locations. Using this approach, we can restore on any system an exact copy of the database as it was when we took the backup.

The steps will include:

Mount the snapshot on a temporary mount point as a read-only file system. This is important because an LVM snapshot is writable by default. If anything writes to the mounted snapshot while we archive it and copy the archive elsewhere, the database will be in a corrupted state when we try to restore. The nouuid option is needed because the XFS file system on the snapshot carries the same UUID as the original volume, which is still mounted.

mkdir /tmp/mongosnap
mount -t xfs -o nouuid,ro /dev/vgdiskdata/mongosnap_21oct2022 /tmp/mongosnap/

 

Before we go further: the snapshot has been taken and mounted read-only on the file system, so let’s unlock the database for writes and let replication resume. Only a few seconds should pass between the lock and the unlock, so the impact on replication lag should be negligible.

mongohc:SECONDARY> db.fsyncUnlock()
{
    "info" : "fsyncUnlock completed",
    "lockCount" : NumberLong(0),
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1666351827, 1),
        "signature" : {
            "hash" : BinData(0,"/ZbNlG1binKXSO9f4trXCc2LdsE="),
            "keyId" : NumberLong("7148394471967686661")
        }
    },
    "operationTime" : Timestamp(1666351737, 1)
}

 

Or, if you want to call this from the mongo client, place the command in an fsyncunlock.js file and execute:

mongo -u<username> -p<password> --port <port> --quiet fsyncunlock.js

 

Tar the directory with the designated name and move the tar backup file to the backups location. This can be an NFS mount point, a separate disk on the system, or even a cloud bucket. Storing the file name in a variable keeps tar and mv in agreement even if the minute changes between the two commands.

BACKUP_NAME=mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz
tar -czf "$BACKUP_NAME" -C /tmp/mongosnap/ .
mv "$BACKUP_NAME" /backups/

Or

tar -czf /backups/mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz --absolute-names /tmp/mongosnap
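Since the snapshot is about to be removed, it may be worth confirming that the archive is readable first. This is an extra step I am suggesting, not part of the procedure above. A minimal sketch, where the function name verify_archive is hypothetical:

```sh
# Succeed if the gzip stream is intact and tar can list every member.
verify_archive() {
    gzip -t "$1" && tar -tzf "$1" > /dev/null
}

# Usage, with the archive created in the previous step:
#   verify_archive /backups/<archive name>.tar.gz || echo "backup is not readable"
```

The check only proves that the archive can be read end to end. It does not replace a periodic test restore.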

 

Finally, unmount the partition and remove the snapshot. 

umount /tmp/mongosnap

lvremove /dev/vgdiskdata/mongosnap_21oct2022
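Put together, the steps above could be scripted roughly as follows. This is only a sketch under several assumptions of mine: the variable names, the timestamp-based snapshot name, the DRY_RUN switch and the run helper are all illustrative, the three .js files are expected in the current directory, and the connection options (-u, -p, --port) are left out. It uses lvremove -f so that the removal does not stop to ask for confirmation.

```sh
# Illustrative settings, taken from the examples in this post.
VG=vgdiskdata
LV=lvmongo
SNAP_SIZE=500M
MNT=/tmp/mongosnap
DEST=/backups
MONGO_OPTS="--quiet"    # add -u, -p and --port as needed

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

full_backup() {
    stamp=$(date '+%Y%m%d%H%M')
    snap="mongosnap_$stamp"
    status=0

    # Lock writes, then record the oplog position while locked.
    run mongo $MONGO_OPTS fsynclock.js || return 1
    run sh -c "mongo $MONGO_OPTS backup_oplog_ts.js > oplog_position" || status=1

    # Create the snapshot and mount it read-only.
    [ "$status" -eq 0 ] &&
        run lvcreate -L"$SNAP_SIZE" -s -n "$snap" "/dev/$VG/$LV" &&
        run mkdir -p "$MNT" &&
        run mount -t xfs -o nouuid,ro "/dev/$VG/$snap" "$MNT" || status=1

    # Unlock in every case so a failed snapshot never leaves writes blocked.
    run mongo $MONGO_OPTS fsyncunlock.js
    [ "$status" -eq 0 ] || return 1

    # Archive the snapshot, then clean up.
    run tar -czf "$DEST/mongodb_backup_$stamp.tar.gz" -C "$MNT" . || status=1
    run umount "$MNT"
    run lvremove -f "/dev/$VG/$snap"
    return "$status"
}
```

Setting DRY_RUN=1 before calling full_backup prints the commands in order without touching the system, which is a cheap way to review the sequence. Note that if the mount fails after the snapshot was created, this sketch returns early and the leftover snapshot has to be removed by hand.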

If you repeat this daily, you will have a daily full backup, and between each full backup you can take incremental backups using the oplog.

Next:

Incremental backups using the oplog are covered in Part 2 of this series.

Restores with point-in-time recovery are covered in Part 3 of this series.

 

Conclusion

Running LVM provides additional flexibility and makes it possible to use snapshots to back up MongoDB. A daily full backup followed by incremental oplog backups allows you to do point-in-time recovery (PITR). You can restore your database to the moment just before an erroneous operation was executed and recover your data with confidence.


About the Author

Igor is a MongoDB Certified DBA supporting the next generation of database solutions in both MySQL and MongoDB. With a master's degree in Software Engineering, Igor enjoys the variety of challenges he faces while working at Pythian, rather than working on the same thing every day. When he's not working, he can be found running or hiking.
