When starting to work with a new technology in a development or sandbox environment, we tend to run things with their default settings as much as possible. This is understandable, since we aren’t yet familiar with all of the options. With MongoDB, however, leaving your setup in its default configuration can cause some important problems. Here are some of our experiences getting started with MongoDB on virtual machines, both locally and on Amazon EC2 instances.
Make sure the version of MongoDB you’re using has the features you need. For example, if you installed the “mongodb” package from the default Debian repositories as of this writing (v1.4.4), you’d find that some of the sharding commands are available even though sharding isn’t actually supported in that version, which masks the incompatibility. Get the latest release (2.0.2 as of this writing) by adding downloads-distro.mongodb.org/repo/ubuntu-upstart to your repositories and installing the package named mongodb-10gen.
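One way to catch this early is to compare the installed version (for example, from the output of `mongod --version`) against the minimum version a feature needs. The Python sketch below is hypothetical. The 1.6.0 threshold for sharding is our assumption, based on when sharding was declared production-ready, so check the release notes for whichever feature you depend on.

```python
import re
from typing import Tuple

def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '1.4.4' or 'v2.0.2-rc0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)

def at_least(installed: str, minimum: str) -> bool:
    """True if the installed version is >= the minimum (tuple comparison)."""
    return parse_version(installed) >= parse_version(minimum)

# Assumption: 1.6.0 is the first release with production-ready sharding.
SHARDING_MIN_VERSION = "1.6.0"

if __name__ == "__main__":
    for v in ("1.4.4", "2.0.2"):
        print(v, "supports sharding:", at_least(v, SHARDING_MIN_VERSION))
```

Comparing tuples of integers avoids the classic string-comparison trap where "1.10.0" sorts before "1.9.0".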
Check that your OS version is compatible with what you’re going to use it for. MongoDB recommends a 64-bit OS. Because MongoDB memory-maps its data files, a 32-bit build can only handle roughly 2GB of data.
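Here is a hypothetical Python check for whether a machine is 64-bit. It reads `platform.machine()`, which on Linux reports the same value as `uname -m`. That value describes the kernel and hardware, not the packages you installed, and the set of 64-bit names below is our own, non-exhaustive list.

```python
import platform
from typing import Optional

# Our own (non-exhaustive) list of machine names that indicate a 64-bit architecture.
SIXTY_FOUR_BIT_MACHINES = {"x86_64", "amd64", "aarch64", "arm64", "ppc64", "ppc64le", "s390x"}

def is_64bit_machine(machine: Optional[str] = None) -> bool:
    """Return True if the machine architecture name looks 64-bit.

    Defaults to platform.machine() (same as `uname -m` on Linux).
    """
    name = (machine if machine is not None else platform.machine()).lower()
    return name in SIXTY_FOUR_BIT_MACHINES

if __name__ == "__main__":
    print("64-bit machine:", is_64bit_machine())
```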
Why is it taking so long to start up?
- Check to see if journaling is turned on.
If you’re on a 64-bit install of version 1.9.2 or later, journaling is enabled by default. MongoDB may decide it needs to preallocate the journal files, and it waits until they are allocated before it starts listening on the default port (27017). This can take on the order of tens of minutes. You don’t need journaling if you’re just kicking the tires and seeing what MongoDB can do, so restart mongod with --nojournal.
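While mongod is preallocating, a slow start is easy to mistake for a hung process. The hypothetical Python helper below polls a TCP port until something accepts connections, so you can see when mongod has actually started listening. It assumes the default port 27017. It only checks that the port is open and does not speak the MongoDB wire protocol.

```python
import socket
import time

def wait_for_port(host: str = "127.0.0.1", port: int = 27017,
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once something accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)

if __name__ == "__main__":
    started = time.monotonic()
    if wait_for_port(timeout=600):
        print(f"mongod is listening after {time.monotonic() - started:.1f}s")
    else:
        print("mongod is still not listening; it may be preallocating journal files")
```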
- What type of filesystem is your database directory mounted on?
A lot of the popular Linux AMIs in Amazon’s list have the ext3 filesystem mounted by default. Because file allocation is much faster on ext4 or xfs, those are the recommended filesystems for your database directory. The difference is especially noticeable when MongoDB starts preallocating journal files, as described above. If you’re using an AWS instance, you can avoid this problem by setting up a RAID10 array (formatted with one of those filesystems) for your data directory, as shown here.
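To check which filesystem your data directory actually sits on, the hypothetical Python helper below reads Linux’s /proc/mounts. It picks the longest mount point that contains the path. It assumes Linux and ignores the octal escaping /proc/mounts uses for mount points that contain spaces. Running `df -T` on the directory gives the same answer by hand.

```python
import os
from typing import Optional

def filesystem_type(path: str, mounts_text: Optional[str] = None) -> Optional[str]:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` uses the /proc/mounts format ("device mountpoint fstype options ...");
    if omitted, /proc/mounts is read (Linux only).
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # The mount point must be the path itself or a whole-component prefix of it.
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type

if __name__ == "__main__":
    print(filesystem_type("/var/lib/mongodb"))  # example dbpath; adjust to yours
```

If this reports `ext3` for your dbpath, expect file allocation, and journal preallocation in particular, to be slow.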
Another issue is disk space. If you leave the settings at their defaults and you’re on a VM or a machine with limited disk space, you could hit your limits very quickly. Even starting up a config server will take up another 3GB if you’re not careful. Our recommendation is to use the --smallfiles flag while you’re starting, stopping, and configuring, until you understand what you’re doing. As an example, we followed this page to create a sharded database on a Debian VM with about 16GB of disk space, and it quickly ballooned to this:
~/mongodb$ du -h .
Bottom line: use the --smallfiles command-line flag (or smallfiles = true in your /etc/mongodb.conf file) until you’re running in an environment that has the required disk space.
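To keep sandbox settings in one place, you could generate the config file rather than remembering flags. The hypothetical Python sketch below renders options in the older INI-style `key = value` format that /etc/mongodb.conf uses, with smallfiles and nojournal turned on. The dbpath is only an illustration. Options set to False are omitted rather than written as `false`, so the output doesn’t depend on how a particular version parses false values. You would then start the server with `mongod --config /etc/mongodb.conf`.

```python
from typing import Mapping, Union

OptionValue = Union[str, int, bool]

def render_conf(options: Mapping[str, OptionValue]) -> str:
    """Render options as 'key = value' lines for an INI-style mongodb.conf.

    True becomes 'true'; options set to False are left out entirely.
    """
    lines = []
    for key, value in options.items():
        if value is False:
            continue
        if value is True:
            value = "true"
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"

# Example sandbox settings; the dbpath is only an illustration.
SANDBOX_OPTIONS = {
    "dbpath": "/var/lib/mongodb",
    "smallfiles": True,  # smaller preallocated data files
    "nojournal": True,   # skip journal preallocation while kicking the tires
}

if __name__ == "__main__":
    print(render_conf(SANDBOX_OPTIONS), end="")
```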
Splitting and Balancing
As Jeremy Zawodny explains in his excellent blog post “MongoDB Pre-Splitting for Faster Data Loading and Importing”, it is important to understand how MongoDB manages shards. By default, MongoDB groups documents into 200MB chunks and maps those chunks to shards. It then moves chunks between shards as the balancer tries to even out the load. If you’re doing a large data migration, this can get tricky. Check Jeremy’s post for some great advice on pre-splitting while keeping performance at an acceptable level.
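To make pre-splitting concrete, here is a hypothetical Python sketch. It computes evenly spaced split points for a numeric shard key and turns them into `split` admin command documents of the form `{"split": namespace, "middle": {key: value}}`. The namespace `mydb.events` and the shard key `user_id` are made-up examples. Even spacing assumes your keys are roughly uniformly distributed, which real data often isn’t.

```python
from typing import Any, Dict, List

def split_points(low: int, high: int, n_chunks: int) -> List[int]:
    """Return up to n_chunks - 1 sorted, distinct boundaries strictly between low and high."""
    if n_chunks < 2 or high - low < 2:
        return []
    step = (high - low) / n_chunks
    points = {low + round(step * i) for i in range(1, n_chunks)}
    return sorted(p for p in points if low < p < high)

def split_commands(namespace: str, key: str, points: List[int]) -> List[Dict[str, Any]]:
    """Build one split command document per boundary."""
    # The command name must be the first key; Python dicts preserve insertion order.
    return [{"split": namespace, "middle": {key: p}} for p in points]

if __name__ == "__main__":
    for cmd in split_commands("mydb.events", "user_id", split_points(0, 1_000_000, 4)):
        print(cmd)
```

With a driver such as pymongo, each document could be passed to `client.admin.command(cmd)`. We left that out to keep the sketch dependency-free.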
Spaces in Configuration Files
It isn’t mentioned anywhere in the documentation, but another gotcha involved configuration file lines that take multiple values for a single parameter, such as this one:
configdb = ip-10-172-31-109.us-west-1.compute.internal,ip-10-170-209-104.us-west-1.compute.internal,ip-10-172-169-38.us-west-1.compute.internal
If there are spaces before or after a “,”, the setting will not parse. Make sure the value is a single comma-separated string with no spaces, as above.
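Because this fails silently, a small lint pass over the config file can help. The hypothetical Python sketch below flags lines that have whitespace next to a comma and rewrites a `key = a, b, c` line into the no-spaces form. The hostnames in the example are made up. The sketch skips `#` comment lines and treats everything after the first `=` as the value.

```python
import re
from typing import List

_SPACED_COMMA = re.compile(r"\s,|,\s")

def lines_with_spaced_commas(text: str) -> List[int]:
    """Return 1-based numbers of non-comment lines that have whitespace next to a comma."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.lstrip().startswith("#") and _SPACED_COMMA.search(line)
    ]

def normalize_list_value(line: str) -> str:
    """Strip whitespace around commas in a 'key = a, b, c' line; other lines are unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return line
    key, value = stripped.split("=", 1)
    parts = [part.strip() for part in value.split(",")]
    return f"{key.strip()} = {','.join(parts)}"

if __name__ == "__main__":
    bad = "configdb = cfg1.example.internal , cfg2.example.internal, cfg3.example.internal"
    print(lines_with_spaced_commas(bad))
    print(normalize_list_value(bad))
```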