As part of PalominoDB’s “Mongo Month”, we’re reviewing use cases for where we’d recommend choosing MongoDB and where we’d consider other technologies.
Because every environment and architecture is different, and every business has unique needs, we work carefully with clients to find the best solution for the particular challenges they face. Sometimes MongoDB or one of the other open-source technologies we build and support is most appropriate; sometimes an RDBMS is most appropriate; and often a hybrid environment makes the most sense for a client’s specific needs.
Our partners at 10gen lay out the more typical use cases, and we find there are often additional factors to consider, such as:
- Risk tolerance for bugs and unmapped behaviors
- Availability requirements
- Access- and location-based requirements
- Security requirements
- Existing skill sets and tooling
- Existing architecture and infrastructure
- Growth expectations and their timeline
- Support needs: community, start-up, or enterprise-class support?
Below, we’ll review some of the specific use cases our clients face, and we’ll explore how MongoDB and/or other technologies might address these most appropriately.
Archiving:
Craigslist is one of the best-known implementations of MongoDB for archiving, in this case of old posts. The schemaless design makes it easy for the datastore to grow as the data model evolves. MongoDB lacks some features that other tools such as Cassandra offer natively, compression among them. To compensate, Craigslist has developed workarounds such as presplitting data to avoid chunk migrations between shards and using ZFS with LZO compression.
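To make "presplitting" concrete: before a bulk load, you can split the empty collection's key range into chunks and place those chunks on shards yourself. The balancer then has no reason to migrate chunks during the load, and the moves are cheap because empty chunks have nothing to copy. The sketch below only builds command documents for MongoDB's `split` and `moveChunk` admin commands. It rests on several hypothetical assumptions: the namespace `archive.posts`, an integer shard key `postId` with a known upper bound, and the shard names. It is not Craigslist's actual procedure.

```python
from __future__ import annotations

from typing import Any


def presplit_commands(namespace: str, shard_key: str, max_key: int,
                      num_chunks: int, shards: list[str]) -> list[dict[str, Any]]:
    """Build `split` and `moveChunk` command documents for an integer shard key.

    Assumes (hypothetically) that keys fall in [0, max_key) and that every
    chunk starts on shards[0], the collection's primary shard.
    """
    step = max_key // num_chunks
    split_points = [i * step for i in range(1, num_chunks)]
    commands: list[dict[str, Any]] = [
        {"split": namespace, "middle": {shard_key: point}} for point in split_points
    ]
    # Chunk i begins at lower_bounds[i]; the first chunk also covers MinKey up to 0.
    lower_bounds = [0] + split_points
    for i, lower in enumerate(lower_bounds):
        target = shards[i % len(shards)]  # round-robin placement
        if target == shards[0]:
            continue  # already on the primary shard, nothing to move
        commands.append({"moveChunk": namespace, "find": {shard_key: lower}, "to": target})
    return commands


# Hypothetical namespace, shard key and shard names, for illustration only.
cmds = presplit_commands("archive.posts", "postId", 1_000_000, 4,
                         ["shard0000", "shard0001"])
for cmd in cmds:
    print(cmd)  # each document would be sent through mongos, e.g. with db.adminCommand()
```

Round-robin placement is the simplest choice here. If your archive's key distribution is skewed, you would pick split points from the actual data instead of spacing them evenly.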
MongoDB and Cassandra both suit this use case well. The choice between them is often determined by in-house skill sets and preferences, as well as by the actual volume of data that needs to be managed (stay tuned for a future post about the complexities of data management at various stages of data volume and scale). An additional consideration for Cassandra (and for HBase or any other Java-based DBMS) is JVM management and tuning, which can be quite challenging for operations teams without JVM experience.
Other solutions:
- Cassandra
pros: Native compression reduces complexity and growth is easier to manage with new nodes. Cassandra is arguably the better choice for massive scale (both in storage growth and in throughput of inserts) and does multi-datacenter installations well.
cons: These will vary by use case and will be addressed in a future post. As a generalization, MongoDB is easier to set up and manage in smaller environments or for companies with constrained resources.
Content Management:
Schema evolution is a huge win in the content management field, and 10gen’s list of production deployments shows numerous successful use cases. MongoDB’s rich querying capabilities and the document store’s support for hierarchical models both make it a strong fit for a CMS. For a great real-world example, see the extensive documentation on the Wordnik deployment and how Wordnik manages 1.2TB of data across five billion records.
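The hierarchical models mentioned above can take several shapes in a document store. The sketch below uses one of them: each page document stores an array of its ancestors. Plain Python lists stand in for a MongoDB collection. The page ids, titles and the `tags` field are hypothetical, and this is not how Wordnik or any particular CMS stores its content.

```python
from __future__ import annotations

# Hypothetical CMS pages stored as documents. Each page keeps an
# "ancestors" array (its path from the site root), so a whole subtree can be
# found with a single query. In MongoDB that query would be {"ancestors": "guides"},
# which matches any document whose ancestors array contains "guides".
pages: list[dict] = [
    {"_id": "home", "title": "Home", "ancestors": []},
    {"_id": "guides", "title": "Guides", "ancestors": ["home"]},
    {"_id": "install", "title": "Installing", "ancestors": ["home", "guides"]},
    # Schema evolution: a newer page adds a "tags" field; older pages are untouched.
    {"_id": "upgrade", "title": "Upgrading", "ancestors": ["home", "guides"],
     "tags": ["ops"]},
    {"_id": "about", "title": "About", "ancestors": ["home"]},
]


def subtree(docs: list[dict], root_id: str) -> list[str]:
    """Ids of every page below root_id (emulates {"ancestors": root_id})."""
    return [d["_id"] for d in docs if root_id in d["ancestors"]]


def breadcrumbs(docs: list[dict], page_id: str) -> list[str]:
    """Titles from the root down to page_id, read straight from the ancestors array."""
    by_id = {d["_id"]: d for d in docs}
    page = by_id[page_id]
    return [by_id[a]["title"] for a in page["ancestors"]] + [page["title"]]


def tagged(docs: list[dict], tag: str) -> list[str]:
    """Pages carrying a tag; pages written before "tags" existed simply don't match."""
    return [d["_id"] for d in docs if tag in d.get("tags", [])]
```

For example, `subtree(pages, "guides")` returns the two guide pages, and `breadcrumbs(pages, "install")` yields `["Home", "Guides", "Installing"]` without a recursive lookup. Moving a section costs more, because every descendant's ancestors array must be rewritten. That trade-off usually suits a read-heavy CMS.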
Other solutions:
- MySQL or PostgreSQL w/caching and read distribution
pros: Skill sets are more readily available, and existing tools for managing the RDBMS infrastructure can be reused. Reads scale in well-known patterns in the relational database world.
cons: Schema modifications can prove expensive and hard to integrate into publishing workflows. Failover is not automatic.
Online Gaming:
MongoDB works very well with small reads and writes, provided you keep your working set in RAM. Combine that with easy schema evolution and replica sets for easy read scaling and failover, and you have a solid case for investigating MongoDB. In fact, this reasoning extends to any situation where writes outgrow a single database instance. When you need to support a growing volume of small, numerous writes, you have to decide whether to a) design the writes away (easier said than done), b) functionally partition the workload, or c) shard. MongoDB’s autosharding is nice, though not perfect, and there are gotchas PalominoDB can help you with. Depending on the other variables mentioned earlier, MongoDB might be a solid fit for this part of your workload. Disney’s Interactive Media Group gave a great presentation at the annual MongoSV conference on how they use MongoDB in their environment.
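Options b) and c) differ mainly in how the application decides where a write goes. The sketch below contrasts the two for a hypothetical game; the server names and workload labels are made up for illustration. With MongoDB's autosharding, the mongos router makes this decision from the shard key. A hand-written router like `route_sharded` is what you would build yourself to pursue option c) on a database that lacks autosharding.

```python
from __future__ import annotations

import hashlib

# Hypothetical server names, for illustration only.
FUNCTIONAL_MAP = {
    "sessions": "db-sessions",
    "inventory": "db-inventory",
    "leaderboard": "db-leaderboard",
}
SHARDS = ["db-shard0", "db-shard1", "db-shard2", "db-shard3"]


def route_functional(workload: str) -> str:
    """Option b): each kind of workload lives on its own server."""
    return FUNCTIONAL_MAP[workload]


def route_sharded(player_id: str) -> str:
    """Option c): one workload spread across servers by its key.

    Uses a stable hash (MD5) because Python's built-in hash() of a str
    changes between processes, which would send a player to different shards.
    """
    digest = hashlib.md5(player_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(route_functional("leaderboard"))  # db-leaderboard
print(route_sharded("player-42"))       # the same shard on every call
```

Functional partitioning stops helping once a single workload, such as player state, outgrows one server. At that point only sharding spreads its writes.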
Other solutions:
- MySQL or PostgreSQL with sharding
pros: Skill sets are more readily available, and existing tools for managing the RDBMS infrastructure can be reused.
cons: Games require significant schema modifications in early iterations, and these can prove expensive and disruptive. Writing your own sharding layer brings extensive development and QA hours and added complexity.
- Cassandra
pros: Native compression, high insert throughput and easy growth by adding nodes suit write-heavy workloads, and Cassandra handles multi-datacenter installations well.
cons: These will vary by use case and will be addressed in a future post. As a generalization, MongoDB is easier to set up and manage in smaller environments or for companies with constrained resources.
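Among the cons listed above for MySQL or PostgreSQL is the cost of writing your own sharding. That cost shows up most clearly when you add a shard. The sketch below assumes the most naive hand-written scheme, a stable hash of a hypothetical player id taken modulo the shard count, and measures how many keys change shards when a fifth shard joins four. About 80% of them move. A key stays put only when h mod 4 equals h mod 5, which holds for 4 of the 20 residues of h mod 20. Schemes such as consistent hashing or a lookup directory reduce this, at the price of more code to write and test.

```python
from __future__ import annotations

import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Naive hand-written sharding: stable hash of the key, modulo shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys: list[str], old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when the cluster grows from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


# Hypothetical player ids.
players = [f"player-{i}" for i in range(10_000)]
print(f"{fraction_moved(players, 4, 5):.1%} of players must move going from 4 to 5 shards")
```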
Log Centralization:
Asynchronous (and speedy!) writes, capped collections for space management and the flexibility of a schemaless database (since attributes in logs tend to pop up like mushrooms after a rainstorm) are often cited as key benefits for using MongoDB to store logs. Admittedly, one could build a queue to push data asynchronously into an RDBMS, and partitioning plus a few scripts could duplicate the space management features of a capped collection.
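A capped collection's space management works like a fixed-size ring. Inserts keep their order, and once the size limit is reached the oldest documents are discarded to make room. The in-memory sketch below imitates that behavior for log entries. It is an illustration, not MongoDB code. The JSON length stands in for BSON size, and the 200-byte cap and log fields are hypothetical. In MongoDB you would get the real behavior by creating the collection with `capped: true` and a `size`.

```python
from __future__ import annotations

import json
from collections import deque


class CappedLog:
    """In-memory imitation of a capped collection: fixed byte budget,
    insertion order preserved, oldest entries evicted first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.entries: deque[tuple[int, dict]] = deque()

    def insert(self, doc: dict) -> None:
        size = len(json.dumps(doc).encode("utf-8"))  # stand-in for BSON size
        if size > self.max_bytes:
            raise ValueError("document is larger than the whole collection")
        while self.used + size > self.max_bytes:  # evict oldest until it fits
            old_size, _ = self.entries.popleft()
            self.used -= old_size
        self.entries.append((size, doc))
        self.used += size

    def docs(self) -> list[dict]:
        return [doc for _, doc in self.entries]


# Hypothetical log entries; odd ones carry an extra attribute (schemaless logs).
log = CappedLog(max_bytes=200)
for i in range(10):
    log.insert({"seq": i, "msg": "login ok", **({"ip": "10.0.0.1"} if i % 2 else {})})
print([d["seq"] for d in log.docs()])
```

The partitioning alternative mentioned above reaches the same end state by dropping whole time-based partitions, which frees space in larger steps and needs a scheduled script.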
MongoDB and Cassandra both do this well. Cassandra is arguably the better choice for massive scale and works well in a multi-datacenter environment. However, MongoDB is much easier to use, manage and query. The client's size, the skill sets on hand and the availability requirements usually decide between the two.
Other solutions:
- Percona’s XtraDB w/HandlerSocket, MySQL partitioning and XML for variable attributes
pros: MySQL familiarity and reuse of MySQL infrastructure (backups, monitoring, etc.)
cons: HandlerSocket is still somewhat buggy, partition maintenance requires scripts, and XML is not easily queried.
- Cassandra w/TTL data
pros: multi-datacenter support makes this more viable than MongoDB when logs are collected across several datacenters.
cons: These will vary by use case and will be addressed in a future post. As a generalization, MongoDB is easier to set up and manage in smaller environments or for companies with constrained resources.
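The note above that MySQL partitioning requires scripts usually refers to a scheduled job that adds the next time-based partition and drops the oldest one. Below is a minimal sketch of such a job's SQL generation. It assumes a hypothetical table partitioned `BY RANGE (TO_DAYS(...))` on a timestamp column, with daily partitions named `pYYYYMMDD` and a catch-all `pmax` partition. It only builds the statements; running them is left to your MySQL client.

```python
from __future__ import annotations

from datetime import date, timedelta


def rotate_partitions(table: str, today: date, keep_days: int) -> list[str]:
    """Return SQL that adds tomorrow's daily partition and drops the expired one.

    Assumes a hypothetical table partitioned BY RANGE (TO_DAYS(logged_at)),
    with partitions named pYYYYMMDD plus a catch-all pmax partition.
    `table` must come from trusted configuration, since it is interpolated as-is.
    """
    tomorrow = today + timedelta(days=1)
    day_after = today + timedelta(days=2)
    expired = today - timedelta(days=keep_days)
    # Carve tomorrow's partition out of pmax so new rows never hit an error.
    add = (
        f"ALTER TABLE {table} REORGANIZE PARTITION pmax INTO ("
        f"PARTITION p{tomorrow:%Y%m%d} VALUES LESS THAN (TO_DAYS('{day_after:%Y-%m-%d}')), "
        f"PARTITION pmax VALUES LESS THAN MAXVALUE)"
    )
    # Dropping a partition discards its rows at once, like a capped collection's eviction.
    drop = f"ALTER TABLE {table} DROP PARTITION p{expired:%Y%m%d}"
    return [add, drop]


for stmt in rotate_partitions("app_logs", date(2012, 3, 6), keep_days=7):
    print(stmt)
```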
Queue Implementation:
MongoDB implements its internal replication using capped collections and tailable cursors, and the same features can be used to build simple, persistent, network-based queueing (distributed message-passing) implementations instead of a “proper” queueing package. One such implementation is described in detail on Captain Codeman’s Blog. Building your own queueing mechanism on top of a DBMS can be suboptimal, but one organization did so because it already had MongoDB expertise on staff and had run into difficult performance problems with ActiveMQ.
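The mechanism can be sketched without a database. Producers append messages to a bounded, insertion-ordered store, and each consumer remembers the last message it processed and asks only for newer ones, much as a tailable cursor resumes after the last document it returned. The toy classes below are hypothetical stand-ins, not the implementation from Captain Codeman’s Blog. With PyMongo, a real consumer would instead open a cursor on a capped collection via `find(..., cursor_type=CursorType.TAILABLE_AWAIT)`. The usage shows one real caveat of this design: a consumer that falls behind loses messages once the capped store wraps around.

```python
from __future__ import annotations

from collections import deque
from itertools import count


class CappedQueue:
    """Toy stand-in for a capped collection used as a message queue.

    Messages get increasing sequence numbers; once capacity is reached the
    oldest message is overwritten, as in a capped collection.
    """

    def __init__(self, capacity: int) -> None:
        self._messages: deque[tuple[int, str]] = deque(maxlen=capacity)
        self._seq = count(1)

    def publish(self, body: str) -> int:
        seq = next(self._seq)
        self._messages.append((seq, body))
        return seq

    def read_after(self, last_seq: int) -> list[tuple[int, str]]:
        return [(s, b) for s, b in self._messages if s > last_seq]


class Consumer:
    """Tails the queue by remembering the last sequence number it processed."""

    def __init__(self, queue: CappedQueue) -> None:
        self.queue = queue
        self.last_seq = 0

    def poll(self) -> list[str]:
        batch = self.queue.read_after(self.last_seq)
        if batch:
            self.last_seq = batch[-1][0]
        return [body for _, body in batch]


q = CappedQueue(capacity=3)
c = Consumer(q)
q.publish("job-1")
q.publish("job-2")
first = c.poll()   # both jobs
second = c.poll()  # nothing new yet
for j in range(3, 7):
    q.publish(f"job-{j}")
third = c.poll()   # job-3 was overwritten before this consumer read it
```

Every consumer sees every message here, which is publish/subscribe behavior. Work-queue semantics, where each job goes to exactly one worker, would additionally need an atomic claim step.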
Other solutions:
- ActiveMQ, RabbitMQ.
pros: proper queueing solutions, with known and documented problems and solutions.
cons: more brittle, more difficult to set up, and less performant if your queueing needs are extremely simple.
In summary, MongoDB, either on its own or in a hybrid environment with other technologies, is a wonderful choice for many of the most common use cases and business challenges our clients face. Helping clients make these complex decisions, and working together on the installation, management and optimization of these tools is our core business, and we encourage you to get in touch if you’d like to explore using MongoDB in your own environment.
Originally posted by Laine Campbell at https://www.palominodb.com/blog/2012/03/06/when-mongodb-right-choice-your-business-we-explore-detailed-use-cases