As Microsoft’s premier NoSQL cloud offering, Cosmos DB makes some interesting design decisions and trade-offs that you need to understand to get the most out of the product. In this blog post, we are going to cover indexing, as it is one of the features that best illustrates this point.
In a relational database, you create a schema and then define your indexes on top of it. The exact definition, quantity and storage of those indexes can often have a significant impact on the overall performance of the database.
For the Cosmos DB team, it was important that their service “just worked” right off the bat. This means that you are able to load data and start querying, skipping both the declarative schema creation and the index management piece. To achieve this, Cosmos DB indexes all fields by default and starts using those indexes as soon as data is loaded into the database.
Indexes also work transparently out of the gate and do not have to be named explicitly in a query to be used. This is not the case in some competing products such as Amazon’s DynamoDB, where the secondary index to be used has to be named in the request. This gives Cosmos DB more flexibility since the querying code is completely independent of the underlying implementation, as it should be.
If an administrator does not like this “index all by default” policy, they are free to change it. This again is the beauty of this design: you don’t have to micromanage indexing, but if you want to, you can. This is done in a declarative way through a JSON policy that specifies which document paths are indexed and how.
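As a rough illustration of what such a policy can look like, the sketch below builds one as a Python dictionary and prints it as JSON. The field names follow the indexing policy format as documented for Cosmos DB; the `build_policy` helper and the `/payload/*` path are made up for this example and are not part of any SDK.

```python
import json

def build_policy(excluded_paths):
    """Return a policy that indexes everything except the given paths."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/*" stands for every path in every document.
        "includedPaths": [{"path": "/*"}],
        # Paths listed here are skipped by the indexer.
        "excludedPaths": [{"path": p} for p in excluded_paths],
    }

# "/payload/*" is a hypothetical path used only for illustration.
policy = build_policy(["/payload/*"])
print(json.dumps(policy, indent=2))
```

Excluding a large, rarely queried subtree like this is one common reason to move away from the default policy.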
The indexing can even be micromanaged to the level of the individual document. For example, you can change indexing from “automatic” to “manual” and then only the documents that you request to be indexed will be added to the index. This is likely overkill for many people but if you need this level of granular control, you have access to it.
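A minimal sketch of the manual approach follows. It assumes the policy's `automatic` flag and the REST API's `x-ms-indexing-directive` request header, which accepts `Include` or `Exclude`; the `indexing_headers` helper is hypothetical, and an SDK would normally expose this as a request option instead.

```python
# Policy fragment: nothing is indexed unless a request asks for it.
manual_policy = {"indexingMode": "consistent", "automatic": False}

def indexing_headers(index_this_document):
    """Build the per-request header that opts one document in or out."""
    directive = "Include" if index_this_document else "Exclude"
    return {"x-ms-indexing-directive": directive}

# Headers to attach when creating a document that should be indexed.
print(indexing_headers(True))
```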
Another interesting option provided by Cosmos DB is the ability to choose between two ways of updating the index:
- Consistent: changes to the index happen immediately. Query consistency is respected and Request Unit consumption is higher.
- Lazy: changes to the index happen asynchronously through a background process. Query consistency is eventual and Request Unit consumption is lower.
Depending on the application use case, it is up to the developers to decide which one of these two modes is what they require.
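To make the difference between the two modes concrete, here is a toy in-memory model. It is not how Cosmos DB is implemented; `ToyCollection` and its methods are invented for this post and only mimic the visible behavior: a consistent index is updated as part of the write, while a lazy index catches up later.

```python
from collections import defaultdict, deque

class ToyCollection:
    """In-memory stand-in for a collection with one index on 'city'."""

    def __init__(self, mode):
        if mode not in ("consistent", "lazy"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mode = mode
        self.docs = {}
        self.index = defaultdict(set)  # city -> set of document ids
        self.pending = deque()         # writes not yet indexed (lazy only)

    def write(self, doc):
        self.docs[doc["id"]] = doc
        if self.mode == "consistent":
            self._index(doc)           # index updated as part of the write
        else:
            self.pending.append(doc)   # index updated later

    def _index(self, doc):
        self.index[doc["city"]].add(doc["id"])

    def run_background_indexer(self):
        """Drain the queue of writes that have not been indexed yet."""
        while self.pending:
            self._index(self.pending.popleft())

    def query_by_city(self, city):
        """Answer from the index only, as an indexed query would."""
        return sorted(self.index.get(city, set()))

consistent = ToyCollection("consistent")
lazy = ToyCollection("lazy")
for collection in (consistent, lazy):
    collection.write({"id": "1", "city": "Ottawa"})

print(consistent.query_by_city("Ottawa"))  # ['1']
print(lazy.query_by_city("Ottawa"))        # [] until the background pass runs
lazy.run_background_indexer()
print(lazy.query_by_city("Ottawa"))        # ['1']
```

The lazy collection does less work per write, which mirrors the lower Request Unit cost, but a query issued right after a write can miss it.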
There is also a third option, which is simply to set the indexing mode to None. This makes Cosmos DB drop any existing indexes and stop indexing writes. If you use this option, the only way to access a document without scanning the collection is to retrieve it by its ID. This is feasible, for example, if you want to use Cosmos DB as a key-value store.
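The three modes can be expressed as policy fragments along these lines. This is a sketch only: `policy_for_mode` is a hypothetical helper, and turning `automatic` off together with the None mode is an assumption on our part rather than something covered above.

```python
import json

VALID_MODES = ("consistent", "lazy", "none")

def policy_for_mode(mode):
    """Return a minimal indexing policy for the requested mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"indexing mode must be one of {VALID_MODES}")
    return {
        "indexingMode": mode,
        # Assumption: automatic indexing is switched off when mode is "none".
        "automatic": mode != "none",
    }

# Policy for a collection used purely as a key-value store.
kv_policy = policy_for_mode("none")
print(json.dumps(kv_policy))
```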
Currently the service supports 3 different index types:
- Hash: useful for equality predicates.
- Range: useful for ordering and range searches.
- Spatial: useful for spatial queries (within, distance, etc.)
The choice of Hash versus Range again depends on the query profile of the application. By default, Cosmos DB uses Range indexes for numbers and Hash indexes for strings, but this can be changed at any time by the developer. Furthermore, you can tweak the policy so that specific paths use an index type that better fits how they are queried.
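As a hedged example of a per-path override, the sketch below keeps the defaults for every path but switches strings under one path to a Range index, which would suit ordering or range filters on that field. The `kind` and `dataType` fields follow the policy format that exposes these index types; the `path_entry` helper and the `/lastName/?` path are hypothetical.

```python
import json

def path_entry(path, number_kind="Range", string_kind="Hash"):
    """Describe which index kind to use for numbers and strings under a path."""
    return {
        "path": path,
        "indexes": [
            {"kind": number_kind, "dataType": "Number"},
            {"kind": string_kind, "dataType": "String"},
        ],
    }

policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        # Defaults described above: Range for numbers, Hash for strings.
        path_entry("/*"),
        # Hypothetical field that is sorted on, so its strings get a Range index.
        path_entry("/lastName/?", string_kind="Range"),
    ],
    "excludedPaths": [],
}
print(json.dumps(policy, indent=2))
```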
The last thing to understand about Cosmos DB indexing is what happens when the indexing policy changes. There are three main characteristics:
- Online: there is no blocking or query throttling while the index is being changed.
- No performance impact: this is a big one. Cosmos DB does not charge Request Units for changing the indexing policy.
- Consistency: while the index transformation is happening, queries will be eventually consistent.
Let’s jump into the video now, where we will demonstrate the impact that the different consistency settings can have on Cosmos DB performance. Cheers!