A deeper look at consensus
CockroachDB uses the Raft consensus algorithm to guarantee data consistency (as long as system clocks are synchronized with NTP and clock offset is bounded), but it does not handle the whole data set in as a single Raft group. Instead, it uses Multi-Raft, and has a group for each range (horizontal scaling is achieved by splitting data into ranges). Each group designates a node as Leaseholder, which is the node that accepts writes and can serve reads. To optimize reads, Raft is bypassed for them (simplifying that is because the Leaseholder is the only node in a group accepting writes, so it has the most up-to-date data).
There are excellent online resources to get more information about consensus and consistency guarantees in CockroachDB and here are some starting points:
- Jepsen’s test results, both in Kyle Kingsbury’s post describing them and in Cockroach Labs’ blog.
- How CockroachDB does distributed, atomic transactions.
- Living without atomic clocks.
- Consensus, made thrive.
- The Secret Life of Data’s interactive explanation of Raft.
Online schema changes
Being SQL-based, CockroachDB organizes data into databases, tables and rows, with tables having a predefined and well-known schema. Since one of the features of this database is horizontal scaling, this means schema changes must be simple to make even in a globally distributed cluster.
CockroachDB derives inspiration for this from Google’s F1 and supports online schema change by allowing multiple versions of a schema to coexist on the cluster at the same time. A very good explanation of this process can be found here.
Locality and replication zones
During the podcast, we mentioned Georeplication and even touched on GDPR compliance. The feature that lets you do this is Replication Zones, which also control the number of copies of replicas. Another compelling use case for this feature is cross-cloud migrations. In this case, you may want to migrate from one cloud provider to another, and CockroachDB lets you achieve this by appropriately configuring the locality of Replication Zones, provisioning nodes on the target provider, and then adjusting the configuration to force all data to reside only on the locality that corresponds to this new provider. An example migration describing this procedure can be found on this post.
Follow the workload
CockroachDB leverages Leaseholders and Replication Zones (more specifically, the –locality flag) to move active range leases to a node closer to the origin of the majority of the workload automatically when there is high latency between the nodes.
Going back to what we said while discussing consensus, the Leaseholder is the only node in a Raft group that can accept writes and serve reads. Hence, to ‘move active range leases’ means that when locality is in use, CockroachDB attempts to keep Leaseholders as close to the sources of traffic for ranges as possible.
Are you using or considering to use CockroachDB in production? If yes, I’d love to read your stories in the comments section!