I recently completed a project where I worked with a company using Cassandra to keep metadata about objects stored in an Object Store. The application keeps track of individual objects as rows within a partition based on user id. In an effort to save space there is also a mechanism to track duplicate references to the objects in another table. Object writes take place as background activity and the time it takes to complete those writes is invisible to the applications end users. The time it takes to retrieve an object though is very visible to the end user. The keyspace was defined as network topology using two data centers (actual data centers here about 50 ms apart) with replication factor 3 in both data centers.
Initially the application was set up to use consistency ONE for both writes and reads. This seemed to be working okay until we started doing failure testing. At which point objects would come up missing due to the delay time in pushing hinted hand offs from one node to another. A simple solution to this was to make all writes and reads LOCAL_QUORUM. In fact doing so did resolve pretty much all of the testing errors but at a much increased latency, about 3 times longer than with consistency ONE, on both reads and writes. Even so, the latencies were deemed to be acceptable since they were still well under anticipated network latencies outside of the data centers which is what the users would be seeing.
Could we have done better than that though?
The writes are a background activity not visible to the end user. The increased write latency is probably reasonable there. The read latency is visible to the user. There is an option which guarantees finding the stored object references while still keeping the latency to a minimum. This is what I propose, the default read consistency is set back to ONE and most of the time a read to Cassandra will find the object reference as was clear in the initial testing. But, if a read returns no object reference then a second read is issued using LOCAL_QUORUM. This way most, more than 99%, of all reads are satisfied with the much lower latency consistency of ONE only occasionally needing the second read. This can be extended further to a full QUORUM read if the LOCAL_QUORUM read fails.
It is important to note that this approach only works if there are no row versions. E.G. rows only exist or do not exist. If a row may have different versions over time as you might have if the row were updated rather than just inserted and then later deleted. It is also important to note that its possible to find a deleted row this way. For this use case these qualifications are not issues.
Discover more about our expertise in Cassandra.
Interested in working with John? Schedule a tech call.