Apache Cassandra promises linear scalability and workload distribution, among many other features—and rightly so. However, as with many good things in life, these benefits come with a set of upfront conditions. When the use case aligns with the architectural limitations,…
Batches are one of the most misunderstood features of Apache Cassandra. They rarely improve performance; in fact, using batches can degrade it. To set the stage, let's look at how Cassandra handles individual mutations. Individual mutations…
Spark is an open-source, distributed processing system used to manage big data workloads. Spark uses in-memory caching and optimized query execution to run fast analytic queries against data of any size. Simply put, Spark is used to process data on a very…
Occasionally, clients reach out to us about authentication issues when a node is down. While this scenario shouldn't happen in a high-availability database management system (DBMS), it can if you miss a couple of very relevant lines in the…
This post continues the previous one, Cassandra 101: Understanding What Cassandra Is, and highlights a series of Cassandra topics for beginners. Replication Factor: The replication factor in Cassandra can be…
One of the many things to love about Cassandra is how operationally simple it is to add, remove or even replace nodes in a cluster. Replacing a node in Cassandra is as easy as setting your configuration files…
User-defined compactions allow us to manually select which files should be compacted. This lets us reclaim space while limiting the size of each compaction so that it fits into the remaining disk space. These compactions are relevant only for SizeTieredCompactionStrategy (STCS)…
High latency values may indicate a cluster at the edge of its processing capacity, issues with the data model (such as a poor choice of partition key or a high volume of tombstones), or issues with the underlying infrastructure. Below are some major reasons…
Because incremental repairs can significantly reduce the time and I/O cost of a repair, they can seem like a great idea. However, in practice they carry a few pitfalls that can cause severe damage to a production cluster, especially when…
I recently used the cstar tool to upgrade more than 200 Cassandra nodes across multiple environments backing multiple applications. We chose cstar because, of all the automation options, it is the one with topology awareness specific to Cassandra. Here…