Batch Operations in Apache Cassandra

Batches are one of the most misunderstood features of Apache Cassandra. They rarely improve performance; in fact, using batches may degrade it. To set the stage, let’s take a look at how Cassandra handles individual mutations. Individual mutations…

How to Deploy Spark in DataStax Cassandra 5.1

Spark is an open-source, distributed processing system used to manage big data workloads. Spark uses in-memory caching and optimized query execution to run fast analytic queries against data of any size. Simply put, Spark is used to process data on a very…

Change Your system_auth Replication Factor in Cassandra

Occasionally, clients reach out to us with authentication issues when a node is down. While this scenario shouldn’t happen in a high-availability database management system (DBMS), it can happen if you miss a couple of very relevant lines in the…

Cassandra for Beginners: Replication

This post continues the previous post, Cassandra 101: Understanding What Cassandra Is, and highlights a series of topics related to Cassandra for beginners. Replication Factor: The replication factor in Cassandra can be…

Replacing Nodes in Cassandra

One of the many things to love about Cassandra is how operationally simple it is to add, remove, or even replace nodes in a cluster. Replacing a node in Cassandra is as easy as setting your configuration files…

How to Perform User-Defined Compactions (UDC) in Cassandra

User-defined compactions allow us to manually select which files should be compacted. This enables us to reclaim space and limit the size of compaction so it can fit into the remaining space. These compactions are relevant only for SizeTieredCompactionStrategy (STCS)…

Let’s Deal with High Read Latencies in Cassandra

High latency values may indicate a cluster at the edge of its processing capacity, issues with the data model—such as poor choice of partition key or high levels of tombstones—or issues with the underlying infrastructure. Below are some major reasons…

Incremental Repair: Problems and a Solution

Because incremental repairs can significantly reduce the time and IO cost of performing a repair, they can seem like a great idea. However, the practical implementation carries a few pitfalls that can cause severe damage to a production cluster, especially when…

Upgrading a Large Cassandra Cluster with cstar

I recently upgraded 200+ Cassandra nodes across multiple environments, sitting behind multiple applications, using the cstar tool. We chose cstar because, out of all the automation options, it has topology awareness specific to Cassandra. Here…

Spark + Cassandra Best Practices

Spark Overview: Spark was created in 2009 as a response to difficulties with MapReduce in Hadoop, particularly in supporting machine learning and other interactive data analysis. Spark simplifies the processing and analysis of data, reducing the number of steps and…
