Step-by-step monitoring Cassandra with Prometheus and Grafana

In this blog post, I’m going to give a detailed guide on how to monitor a Cassandra cluster with Prometheus and Grafana.

For this, I’m using a new VM which I’m going to call “Monitor VM”. This post covers how to install the tools. In a second one, I’m going to go through the details of how to use and configure Grafana dashboards to get the most out of your monitoring!

High level plan

Monitor VM

  1. Install Prometheus
  2. Configure Prometheus
  3. Create storage and start Prometheus
  4. Install Grafana
  5. Start Grafana

Cassandra VMs

  1. Download Prometheus JMX-Exporter
  2. Configure JMX-Exporter
  3. Configure Cassandra
  4. Restart Cassandra

Detailed Plan

Monitor VM

Step 1. Install Prometheus

  $ wget https://github.com/prometheus/prometheus/releases/download/v2.3.1/prometheus-2.3.1.linux-amd64.tar.gz
  $ tar xvfz prometheus-*.tar.gz
  $ cd prometheus-*

Step 2. Configure Prometheus

  $ vim /etc/prometheus/prometheus.yaml
  global:
    scrape_interval: 15s
  scrape_configs:
  # Cassandra config
    - job_name: 'cassandra'
      scrape_interval: 15s
      static_configs:
        - targets: ['cassandra01:7070', 'cassandra02:7070', 'cassandra03:7070']
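If the node list changes often, the file above can be generated rather than edited by hand. The following is a minimal sketch and not part of the original setup: `render_prometheus_config` is a hypothetical helper name, and it assumes every node exposes the exporter on port 7070, as in the targets shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a prometheus.yaml for the given Cassandra hosts.
# Assumes each host serves JMX-Exporter metrics on port 7070.
render_prometheus_config() {
  local targets="" host
  for host in "$@"; do
    # Build a comma-separated list of quoted host:port pairs.
    targets+="${targets:+, }'${host}:7070'"
  done
  cat <<EOF
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'cassandra'
    scrape_interval: 15s
    static_configs:
      - targets: [${targets}]
EOF
}

# Example with the three host names used in this post.
config=$(render_prometheus_config cassandra01 cassandra02 cassandra03)
printf '%s\n' "$config"
```

Redirecting the output to /etc/prometheus/prometheus.yaml (with suitable permissions) would give the same file as the hand-written version.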

Step 3. Create storage and start Prometheus

  $ mkdir /data
  $ chown prometheus:prometheus /data
  $ prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/data

Step 4. Install Grafana

  $ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.4_amd64.deb
  $ sudo apt-get install -y adduser libfontconfig
  $ sudo dpkg -i grafana_5.1.4_amd64.deb

Step 5. Start Grafana

  $ sudo service grafana-server start
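Grafana still needs to be pointed at Prometheus before it can show anything. One option is a data source provisioning file, sketched below under assumptions: `write_grafana_datasource` is a hypothetical helper name, Prometheus is assumed to listen on its default port 9090 on the same VM, and the provisioning directory for a package install is assumed to be /etc/grafana/provisioning/datasources.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file.
# $1 = provisioning directory, $2 = Prometheus URL (both are assumptions
# about your environment, not values taken from this post).
write_grafana_datasource() {
  local dir=$1 url=$2
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Example (run as root, then restart grafana-server):
# write_grafana_datasource /etc/grafana/provisioning/datasources http://localhost:9090
```

The same data source can also be added by hand in the Grafana web UI; the file-based approach is only convenient when the Monitor VM is rebuilt often.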

Cassandra nodes

Step 1. Download JMX-Exporter:

  $ mkdir /opt/jmx_prometheus
  $ wget -P /opt/jmx_prometheus https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar

Step 2. Configure JMX-Exporter

  $ vim /opt/jmx_prometheus/cassandra.yml
  lowercaseOutputName: true
  lowercaseOutputLabelNames: true
  whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
  ]
  rules:
    - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
      name: cassandra_$1_$3
      labels:
        address: "$2"
    - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
      name: cassandra_$1_$2_$3
    - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
      name: cassandra_$1_$2
    - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "$1": "$2"
    - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$4_$5
      labels:
        "keyspace": "$2"
        "table": "$3"
    - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "type": "$2"
    - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?,
        name=(\S*)><>(Count|Value)
      name: cassandra_$1_$5
      labels:
        "$1": "$4"
        "$2": "$3"
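To see what a rule does, it helps to walk one MBean through it by hand. The sketch below approximates the ClientRequest rule in bash; it is not the exporter's own code. The bean string is a hypothetical example, `[^[:space:]]` stands in for `\S` because bash uses POSIX regular expressions, and the lowercasing step mimics `lowercaseOutputName: true`.

```shell
#!/usr/bin/env bash
# Hypothetical MBean string in the "domain<properties><>attribute" form
# that the rule patterns are written against.
bean='org.apache.cassandra.metrics<type=ClientRequest, scope=Read, name=Latency><>Count'

# POSIX ERE translation of the ClientRequest rule ([^[:space:]] replaces \S).
re='^org\.apache\.cassandra\.metrics<type=(ClientRequest), scope=([^[:space:]]*), name=([^[:space:]]*)><>(Count|Mean|95thPercentile)$'

if [[ $bean =~ $re ]]; then
  # name: cassandra_$1_$3_$4
  name="cassandra_${BASH_REMATCH[1]}_${BASH_REMATCH[3]}_${BASH_REMATCH[4]}"
  # lowercaseOutputName: true
  name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  # labels: "type": "$2"
  metric="${name}{type=\"${BASH_REMATCH[2]}\"}"
  printf '%s\n' "$metric"
else
  printf 'no match\n'
fi
```

The capture groups become the pieces of the metric name, and the scope ends up as a label, so reads and writes share one metric name and can be told apart by `type`.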

Step 3. Configure Cassandra

  $ echo 'JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar=7070:/opt/jmx_prometheus/cassandra.yml"' >> conf/cassandra-env.sh
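Appending with `>>` adds a new line every time the command is run, so re-running it leaves duplicate agent options in the file. A minimal guard is sketched below; `add_agent_line` is a hypothetical helper name, and the default jar and config paths are assumptions that should match wherever the files actually live on your nodes.

```shell
#!/usr/bin/env bash
# Assumed locations of the agent jar and its config; override as needed.
AGENT_JAR=${AGENT_JAR:-/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar}
AGENT_CONFIG=${AGENT_CONFIG:-/opt/jmx_prometheus/cassandra.yml}

# Hypothetical helper: append the -javaagent option to the given
# cassandra-env.sh only if no exporter agent line is present yet.
add_agent_line() {
  local env_file=$1
  if grep -qF 'jmx_prometheus_javaagent' "$env_file" 2>/dev/null; then
    return 0
  fi
  # Single quotes keep $JVM_OPTS literal so it expands when Cassandra starts.
  printf 'JVM_OPTS="$JVM_OPTS -javaagent:%s=7070:%s"\n' \
    "$AGENT_JAR" "$AGENT_CONFIG" >> "$env_file"
}

# Example: add_agent_line conf/cassandra-env.sh
```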

Step 4. Restart Cassandra

  $ nodetool flush
  $ nodetool drain
  $ sudo service cassandra restart
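Once a node is back up, one way to confirm the agent loaded is to fetch its metrics page and count the Cassandra samples. This is a sketch under assumptions: `count_cassandra_samples` is a hypothetical helper name, and the exporter is assumed to serve its metrics at /metrics on port 7070.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print how many sample lines start with "cassandra_". Comment lines such
# as "# HELP" and "# TYPE" start with "#" and are therefore not counted.
count_cassandra_samples() {
  # grep -c exits non-zero when the count is 0; keep the function's status 0.
  grep -c '^cassandra_' || true
}

# Example, run on a Cassandra node:
# curl -s http://localhost:7070/metrics | count_cassandra_samples
```

A count of 0 suggests the agent is running but no rule matched, which usually points at the rules or the whitelist in the exporter config.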

And now, if you have no errors (and you shouldn’t!) your Prometheus is ingesting your Cassandra metrics!

Wait for the next blog post where I will guide you through a good Grafana configuration!

About the Author

Carlos Rolo is a DataStax Certified Cassandra Architect, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things and new ways of learning that come with working at Pythian. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. He prides himself on being a tenacious problem solver, while remaining a calm and positive presence on any team. When Carlos isn’t working, he can be found playing water polo or enjoying his local community. Carlos holds a Bachelor of Electro-technical Engineering, and a Master of Control Systems and Automation.

14 Comments

Hi Carlos,
Nice article. I want to configure the same on a Windows machine. Please help me out.

Hi Carlos,

Thanks. I have created a streaming pipeline from Oracle to Cassandra. Is there any way to monitor table-level daily counts for both using this approach?

@Sankar the Windows approach should be more straightforward. Just copy the configurations and start the applications where you have them extracted.

@Venkat, counting in Cassandra is a really, really tricky thing. You could use this approach to monitor the writes, but I would take it with a grain of salt. I might do a blog post about that; it is a common problem!

Hi Carlos,

Nice article. Do you think it is possible to monitor Cassandra DSE using Azure?
Is it possible to export the metrics to Azure Log Analytics or Application Insights?
Thanks

Hello everyone…
I need this configuration for Cassandra monitoring with Grafana dashboards. Please help me with this.
Thanks in advance

Hi, I need dashboards for this configuration. Please help me with this. Thanks in advance.

Marc Richter
July 25, 2019 11:23 am

Hi Carlos,

thanks for this nice and easy to follow article.
I hope that “next blog post where I will guide you through a good Grafana configuration” will come soon, since this is where I’m stuck now ;-)

BR,
Marc

Hello,

Thanks for the details.

Could you please help me understand the rule below?

rules:
- pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
  name: cassandra_$1_$3
  labels:
    address: "$2"

How does this help? I understand that it renames the metrics, but could you please elaborate?

Thanks

Delsaran Bigglesworth
October 14, 2019 7:40 am

I think the ><> is the website attempting to convert angle brackets. This part of the config was broken for me, so I just removed it. You might have better luck with the config on this page.

https://grafana.com/grafana/dashboards/5408

Delsaran Bigglesworth
October 14, 2019 7:43 am

Yeah. (\S*)><>(Count|Value) Should be >

For whatever reason it converted it

Mootez Bessifi
November 13, 2019 5:46 am

Hi Carlos,

Where is the Grafana config?

Hi Carlos,
Very well documented, Thanks.
Awaiting grafana dashboard.

Andres Ackerman
March 5, 2020 9:26 am

Hi Carlos, nice post. I have 3 Cassandra nodes with 150 GB each. I don't understand why you run these at the end of the process:
nodetool flush
nodetool drain
Is that necessary? I don't want to flush all my data or write SSTables to disk. What happens if I skip that and just restart the Cassandra service? Thanks

Hi Carlos,
I appreciate your effort. Nice article.
I have a quick question.
I want to monitor the health of my Cassandra cluster to know whether the endpoints are UP or DOWN.
Could you please suggest the metric name or option I should explore for this?
Thank you in advance.
