Step-by-step: monitoring Cassandra with Prometheus and Grafana

In this blog, I’m going to give a detailed guide on how to monitor a Cassandra cluster with Prometheus and Grafana.

For this, I’m using a new VM, which I’ll call the “Monitor VM”. In this post, I cover how to install the tools. In a second post, I’ll go through the details of how to use and configure Grafana dashboards to get the most out of your monitoring!

High level plan

Monitor VM

  1. Install Prometheus
  2. Configure Prometheus
  3. Create storage and start Prometheus
  4. Install Grafana
  5. Start Grafana

Cassandra VMs

  1. Download the Prometheus JMX-Exporter
  2. Configure JMX-Exporter
  3. Configure Cassandra
  4. Restart Cassandra

Detailed Plan

Monitor VM

Step 1. Install Prometheus

  $ wget https://github.com/prometheus/prometheus/releases/download/v2.3.1/prometheus-2.3.1.linux-amd64.tar.gz
  $ tar xvfz prometheus-*.tar.gz
  $ cd prometheus-*

Step 2. Configure Prometheus

  $ vim /etc/prometheus/prometheus.yaml
  global:
    scrape_interval: 15s

  scrape_configs:
  # Cassandra config
    - job_name: 'cassandra'
      scrape_interval: 15s
      static_configs:
        - targets: ['cassandra01:7070', 'cassandra02:7070', 'cassandra03:7070']
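
For clusters with more than a handful of nodes, typing the targets list by hand gets error-prone. As a minimal sketch (the make_targets helper is hypothetical, not part of Prometheus), a small shell function can generate that line from a list of hostnames, assuming every node exposes the exporter on the same port:

```shell
# Hypothetical helper: print a static_configs "targets" line for the given
# exporter port and hostnames. Assumes every node uses the same port.
make_targets() {
  port="$1"
  shift
  out=""
  for host in "$@"; do
    # Append 'host:port', comma-separated after the first entry.
    out="${out:+$out, }'${host}:${port}'"
  done
  printf "        - targets: [%s]\n" "$out"
}

# Example: make_targets 7070 cassandra01 cassandra02 cassandra03
```

The output is indented to match the static_configs entry above, so it can be pasted straight into the file.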

Step 3. Create storage and start Prometheus

  $ mkdir /data
  $ chown prometheus:prometheus /data
  $ prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/data
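
Once Prometheus is running, its HTTP API can report the state of each scrape target. The following is a minimal sketch, assuming Prometheus listens on its default port 9090 on the Monitor VM and returns compact JSON; the count_up_targets helper is hypothetical. Expect the Cassandra targets to show as down until the exporter is configured on the nodes.

```shell
# Hypothetical helper: read the JSON from /api/v1/targets on stdin and
# print how many targets report "health":"up".
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example (assumes Prometheus on localhost:9090):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```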

Step 4. Install Grafana

  $ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.4_amd64.deb
  $ sudo apt-get install -y adduser libfontconfig
  $ sudo dpkg -i grafana_5.1.4_amd64.deb

Step 5. Start Grafana

  $ sudo service grafana-server start

Cassandra nodes

Step 1. Download JMX-Exporter

  $ mkdir /opt/jmx_prometheus
  $ wget -P /opt/jmx_prometheus https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar

Step 2. Configure JMX-Exporter

  $ vim /opt/jmx_prometheus/cassandra.yml
  lowercaseOutputName: true
  lowercaseOutputLabelNames: true
  whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
  ]
  rules:
    - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
      name: cassandra_$1_$3
      labels:
        address: "$2"
    - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
      name: cassandra_$1_$2_$3
    - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
      name: cassandra_$1_$2
    - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "$1": "$2"
    - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$4_$5
      labels:
        "keyspace": "$2"
        "table": "$3"
    - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "type": "$2"
    - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?,
        name=(\S*)><>(Count|Value)
      name: cassandra_$1_$5
      labels:
        "$1": "$4"
        "$2": "$3"

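Each rule matches a string of the form domain<properties><>attribute and rewrites it into a metric name plus labels. To make that concrete, here is a rough sketch that emulates only the Table rule with sed and awk. It is an illustration, not how the exporter works internally (the exporter applies Java regular expressions), and the table_rule helper, keyspace ks1 and table users are made up for the example. The lowercasing stands in for lowercaseOutputName.

```shell
# Emulates ONE rule (the Table rule) with sed and awk for illustration only;
# the exporter itself applies Java regular expressions, not these tools.
# Reads "domain<properties><>attribute" strings on stdin.
table_rule() {
  sed -nE 's/^org\.apache\.cassandra\.metrics<type=(Table), keyspace=([^,>]+), scope=([^,>]+), name=([^,>]+)><>(Count|Mean|95thPercentile)$/cassandra_\1_\4_\5 \2 \3/p' |
    awk '{ printf "%s{keyspace=\"%s\",table=\"%s\"}\n", tolower($1), $2, $3 }'
}

# Example:
#   echo 'org.apache.cassandra.metrics<type=Table, keyspace=ks1, scope=users, name=ReadLatency><>Count' | table_rule
```

For that input, the sketch prints cassandra_table_readlatency_count{keyspace="ks1",table="users"}. Strings that do not match the rule produce no output.
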
Step 3. Configure Cassandra

  $ echo 'JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar=7070:/opt/jmx_prometheus/cassandra.yml"' >> conf/cassandra-env.sh
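
Because the command above appends to cassandra-env.sh, running it a second time leaves a duplicate -javaagent entry in the file. A minimal sketch of a guard against that follows; the add_javaagent_opt helper and its argument order are hypothetical, and it simply skips the append when the same jar is already referenced.

```shell
# Hypothetical helper: append the -javaagent option to a cassandra-env.sh
# file only if that jar is not already referenced there.
# Arguments: env file, agent jar path, exporter port, exporter config path.
add_javaagent_opt() {
  env_file="$1"
  jar="$2"
  port="$3"
  config="$4"
  if ! grep -qF -- "-javaagent:${jar}" "$env_file" 2>/dev/null; then
    # Single quotes keep $JVM_OPTS literal so it expands when Cassandra starts.
    printf 'JVM_OPTS="$JVM_OPTS -javaagent:%s=%s:%s"\n' "$jar" "$port" "$config" >> "$env_file"
  fi
}

# Example, substituting the paths from your own install:
#   add_javaagent_opt conf/cassandra-env.sh /path/to/agent.jar 7070 /path/to/exporter-config
```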

Step 4. Restart Cassandra

  $ nodetool flush
  $ nodetool drain
  $ sudo service cassandra restart

And now, if you have no errors (and you shouldn’t!), Prometheus is ingesting your Cassandra metrics!
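
To confirm this from the node side, you can look at what the exporter serves on port 7070. A minimal sketch, assuming the agent exposes its metrics at /metrics in the Prometheus text format; the cassandra_metric_names helper is hypothetical and just lists the distinct metric names that start with cassandra_:

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the distinct metric names that start with "cassandra_".
cassandra_metric_names() {
  # Keep sample lines only, then cut everything from the first "{" or space.
  grep '^cassandra_' | sed 's/[{ ].*//' | sort -u
}

# Example (assumes the exporter on cassandra01, port 7070):
#   curl -s http://cassandra01:7070/metrics | cassandra_metric_names
```

An empty result usually means the agent is not loaded or the rules matched nothing.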

Wait for the next blog post where I will guide you through a good Grafana configuration!

About the Author

Carlos Rolo is a DataStax Certified Cassandra Architect, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things and new ways of learning that come with working at Pythian. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. He prides himself on being a tenacious problem solver, while remaining a calm and positive presence on any team. When Carlos isn’t working he can be found playing water polo or enjoying his local community. Carlos holds a Bachelor of Electro-technical Engineering, and a Master of Control Systems and Automation.

7 Comments

Hi Carlos,
Nice article. I want to configure the same on a Windows machine. Please help me out.

Hi Carlos,

Thanks. My requirement: I have created a streaming pipeline from Oracle to Cassandra. Is there any way to monitor table-level daily counts on both sides using this approach?

@Sankar the Windows approach should be more straightforward. Just copy the configurations and start the applications where you have them extracted.

@Venkat, counting in Cassandra is a really, really tricky thing. You could use this approach to monitor the writes, but I would take it with a grain of salt. I might do a blog post about that; it is a common problem!

Hi Carlos,

Nice article. Do you think it is possible to monitor Cassandra DSE using Azure?
Is it possible to export the metrics to Azure Log Analytics or Application Insights?
Thanks

Hello everyone…
I need this configuration for Cassandra monitoring with Grafana dashboards. Please help me with this.
Thanks in advance

Hi, I need dashboards for this configuration. Please help me with this. Thanks in advance.

Marc Richter
July 25, 2019 11:23 am

Hi Carlos,

thanks for this nice and easy to follow article.
I hope that “next blog post where I will guide you through a good Grafana configuration” will come soon, since this is where I’m stuck now ;-)

BR,
Marc
