Step by Step Monitoring Cassandra with Prometheus and Grafana

Posted in: Cassandra, Open Source, Technical Track

In this blog, I’m going to give a detailed guide on how to monitor a Cassandra cluster with Prometheus and Grafana.

For this, I’m using a new VM which I’m going to call “Monitor VM”. In this blog post, I’m going to work on how to install the tools. In a second one, I’m going to go through the details on how to do use and configure Grafana dashboards to get the most out of your monitoring!

High level plan

Monitor VM

  1. Install Prometheus
  2. Configure Prometheus
  3. Install Grafana

Cassandra VMs

  1. Download prometheus JMX-Exporter
  2. Configure JMX-Exporter
  3. Configure Cassandra
  4. Restart Cassandra

Detailed Plan

Monitor VM

Step 1. Install Prometheus

  $ wget https://github.com/prometheus/prometheus/releases/download/v2.3.1/prometheus-2.3.1.linux-amd64.tar.gz
  $ tar xvfz prometheus-*.tar.gz
  $ cd prometheus-*

Step 2. Configure Prometheus

  	$ vim /etc/prometheus/prometheus.yaml
  global:
    scrape_interval: 15s

  scrape_configs:
  # Cassandra config
    - job_name: 'cassandra'
      scrape_interval: 15s
      static_configs:
        - targets: ['cassandra01:7070', 'cassandra02:7070', 'cassandra03:7070']

Step 3. Create storage and start Prometheus

  $ mkdir /data
  $ chown prometheus:prometheus /data
  $ prometheus --config.file=/etc/prometheus/prometheus.yaml

Step 4. Install Grafana

  $ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.4_amd64.deb
  $ sudo apt-get install -y adduser libfontconfig
  $ sudo dpkg -i grafana_5.1.4_amd64.deb

Step 5. Start Grafana

  $ sudo service grafana-server start

Cassandra Nodes

Step 1. Download JMX-Exporter:

  $ mkdir /opt/jmx_prometheus
  $ wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar

Step 2. Configure JMX-Exporter

  $ vim /opt/jmx_prometheus/cassandra.yml
  lowercaseOutputName: true
  lowercaseOutputLabelNames: true
  whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
  ]
  rules:
    - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
      name: cassandra_$1_$3
      labels:
        address: "$2"
    - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
      name: cassandra_$1_$2_$3
    - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
      name: cassandra_$1_$2
    - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "$1": "$2"
    - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$4_$5
      labels:
        "keyspace": "$2"
        "table": "$3"
    - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
      name: cassandra_$1_$3_$4
      labels:
        "type": "$2"
    - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?,
        name=(\S*)><>(Count|Value)
      name: cassandra_$1_$5
      labels:
        "$1": "$4"
        "$2": "$3"

Step 3. Configure Cassandra

  echo 'JVM_OPTS="$JVM_OPTS -javaagent:/opt/prometheus-exporter/jmx_prometheus_javaagent-0.3.0.jar=7070:/opt/prometheus-exporter/cassandra.yaml"' >> conf/cassandra-env.sh

Step 4. Restart Cassandra

  $ nodetool flush
  $ nodetool drain
  $ sudo service cassandra restart

And now, if you have no errors (and you shouldn’t!) your Prometheus is ingesting your Cassandra metrics!

Wait for the next blog post where I will guide you through a good Grafana configuration!

email

Interested in working with Carlos? Schedule a tech call.

About the Author

Carlos Rolo is a Datastax Certified Cassandra Architect, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things and new ways of learning that come with working at Pythian. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. He prides himself on being a tenacious problem solver, while remaining a calm and positive presence on any team. When Carlos isn’t working he can be found playing water polo or enjoying the his local community. Carlos holds a Bachelor of Electro-technical Engineering, and a Master of Control Systems and Automation.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *