How to run OpenTSDB with Google Bigtable

Posted in: Big Data, Cloud, Open Source, Technical Track

In a previous post (OpenTSDB and Google Cloud Bigtable) we discussed OpenTSDB, an open source distributed database specifically designed for storing timeseries data. We also explained how OpenTSDB relies on Apache HBase for a reliable and scalable data backend. However, deployment and administration of an HBase cluster is not a trivial task, as it requires a full Hadoop setup. This means that it takes a big data engineer (or better a team of them) to plan for the cluster sizing, provision the machines and setup the Hadoop nodes, configure all services and tune them for optimal performance. If this is not enough, Operations teams have to constantly monitor the cluster, deal with hardware and service failures, perform upgrades, backup regularly, and a ton of other tasks that make maintenance of a Hadoop cluster and OpenTSDB a challenge for most organizations.

With the release of Google Bigtable as a cloud service and its support for the HBase API, it was obvious that if we managed to integrate OpenTSDB with Google Bigtable, we would enable more teams to have access to the powerful functionality of OpenTSDB by removing the burden from maintaining an HBase cluster.

Nevertheless, integration of OpenTSDB with Bigtable was not as seamless as dropping a few jars in its release directory. This happened because the OpenTSDB developers went over and above the standard HBase libraries, by implementing their very own asynchbase library. Asynchbase is a fully asynchronous, non-blocking, thread-safe, high-performance HBase API. And no one can put it better than the asynchbase developers themselves who claim that ‘This HBase client differs significantly from HBase’s client. Switching to it is not easy as it requires one to rewrite all the code that was interacting with any HBase API.’

This meant that integration with Google Bigtable required OpenTSDB to switch back to the standard HBase API. We saw the value of such an effort here at Pythian and set about developing this solution.

The asyncbigtable library

Today, we are very happy to announce the release of the asyncbigtable library. The asyncbigtable library is a 100% compatible implementation of the great asynchbase library that can be used as a drop in replacement and enable OpenTSDB to use Google Bigtable as a storage backend.

Thanks to support from the OpenTSDB team, the asyncbigtable code is hosted in the OpenTSDB GitHub repository.

Challenges

To create asyncbigtable we had to overcome two great challenges. The first one was that OpenTSDB assumes that the underlying library (until now asynchbase) performs asynchronous and non-blocking operations. On the other hand, the standard HBase API only supports synchronous and blocking calls. As a workaround for this, we used the BufferedMutator  implementation that collects all Mutation operations in a buffer and performs them in batches, allowing for mutations with an extremely low latency.

The second challenge stemmed from the fact that the OpenTSDB project has a very limited set of jar dependencies, that are explicitly defined in Makefiles. Contrary to this spartan approach, HBase and Bigtable client libraries have a significant number of transitive dependencies. Since, adding those dependencies one-by-one in the OpenTSDB build process would complicate its dependency management, we decided to  package all asyncbigtable dependencies in an uber-jar using the Maven assembly plugin. Therefore, building OpenTSDB with asyncbigtable support is now as simple as downloading a single beefy jar.

Build steps

Before you start

Before you build OpenTSDB with Google Bigtable support, you must complete the following required steps:

  1. Create a Google Bigtable cluster (https://cloud.google.com/bigtable/docs/creating-cluster)
  1. Install HBase shell with access to the Google Bigtable cluster (https://cloud.google.com/bigtable/docs/installing-hbase-shell)
  1. Download and install the required tools for compiling OpenTSDB from source (http://opentsdb.net/docs/build/html/installation.html#compiling-from-source)

Build and run OpenTSDB

  1. Clone and build the modified source code from the Pythian github repository:

git clone -b bigtable git@github.com:pythian/opentsdb.git
cd opentsdb
sh build-bigtable.sh

  1. Create OpenTSDB tables

OpenTSDB provides a script that uses HBase shell to create its tables.  To create the tables run the following command:
env COMPRESSION=NONE HBASE_HOME=/path/to/hbase-1.1.2 \
./src/create_table.sh

  1. Run OpenTSDB

export HBASE_CONF=/path/to/hbase-1.1.2/conf
mkdir -p <tmp_dir>
./build/tsdb tsd --port=4242 --staticroot=build/staticroot \
--cachedir=<tmp_dir>

Future work

By all means our work on asyncbigtable does not stop here. We are putting great effort towards improving the library to achieve the high quality standards of the rest of OpenTSDB code. Our first priority is to test the library against most real world scenarios and achieve the highest quality. In the future, we plan to benchmark the performance of OpenTSDB with Bigtable and compare how it competes against HBase.

We are also working on building a true asynchronous implementation of the asyncbigtable library by integrating deeper with the Google Bigtable API.

Acknowledgements

We would like to thank the OpenTSDB developers (Benoît Sigoure and Chris Larsen) for their brilliant work in building such great software and for embracing the asyncbigtable library. Their insights and code contributions helped us deal with some serious issues. Also, we would like to thank the Google Cloud Bigtable team because they expressed genuine interest in this project and they were very generous in providing us with cloud infrastructure and excellent support.

email

Interested in working with Christos? Schedule a tech call.

Principal Big Data Architect

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *