At Palomino, we’ve been hard at work building the Palomino Cluster Tool. Its goal is to let you build realistically sized, functionally configured distributed databases in a matter of hours, rather than the days or weeks it takes at present. Today marks another milestone toward that goal as we release our Chef Cookbook for building HBase on CentOS!
Riot Games was kind enough to open source their Chef Cookbook for building a Hadoop cluster. Although the code wasn’t in a state that would produce a functional cluster, and it was almost entirely undocumented, it was a great start.
Recently I was tasked with building an HBase cluster on CentOS using Chef. Although I’ve written a Cookbook to do so (three times!), my code was never fully optimized. It could build a cluster, but only with hard-coded configuration parameters, or it produced a cluster running in an unrealistic, non-production configuration.
Using the Riot Games Cookbook and the lessons I’d learned in the past, I whipped it into shape. I modified it not only to produce a functional cluster in a non-Riot environment, but also to build HBase on top of that! The diff contains over 800 changes, and the Cookbook includes documentation on how to use it.
Here you can find the newest Chef Cookbook for HBase on CentOS. Here you can find the original Ansible Playbooks for HBase on Ubuntu. If you would like to use this code to build your own cluster, you are encouraged to join the mailing list to get help and advice from your peers.
Notes
A distributed database can be tested functionally by installing it on a single machine, but when it comes time to run benchmarks, or to discover the other 90% of functionality that only appears in a distributed setup, you will want the database installed on many machines, preferably dozens.
Many projects seem to stop short of installing the database in a way that would let you benchmark it. Perhaps they take shortcuts, such as putting all database files into /tmp, disabling logging, or removing tricky or subtle components in the interest of simplicity. The Palomino Cluster Tool provides you with a cluster that’s actually ready for production. Sure, you still have to edit the configurations a little, but a good generic base configuration is provided.
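One of the shortcuts mentioned above, keeping database files under /tmp, is easy to check for yourself. The following is a minimal Python sketch, not part of the Palomino Cluster Tool or the Cookbook: it parses a Hadoop-style site XML file (the format used by hbase-site.xml) and reports every property whose value points at a local path under /tmp. The function name find_tmp_paths and the sample values are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the Hadoop/HBase site-file format.
# The property values are made up for illustration.
SAMPLE = """<configuration>
  <property><name>hbase.rootdir</name><value>file:///tmp/hbase</value></property>
  <property><name>hbase.zookeeper.property.dataDir</name><value>/var/lib/zookeeper</value></property>
  <property><name>hbase.tmp.dir</name><value>/tmp/hbase-scratch</value></property>
</configuration>"""


def find_tmp_paths(site_xml):
    """Return {property name: value} for values that point under local /tmp."""
    root = ET.fromstring(site_xml)
    flagged = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # Drop a URI scheme such as file:// so file:///tmp/x becomes /tmp/x.
        # Values like hdfs://host:8020/... keep their host prefix and are
        # therefore not treated as local /tmp paths.
        path = value.split("://", 1)[-1]
        if path == "/tmp" or path.startswith("/tmp/"):
            flagged[name] = value
    return flagged


if __name__ == "__main__":
    for name, value in find_tmp_paths(SAMPLE).items():
        print(f"{name} points at a temporary directory: {value}")
```

Running a check like this against the configuration files a cluster tool generates is a quick way to see whether the result is closer to a demo or to something you could benchmark.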