From 0 to Cassandra – An Exhaustive Approach to Installing Cassandra

Posted in: Cassandra, Technical Track

All around the Internet you can find lots of guides on how to install Cassandra on almost every Linux distro around. But normally all of this information is based on the packaged versions and omit some parts that are deemed essential for proper Cassandra functioning.

Note: If you are adding a machine to an existing Cluster please approach this guide with caution and replace the configurations here recommended by the ones you already have on your cluster, specially the Cassandra configuration.

Without further conversation lets start!

Essentials

Start your machine and install the following:

  • ntp (Packages are normally ntp, ntpdata and ntp-doc)
  • wget (Unless you have your packages copied over via other means)
  • vim (Or your favorite text editor)

Retrieve the following packages

Installation

Set up NTP

This can be more or less dependent of your system, but the following commands should do it (You can check this guide also):

~$ chkconfig ntpd on
~$ ntpdate pool.ntp.org
~$ service ntpd start

Set up Java (Let’s assume we are doing this in /opt)

Extract Java and install it:

~$ tar xzf [java_file].tar.gz
~$ update-alternatives --install /usr/bin/java java /opt/java/bin/java 1 

Check that is installed:

~$ java -version
java version '1.7.0_75'
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

Let’s put JNA into place

~$ mv jna-VERSION.jar /opt/java/lib

Set up Cassandra (Let’s assume we are doing this in /opt)

 

Extract Cassandra:

~$ tar xzf [cassandra_file].tar.gz

Create Cassandra Directories:

~$ mkdir /var/lib/cassandra
~$ mkdir /var/lib/cassandra/commitlog
~$ mkdir /var/lib/cassandra/data
~$ mkdir /var/lib/cassandra/saved_caches
~$ mkdir /var/log/cassandra

Configuration

 Linux configuration

~$ vim /etc/security/limits.conf

Add the following:

root soft memlock unlimited
root hard memlock unlimited
root – nofile 100000
root – nproc 32768
root – as unlimited

CentOS, RHEL, OEL, set in the following in /etc/security/limits.d/90-nproc.conf:

* – nproc 32768

Add the following to the sysctl file:

~$ vim /etc/sysctl.conf
vm.max_map_count = 131072

Finally (Reboot also works):

~$ sysctl -p

Firewall, the following ports must be open:

# Internode Ports
7000    Cassandra inter-node cluster communication.
7001    Cassandra SSL inter-node cluster communication.
7199    Cassandra JMX monitoring port.
# Client Ports
9042    Cassandra client port (Native).
9160    Cassandra client port (Thrift).

Note: Some/Most guides tell you to disable swap, I think of swap as an acrobat’s safety net, it should never have to be put to use, but in need it exists. As such, I never disable it and I put a low swappiness (around 10). You can read more about it here and here.

 Cassandra configuration

Note: Cassandra has a LOT of settings, these are the ones you should always set if you are going live. Lots of them depend on hardware and/or workload. Maybe I’ll write a post about them in the near future. In the meantime, you can read about them here.

~$ vim /opt/cassandra/conf/cassandra.yaml

Edit the following fields:

cluster_name: <Whatever you would like to call it>
data_file_directories: /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog

saved_caches_directory: /var/lib/cassandra/saved_caches

# Assuming this is your first node, this should be reachable by other nodes
seeds: “<IP>”

# This is where you listen for intra node communication
listen_address: <IP>

# This is where you listen to incoming client connections
rpc_address: <IP>

endpoint_snitch: GossipingPropertyFileSnitch

Edit the snitch property file:

~$ vim  /opt/cassandra/conf/cassandra-rackdc.properties:

Add the DC and the RACK the server is in. Ex:

dc=DC1
rack=RAC1

Finally make sure your logs go to /var/log/cassandra:

~$ vim /opt/cassandra/conf/logback.xml

Testing

Start Cassandra

~$ service cassandra start

You should see no error here, wait a bit then:

~$ grep  JNA /var/log/cassandra/system.log
INFO  HH:MM:SS JNA mlockall successful

Then check the status of the ring:

~$ nodetool status
Datacenter: DC1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Owns   Host ID                               Token                                    Rack
UN  185.10.49.136  140.59 KB  100.0%  5c3c697f-8bfd-4fb2-a081-7af1358b313f  0                                        RAC

Creating a keyspace a table and inserting some data:

~$ cqlsh xxx.yy.zz.ww

cqlsh- CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', DC1 : 1};
Should give no errors
cqlsh- USE sandbox;
cqlsh:sandbox- CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox- INSERT INTO data (id, data) values (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox- SELECT * FROM data;

id                                   | data
--------------------------------------+---------
c37d661d-7e61-49ea-96a5-68c34e83db3a | testing

(1 rows)

And we are done, you can start using your Cassandra node!

Learn more about Pythian’s Cassandra Services.

For Cassandra help, contact us now!

email

Interested in working with Carlos? Schedule a tech call.

About the Author

Carlos Rolo is a Datastax Certified Cassandra Architect, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things and new ways of learning that come with working at Pythian. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. He prides himself on being a tenacious problem solver, while remaining a calm and positive presence on any team. When Carlos isn’t working he can be found playing water polo or enjoying the his local community. Carlos holds a Bachelor of Electro-technical Engineering, and a Master of Control Systems and Automation.

2 Comments. Leave new

Great post, Carlos.

Reply

Thank you for the post. I had faced a lot of issues while installing Apache version of cassandra.This post was relly helpful

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *