All around the Internet you can find lots of guides on how to install Cassandra on almost every Linux distro. But most of this information is based on the packaged versions and omits some steps that are essential for Cassandra to function properly.
Note: If you are adding a machine to an existing cluster, please approach this guide with caution and replace the configurations recommended here with the ones you already have on your cluster, especially the Cassandra configuration.
Without further ado, let’s start!
Essentials
Start your machine and install the following:
- ntp (Packages are normally ntp, ntpdate and ntp-doc)
- wget (Unless you have your packages copied over via other means)
- vim (Or your favorite text editor)
Retrieve the following packages
- Java 7
- Apache Cassandra (Latest Stable)
- Java JNA
Installation
Set up NTP
This depends somewhat on your system, but the following commands should do it (you can also check this guide):
~$ chkconfig ntpd on
~$ ntpdate pool.ntp.org
~$ service ntpd start
Set up Java (Let’s assume we are doing this in /opt)
Extract Java and install it:
~$ tar xzf [java_file].tar.gz
~$ update-alternatives --install /usr/bin/java java /opt/java/bin/java 1
Check that it is installed:
~$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
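If you script this check, something like the following could work. It is only a sketch: java_version_of and is_java7 are helper names I made up, and the parsing assumes the Oracle-style "java version ..." first line shown above.

```shell
# Hypothetical helpers (not part of any Java tooling).

# Reads `java -version` text on stdin and prints the version string, e.g. 1.7.0_75
java_version_of() {
    sed -n 's/^java version [^0-9]*\([0-9][0-9._]*\).*/\1/p' | head -n 1
}

# Succeeds when the given version string belongs to Java 7
is_java7() {
    case "$1" in
        1.7.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on a real machine (java -version writes to stderr, hence 2>&1):
#   is_java7 "$(java -version 2>&1 | java_version_of)" && echo "Java 7 found"
```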
Let’s put JNA into place
~$ mv jna-VERSION.jar /opt/java/lib
Set up Cassandra (Let’s assume we are doing this in /opt)
Extract Cassandra:
~$ tar xzf [cassandra_file].tar.gz
Create Cassandra Directories:
~$ mkdir /var/lib/cassandra
~$ mkdir /var/lib/cassandra/commitlog
~$ mkdir /var/lib/cassandra/data
~$ mkdir /var/lib/cassandra/saved_caches
~$ mkdir /var/log/cassandra
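The same layout can be created in one go with mkdir -p, which also makes the step safe to repeat. This is a minimal sketch; create_cassandra_dirs and its optional prefix argument are my own invention, added only so you can try it in a scratch directory first.

```shell
# Hypothetical helper: creates the Cassandra directories used in this guide.
# $1 is an optional prefix (e.g. a scratch directory); leave it empty to
# create the real paths, which requires root.
create_cassandra_dirs() {
    prefix="${1:-}"
    for dir in \
        /var/lib/cassandra/commitlog \
        /var/lib/cassandra/data \
        /var/lib/cassandra/saved_caches \
        /var/log/cassandra
    do
        # -p creates missing parents and does not fail if the directory exists
        mkdir -p "${prefix}${dir}" || return 1
    done
}

# Usage: create_cassandra_dirs            (real paths, as root)
#        create_cassandra_dirs /tmp/demo  (dry run under /tmp/demo)
```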
Configuration
Linux configuration
~$ vim /etc/security/limits.conf
Add the following:
root soft memlock unlimited
root hard memlock unlimited
root - nofile 100000
root - nproc 32768
root - as unlimited
On CentOS, RHEL and OEL, also set the following in /etc/security/limits.d/90-nproc.conf:
* - nproc 32768
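After logging in again, you may want to confirm that the new limits are actually in effect for the user that runs Cassandra. A rough sketch follows; limit_at_least is a hypothetical helper, and the ulimit flags in the usage comment are the bash ones.

```shell
# Hypothetical helper: succeeds when the actual limit meets the wanted one.
# Both values are either a non-negative integer or the word "unlimited".
limit_at_least() {
    actual="$1"
    wanted="$2"
    if [ "$actual" = "unlimited" ]; then
        return 0
    fi
    if [ "$wanted" = "unlimited" ]; then
        return 1
    fi
    [ "$actual" -ge "$wanted" ]
}

# Usage in a fresh bash session of the Cassandra user:
#   limit_at_least "$(ulimit -n)" 100000    || echo "nofile too low"
#   limit_at_least "$(ulimit -u)" 32768     || echo "nproc too low"
#   limit_at_least "$(ulimit -l)" unlimited || echo "memlock is limited"
```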
Add the following to the sysctl file:
~$ vim /etc/sysctl.conf
vm.max_map_count = 131072
Finally (Reboot also works):
~$ sysctl -p
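If you would rather script the sysctl change than edit the file by hand, the idea is to replace any existing line for the key instead of appending a duplicate. A sketch under that assumption; set_sysctl_key is a made-up helper name, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: set "key = value" in a sysctl-style file, replacing any
# existing (uncommented) line for the same key. Usage: set_sysctl_key FILE KEY VALUE
set_sysctl_key() {
    file="$1"
    key="$2"
    value="$3"
    tmp="${file}.tmp.$$"
    touch "$file" || return 1
    # Keep every line whose name (text before "=", whitespace removed) differs from the key
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$file" > "$tmp" || return 1
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its owner and permissions are kept
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root), followed by `sysctl -p` to load the file:
#   set_sysctl_key /etc/sysctl.conf vm.max_map_count 131072
```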
Firewall: the following ports must be open:
# Internode Ports
7000 Cassandra inter-node cluster communication.
7001 Cassandra SSL inter-node cluster communication.
7199 Cassandra JMX monitoring port.
# Client Ports
9042 Cassandra client port (Native).
9160 Cassandra client port (Thrift).
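How you open these ports depends on your firewall. As an illustration only, assuming plain iptables with a default INPUT chain (not firewalld or ufw), the sketch below prints one ACCEPT rule per port so you can review the rules before running them. print_cassandra_iptables_rules is a hypothetical name, and in production you would likely also restrict the internode ports by source address.

```shell
# Hypothetical helper: prints (does not apply) one iptables rule per Cassandra port
print_cassandra_iptables_rules() {
    # 7000/7001/7199 are the internode ports, 9042/9160 the client ports
    for port in 7000 7001 7199 9042 9160; do
        printf 'iptables -A INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
    done
}

print_cassandra_iptables_rules
```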
Note: Most guides tell you to disable swap. I think of swap as an acrobat’s safety net: it should never have to be used, but it is there if needed. As such, I never disable it and instead set a low swappiness (around 10). You can read more about it here and here.
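To see where a machine currently stands, you can read the value straight from /proc. A small sketch follows; read_swappiness and swappiness_is_low are hypothetical helpers, and the threshold of 10 is just the value I suggested above.

```shell
# Hypothetical helpers for checking swappiness.

# Prints the swappiness value from the given file (defaults to the live kernel setting)
read_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Succeeds when swappiness is at or below the threshold (second argument, default 10)
swappiness_is_low() {
    value="$(read_swappiness "$1")" || return 2
    [ "$value" -le "${2:-10}" ]
}

# To change it on a running system (as root):  sysctl -w vm.swappiness=10
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 10
```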
Cassandra configuration
Note: Cassandra has a LOT of settings; these are the ones you should always set if you are going live. Lots of them depend on hardware and/or workload. Maybe I’ll write a post about them in the near future. In the meantime, you can read about them here.
~$ vim /opt/cassandra/conf/cassandra.yaml
Edit the following fields:
cluster_name: <Whatever you would like to call it>
data_file_directories: /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
# Assuming this is your first node, this should be reachable by other nodes
seeds: "<IP>"
# This is where you listen for inter-node communication
listen_address: <IP>
# This is where you listen to incoming client connections
rpc_address: <IP>
endpoint_snitch: GossipingPropertyFileSnitch
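After editing, it helps to list just the active (uncommented) values of these fields and eyeball them. Here is a minimal sketch using grep; show_cassandra_settings is a made-up helper. Note that in the stock cassandra.yaml the seeds entry is nested under seed_provider and data_file_directories is followed by a list, so for the latter only the key line is printed.

```shell
# Hypothetical helper: print the uncommented lines for the fields edited above.
# Usage: show_cassandra_settings /opt/cassandra/conf/cassandra.yaml
show_cassandra_settings() {
    # Allow leading spaces and list dashes (as in "- seeds:"); lines starting with # never match
    grep -E '^[[:space:]-]*(cluster_name|data_file_directories|commitlog_directory|saved_caches_directory|seeds|listen_address|rpc_address|endpoint_snitch):' "$1"
}
```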
Edit the snitch property file:
~$ vim /opt/cassandra/conf/cassandra-rackdc.properties
Add the DC and the RACK the server is in. Ex:
dc=DC1
rack=RAC1
Finally make sure your logs go to /var/log/cassandra:
~$ vim /opt/cassandra/conf/logback.xml
Testing
Start Cassandra
~$ service cassandra start
You should see no errors here. Wait a bit, then check that JNA was loaded:
~$ grep JNA /var/log/cassandra/system.log
INFO HH:MM:SS JNA mlockall successful
Then check the status of the ring:
~$ nodetool status
Datacenter: DC1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Owns    Host ID                               Token  Rack
UN  185.10.49.136  140.59 KB  100.0%  5c3c697f-8bfd-4fb2-a081-7af1358b313f  0      RAC
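The first column is what matters here: UN means Up and Normal. If you want to check this from a script, counting the UN rows is a simple approach. This is only a sketch, and count_up_normal is a hypothetical helper that reads the nodetool output on stdin.

```shell
# Hypothetical helper: counts the nodes reported as UN (Up/Normal)
# in `nodetool status` output read from stdin.
count_up_normal() {
    # n + 0 makes awk print 0 instead of an empty string when nothing matched
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage: nodetool status | count_up_normal
```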
Create a keyspace and a table, then insert some data:
~$ cqlsh xxx.yy.zz.ww
cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
(Should give no errors)
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;

 id                                   | data
--------------------------------------+---------
 c37d661d-7e61-49ea-96a5-68c34e83db3a | testing

(1 rows)
And we are done. You can start using your Cassandra node!