How to build your very own Cassandra 4.0 release

Posted in: Cassandra, Technical Track

Over the last few months, I have been seeing references to Cassandra 4.0 and some of its new features. When that happens with a technology I am interested in, I go looking for the preview releases to download and test. Unfortunately, so far, there are no such releases. But, I am still interested, so I’ve found it necessary to build my own Cassandra 4.0 release. This is in my humble opinion not the most desirable way to do things since there is no Cassandra 4.0 branch yet. Instead, the 4.0 code is on the trunk. So if you do two builds a commit or two apart, and there are typically at least three or four commits a week right now, you get a slightly different build. It is, in essence, a moving target.

All that said and done, I decided if I could do it, then the least I could do is write about how to do it and let everyone who wants to try it learn how to avoid a couple of dumb things I did when I first tried it.

Building your very own Cassandra 4.0 release is actually pretty easy. It consists of five steps:

  1. Make sure you have your prerequisites
    1. Java SDK 1.8 or Java 1.11 Open Source or Oracle
    2. Ant 1.8
    3. Git CLI client
    4. Python >=2.7<3.0
  2. Download the GIT repository
    1. git clone https://gitbox.apache.org/repos/asf/cassandra.git
  3. Build your new Cassandra release
    1. Cd cassandra
    2. Ant
  4. Run Cassandra
    1. Cd ./bin
    2. ./cassandra
  5. Have fun
    1. ./nodetool status
    2. ./cqlsh

I will discuss each step in a little bit more detail:

Step 1) Verify, and if necessary, install your prerequisites

For Java, you can confirm the JDK presence by typing in:


[email protected]:~$javac -version
javac 1.8.0_191

For ant:


[email protected]:~$ ant -version
Apache Ant(TM) version 1.9.6 compiled on July 20 2018

For git:


[email protected]:~$ git --version
git version 2.7.4

For Python:


[email protected]:~$ python --version
Python 2.7.12

If you have all of the right versions, you are ready for the next step. If not, you will need to install the required software which I am not going to go into here.

Step 2) Clone the repository

Verify you do not already have an older copy of the repository:


[email protected]:~$ ls -l cassandra
ls: cannot access 'cassandra': No such file or directory

If you found a Cassandra directory, you will want to delete or move it or your current directory elsewhere. Otherwise:


[email protected]:~$ git clone https://git-wip-us.apache.org/repos/asf/cassandra.git
Cloning into 'cassandra'...
remote: Counting objects: 316165, done.
remote: Compressing objects: 100% (51450/51450), done.
remote: Total 316165 (delta 192838), reused 311524 (delta 189005)
Receiving objects: 100% (316165/316165), 157.78 MiB | 2.72 MiB/s, done.
Resolving deltas: 100% (192838/192838), done.
Checking connectivity... done.
Checking out files: 100% (3576/3576), done.

[email protected]:~$ du -sh *
294M cassandra

At this point, you have used up 294 MB on your host and you have an honest-for-real git repo clone on your host – in my case, a Lenovo laptop running Windows 10 Linux subsystem.

And your repository looks something like this:


  [email protected]:~$ ls -l cassandra
  total 668
  drwxrwxrwx 1 john john    512 Feb  6 15:54 bin
  -rw-rw-rw- 1 john john    260 Feb  6 15:54 build.properties.default
  -rw-rw-rw- 1 john john 101433 Feb  6 15:54 build.xml
  -rw-rw-rw- 1 john john   4832 Feb  6 15:54 CASSANDRA-14092.txt
  -rw-rw-rw- 1 john john 390460 Feb  6 15:54 CHANGES.txt
  drwxrwxrwx 1 john john    512 Feb  6 15:54 conf
  -rw-rw-rw- 1 john john   1169 Feb  6 15:54 CONTRIBUTING.md
  drwxrwxrwx 1 john john    512 Feb  6 15:54 debian
  drwxrwxrwx 1 john john    512 Feb  6 15:54 doc
  -rw-rw-rw- 1 john john   5895 Feb  6 15:54 eclipse_compiler.properties
  drwxrwxrwx 1 john john    512 Feb  6 15:54 examples
  drwxrwxrwx 1 john john    512 Feb  6 15:54 ide
  drwxrwxrwx 1 john john    512 Feb  6 15:54 lib
  -rw-rw-rw- 1 john john  11609 Feb  6 15:54 LICENSE.txt
  -rw-rw-rw- 1 john john 123614 Feb  6 15:54 NEWS.txt
  -rw-rw-rw- 1 john john   2600 Feb  6 15:54 NOTICE.txt
  drwxrwxrwx 1 john john    512 Feb  6 15:54 pylib
  -rw-rw-rw- 1 john john   3723 Feb  6 15:54 README.asc
  drwxrwxrwx 1 john john    512 Feb  6 15:54 redhat
  drwxrwxrwx 1 john john    512 Feb  6 15:54 src
  drwxrwxrwx 1 john john    512 Feb  6 15:54 test
  -rw-rw-rw- 1 john john  17215 Feb  6 15:54 TESTING.md
  drwxrwxrwx 1 john john    512 Feb  6 15:54 tools

Step 3) Build your new Cassandra 4.0 release

Remember what I said in the beginning? There is no branch for Cassandra 4.0 at this point, so building from the trunk is quite simple:


[email protected]:~$ cd cassandra
[email protected]:~/cassandra$ ant
Buildfile: /home/john/cassandra/build.xml

BUILD SUCCESSFUL
Total time: 1 minute 4 seconds

That went quickly enough. Let’s take a look and see how much larger the directory has gotten:


[email protected]:~$ du -sh *
375M cassandra

Our directory grew by 81MB pretty much all in the new build directory which now has 145 new files including ./build/apache-cassandra-4.0-SNAPSHOT.jar. I am liking that version 4.0 right in the middle of the filename.

Step 4) Start Cassandra up. This one is easy if you do the sensible thing


[email protected]:~/cassandra$ cd ..
[email protected]:~$ cd cassandra/bin
[email protected]:~/cassandra/bin$ ./cassandra
[email protected]:~/cassandra/bin$ CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I

INFO [MigrationStage:1] 2019-02-06 21:26:26,222 ColumnFamilyStore.java:407 - Initializing system_auth.role_members
INFO [MigrationStage:1] 2019-02-06 21:26:26,234 ColumnFamilyStore.java:407 - Initializing system_auth.role_permissions
INFO [MigrationStage:1] 2019-02-06 21:26:26,244 ColumnFamilyStore.java:407 - Initializing system_auth.roles

We seem to be up and running. Its time to try some things out:

Step 5) Have fun

We will start out making sure we are up and running by using nodetool to connect and display a cluster status. Then we will go into the CQL shell to see something new. It is important to note that since you are likely to have nodetool and cqlsh already installed on your host, you need to use the ./ in front of your commands to ensure you are using the 4.0 version. I have learned the hard way that forgetting the ./ can result in some very real confusion.


  [email protected]:~/cassandra/bin$ ./nodetool status
  Datacenter: datacenter1
  =======================
  Status=Up/Down
  |/ State=Normal/Leaving/Joining/Moving
  --  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
  UN  127.0.0.1  115.11 KiB  256          100.0%            f875525b-3b78-49b4-a9e1-2ab0cf46b881  rack1
            
  [email protected]:~/cassandra/bin$ ./cqlsh
  Connected to Test Cluster at 127.0.0.1:9042.
  [cqlsh 5.0.1 | Cassandra 4.0-SNAPSHOT | CQL spec 3.4.5 | Native protocol v4]
  Use HELP for help.
  cqlsh> desc keyspaces;

  system_traces  system_auth  system_distributed     system_views
  system_schema  system       system_virtual_schema

  cqlsh>

We got a nice cluster with one node and we see the usual built-in key spaces. Well um… not exactly. We see two new key spaces system_virtual_schema and system_views. Those look very interesting.

In my next blog, I’ll be talking more about Cassandra’s new virtual table facility and how very useful it is going to be someday soon. I hope.

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Pricipal Consultant
John has 40 of years experience working with data. Data in files and in Databases from flat files through ISAM to relational databases and, most recently, NoSQL. For the last 15 year he's worked on a variety of Open source technologies including MySQL, PostgreSQL, Cassandra, Riak, Hadoop, and Hbase. As a Chief Database Architect at AOL he brought MySQL in to replace Sybase and has worked hands on with MySQL databases holding hundreds of billions of rows and running millions of transactions per second. For the last three years he has been working for Pythian to help their customers improve their existing databases and select new ones for new applications.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *