Filed under databases, Linux, Software, Software Development.

Introduction to Cassandra

Even though Cassandra is often mentioned in books and on several NoSQL blogs, this very interesting NoSQL database still does not get enough attention. Cassandra is extremely scalable and can deliver continuous availability. It is very good at managing large amounts of data in a cluster that spans multiple data centers and the cloud. Best of all, Cassandra scales linearly. OK, you say, but what about maintenance? Apache Cassandra promises operational simplicity, zero configuration and a self-balancing architecture. It also promises some degree of hardware agnosticism: it can run on commodity servers and even on consumer hardware. As I mentioned before, Cassandra scales very well and has no single point of failure by design (although you can still get one if you treat it wrong ;) ). So why is there still so little attention paid to this database, even though its access times outperform MongoDB (especially for writes)?
I don't know exactly, but I think it has to do with a relatively steep learning curve compared to some competitors in the field.
However, this article shows that you can get familiar with Cassandra pretty fast, starting with a single local node.

Installation of Cassandra

I tried it on a current Xubuntu (13.04.2) with Java version "1.7.0_25" OpenJDK 64-Bit. This tutorial explains how to install Apache Cassandra as a Debian package.

Step 1: add the official Cassandra package sources:

# become root
sudo bash
# and create a new source list
cat > /etc/apt/sources.list.d/cassandra.sources.list << DELIM
deb http://www.apache.org/dist/cassandra/debian 20x main
deb-src http://www.apache.org/dist/cassandra/debian 20x main
DELIM

Step 2: update the package information and install Cassandra:

apt-get update
apt-get install cassandra

You may be warned that 'cassandra' is an untrusted package. The warning appears because the repository's signing key has not been imported into apt. When you hit "Y", Cassandra will be installed and started for the first time, showing you its initial start parameters. Pretty easy so far, isn't it? I like this zero-conf approach; that is the way scalable software should be. If you look at the default configuration, you will find that it follows Linux standards.
Cassandra's start script even tries to guess good JVM parameters based on your hardware, which is a very nice feature. This self-configuration is very useful in large clustered environments.

Cassandra files will be installed in the following directories:

  • /var/lib/cassandra (data)
  • /var/log/cassandra (logs)
  • /var/run/cassandra (runtime files)
  • /usr/share/cassandra (environment settings)
  • /usr/share/cassandra/lib (JAR files)
  • /usr/bin (binaries)
  • /usr/sbin (binaries)
  • /etc/cassandra (configuration files)
  • /etc/init.d (service startup script)
  • /etc/security/limits.d (cassandra user limits)
  • /etc/default (additional startup config)
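
If you want to verify the layout after the install, a small check like the following might help. It is only a sketch: report_paths is a helper name I made up, not something the package ships, and the paths passed to it are taken from the list above.

```shell
#!/bin/sh
# report_paths is a hypothetical helper, not part of the Cassandra package.
# For every path given, print "ok <path>" if it exists, else "missing <path>".
report_paths() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok $p"
        else
            echo "missing $p"
        fi
    done
}

# Check a few of the directories from the list above.
report_paths /var/lib/cassandra /var/log/cassandra \
             /usr/share/cassandra/lib /etc/cassandra
```

Any line starting with "missing" suggests that the package was not installed completely.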

First Installation Tests

This installation has created an init.d script at /etc/init.d/. You can use this script to control the Cassandra database.

Test Start

Let’s start Cassandra

$sudo service cassandra start

In some localized environments you may get errors like these (here from a German locale, where "Syntaxfehler" means "syntax error"):

expr: Syntaxfehler
expr: Syntaxfehler
/etc/init.d/cassandra: 59: [: Illegal number:
/etc/init.d/cassandra: 63: [: Illegal number:
/etc/init.d/cassandra: 67: [: Illegal number:
expr: Syntaxfehler
/etc/init.d/cassandra: 81: [: Illegal number:
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XmsM -XmxM -XmnM -XX:+HeapDumpOnOutOfMemoryError -Xss256k

In that case you should uncomment the following lines:

#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"

in /etc/cassandra/cassandra-env.sh. Set MAX_HEAP_SIZE to no more than half of your machine's RAM. A larger heap is not useful, because Cassandra also uses off-heap storage.
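
The "half of the RAM" rule and the uncommenting step can be sketched in shell roughly like this. The helper names half_ram_mb, total_ram_mb and enable_heap_settings are my own inventions, and reading /proc/meminfo assumes Linux. The sed call prints the modified file to standard output instead of editing it in place, so you can review the result first.

```shell
#!/bin/sh
# All function names below are hypothetical helpers, not part of Cassandra.

# Print half of the given amount of RAM (input and output in megabytes).
half_ram_mb() {
    echo $(( $1 / 2 ))
}

# Print the total RAM in megabytes, read from /proc/meminfo (Linux only).
total_ram_mb() {
    [ -r /proc/meminfo ] || return 1
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Print the file $1 with the two heap lines uncommented and set to
# MAX_HEAP_SIZE=$2 and HEAP_NEWSIZE=$3. The file itself is not changed.
enable_heap_settings() {
    sed -e "s/^#MAX_HEAP_SIZE=.*/MAX_HEAP_SIZE=\"$2\"/" \
        -e "s/^#HEAP_NEWSIZE=.*/HEAP_NEWSIZE=\"$3\"/" "$1"
}

if ram=$(total_ram_mb) && [ -n "$ram" ]; then
    echo "MAX_HEAP_SIZE should not exceed $(half_ram_mb "$ram")M on this machine"
fi
```

For example, `enable_heap_settings /etc/cassandra/cassandra-env.sh 2G 400M > /tmp/cassandra-env.sh` writes a candidate file that you can diff against the original before copying it over as root. The values 2G and 400M are only placeholders.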

Other problems were reported when using Cassandra with OpenJDK, so if you like, install the Oracle JDK and use it.

Stop Cassandra

To stop Cassandra, use:

$pkill -f CassandraDaemon # works

Don't trust the "stop" and "status" arguments too much. "stop" does not work on my installation, and "status" reports "Cassandra is not running" even though it is.

$sudo service cassandra status # prints the following:
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1463M -Xmx1463M -Xmn365M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
 * Cassandra is not running

So don't trust it, just call cqlsh:

$cqlsh # when Cassandra is not started:
Connection error: Could not connect to localhost:9160

Otherwise it should look something like this:

$cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
Use HELP for help.
cqlsh> show host # type 'show host' and hit enter
Connected to Test Cluster at localhost:9160.
cqlsh>
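
Since cqlsh connects to localhost:9160 here, you can also test that port directly when "status" is lying to you. The following is a minimal sketch that assumes bash, because it relies on bash's /dev/tcp redirection. port_status is a made-up helper name.

```shell
#!/bin/bash
# port_status is a hypothetical helper. It prints "up" when a TCP connection
# to host $1 on port $2 can be opened, and "down" otherwise.
port_status() {
    # The subshell opens the connection on file descriptor 3 and closes it on exit.
    if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
        echo up
    else
        echo down
    fi
}

echo "Port 9160 on localhost: $(port_status localhost 9160)"
```

Note that "up" only tells you that something is listening on the port, not that it is a healthy Cassandra node, so cqlsh remains the better test.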

Another way to test your node is to use nodetool:

$nodetool -h 127.0.0.1 info

Furthermore, it may be interesting to watch the Cassandra logs while performing your first steps.

$tail -f /var/log/cassandra/system.log # best run in an additional bash session
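
If you prefer a quick summary over watching the log scroll by, you could count the error lines instead. This sketch assumes that each log line starts with the (possibly space-padded) log level, such as " INFO" or "ERROR". count_log_errors is a hypothetical helper name.

```shell
#!/bin/sh
# count_log_errors is a hypothetical helper. It prints the number of lines in
# file $1 that start with the log level ERROR (leading spaces are allowed).
count_log_errors() {
    # grep -c prints the match count; "|| true" hides its exit status 1 on zero matches.
    grep -c '^ *ERROR' "$1" || true
}

log=/var/log/cassandra/system.log
if [ -r "$log" ]; then
    echo "ERROR lines in $log: $(count_log_errors "$log")"
fi
```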

Going Deeper into Configuration?

There is no need to for most development setups.
However, if you need to do some configuration, look into these files first:

  • /etc/cassandra/cassandra-env.sh – holds JVM environment information.
  • /etc/cassandra/cassandra.yaml – main configuration file
  • /etc/cassandra/log4j-server.properties – Log4J properties of the node.
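
To peek at single settings without opening an editor, something like the following may be enough. It is a rough sketch, not a YAML parser: yaml_value is a made-up helper that only understands simple top-level "key: value" lines. The keys cluster_name and rpc_port correspond to the "Test Cluster" name and port 9160 that cqlsh printed earlier.

```shell
#!/bin/sh
# yaml_value is a hypothetical helper. It prints the value of the top-level
# key $1 in file $2. Commented-out or indented lines are ignored.
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2"
}

conf=/etc/cassandra/cassandra.yaml
if [ -r "$conf" ]; then
    echo "cluster_name: $(yaml_value cluster_name "$conf")"
    echo "rpc_port:     $(yaml_value rpc_port "$conf")"
fi
```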

More Information

It might be interesting not to use the .deb package, but to download the Cassandra binaries and install them by hand. I found a very good post about it.

Additionally, have a look at the following:

Thank you for reading. Share your experience with Cassandra.