Introduction to Cassandra
Cassandra is often mentioned in books and on several NoSQL blogs, but this very interesting NoSQL database still does not get enough attention. Cassandra is extremely scalable and can deliver continuous availability. It is very good at managing large amounts of data in a cluster that spans multiple data centers and the cloud(s). Did I mention scalability? Cassandra scales nearly linearly! OK, you say, but what about maintenance? Apache Cassandra promises operational simplicity, zero configuration, and a self-balancing architecture. It also promises some degree of hardware agnosticism: it can run on commodity servers and even on consumer hardware. And of course, there is no single point of failure by design (but you can get one if you treat it wrong ;) ). So why does this incredible database still get so little attention, even though its access times outperform MongoDB (especially for writes)? I don’t know exactly, but I think it has to do with a relatively steep learning curve compared to some competitors in the field.
In this article, I try to show that you can get familiar with Cassandra pretty quickly.
Installation of Cassandra
Step 1: add the official Cassandra package sources:
# create cassandra.sources.list
vi /etc/apt/sources.list.d/cassandra.sources.list
# insert the following lines
deb http://www.apache.org/dist/cassandra/debian 20x main
deb-src http://www.apache.org/dist/cassandra/debian 20x main
Step 2: update package information and install cassandra
apt-get update
apt-get install cassandra
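If you prefer not to edit the sources file by hand, the two lines from Step 1 can also be generated by a small shell function. This is only a sketch: the function name cassandra_sources is made up for illustration, and the series name 20x is simply the one used above.

```shell
#!/bin/sh
# Print the two APT source lines for a given Cassandra release series.
# The series name (for example "20x") is the only argument.
# Nothing is written to disk here; the caller decides where the output goes.
cassandra_sources() {
    series="$1"
    base="http://www.apache.org/dist/cassandra/debian"
    printf 'deb %s %s main\n' "$base" "$series"
    printf 'deb-src %s %s main\n' "$base" "$series"
}

# Usage (as root), equivalent to creating the file in an editor:
#   cassandra_sources 20x > /etc/apt/sources.list.d/cassandra.sources.list
```

Keeping the function free of side effects makes it easy to check the output before redirecting it into /etc/apt/sources.list.d/.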
You may be warned that ‘cassandra’ is an untrusted package. When you hit “Y”, Cassandra will be installed and started for the first time, showing you its initial start parameters. Pretty easy so far, isn’t it? I like this zero-conf approach; that is the way scalable software should be. If you look at the default configuration, you will find that it follows Linux standards. Cassandra’s start script even tries to guess good JVM parameters for your hardware, a very nice feature. This self-configuration is very useful in large clustered environments.
Cassandra files will be installed in the following directories:
/etc/init.d (service startup script)
/etc/security/limits.d (cassandra user limits)
/etc/default (additional startup config)
First Installation Tests
The installation has created an init.d script in /etc/init.d/. You can use this script to control the Cassandra database.
Let’s start Cassandra
sudo service cassandra start
On some localized environments you could get an error like:
expr: Syntaxfehler
expr: Syntaxfehler
/etc/init.d/cassandra: 59: [: Illegal number:
/etc/init.d/cassandra: 63: [: Illegal number:
/etc/init.d/cassandra: 67: [: Illegal number:
expr: Syntaxfehler
/etc/init.d/cassandra: 81: [: Illegal number:
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XmsM -XmxM -XmnM -XX:+HeapDumpOnOutOfMemoryError -Xss256k
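In that case, you should uncomment the heap settings (MAX_HEAP_SIZE among them) in /etc/cassandra/cassandra-env.sh, so that the empty -Xms, -Xmx and -Xmn values shown above get real numbers. Do not set MAX_HEAP_SIZE to more than half of your machine’s RAM. A larger heap is not useful, because Cassandra also uses off-heap storage.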
Other problems have been reported when running Cassandra with OpenJDK, so if you like, install the Oracle JDK and use that instead.
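The half-of-RAM limit for MAX_HEAP_SIZE is simple arithmetic, and a rough sketch of it in shell could look like this. The function names suggest_max_heap and total_ram_mb are hypothetical helpers, not part of Cassandra, and reading /proc/meminfo assumes a Linux system.

```shell
#!/bin/sh
# Suggest an upper bound for MAX_HEAP_SIZE: half of the given RAM size.
# Argument: total RAM in megabytes. Output: a value such as "4096M".
suggest_max_heap() {
    total_mb="$1"
    half_mb=$((total_mb / 2))
    printf '%sM\n' "$half_mb"
}

# Read the total RAM in megabytes from /proc/meminfo (Linux only).
# The MemTotal value in that file is given in kB.
total_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Usage:
#   suggest_max_heap "$(total_ram_mb)"
```

The printed value is only an upper bound to paste into cassandra-env.sh; a smaller heap may suit your machine better.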
To stop Cassandra use:
pkill -f CassandraDaemon # works
Don’t trust the “stop” and “status” arguments too much. “stop” does not work on my installation, and “status” tells me “Cassandra is not running” even though it is.
sudo service cassandra status
# prints the following:
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1463M -Xmx1463M -Xmn365M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
 * Cassandra is not running
So don’t trust it, just call cqlsh:
cqlsh
# when Cassandra is not started:
Connection error: Could not connect to localhost:9160
otherwise it should be something like this:
cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
Use HELP for help.
cqlsh> show host # type 'show host' and hit enter
Connected to Test Cluster at localhost:9160.
cqlsh>
Another way to test your node is to use nodetool:
nodetool -h 127.0.0.1 info
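Because the node needs a moment to come up after a start, it can help to repeat such a check until it succeeds. The following is a minimal sketch of a retry helper; the name wait_until is made up for this article, and the usage line assumes that nodetool exits with a non-zero status while the node is unreachable.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# Arguments: maximum number of attempts, delay in seconds, then the command.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause only if another attempt follows.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage: try for up to 30 x 2 seconds (assumes nodetool fails while the node is down).
#   wait_until 30 2 nodetool -h 127.0.0.1 info && echo "node is up"
```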
Furthermore, it can be useful to watch the Cassandra logs while performing the first steps.
tail -f /var/log/cassandra/system.log # good for an additional bash session
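If the full log is too noisy, you can filter it down to warnings and errors. This sketch assumes that the log level appears as a separate word (WARN or ERROR) on each line, which is my assumption about the log format and not something the installation guarantees; log_problems is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the lines of a log file that carry the level WARN or ERROR.
# Assumption: the level is a separate, whitespace-delimited word on the line.
log_problems() {
    grep -E '(^|[[:space:]])(WARN|ERROR)([[:space:]]|$)' "$1"
}

# Usage:
#   log_problems /var/log/cassandra/system.log
```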
Going Deeper into configuration?
For most development setups there is no need to. However, if you do need to change the configuration, look into these files first:
- /etc/cassandra/cassandra-env.sh - holds JVM environment information.
- /etc/cassandra/cassandra.yaml - main configuration file
- /etc/cassandra/log4j-server.properties - Log4J properties of the node.
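These files consist largely of comments, so a quick way to see what is actually set is to hide comment and blank lines. A small sketch, with active_settings as a made-up helper name:

```shell
#!/bin/sh
# Print the lines of a config file that are neither blank nor comments.
# Lines whose first non-space character is '#' count as comments.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage:
#   active_settings /etc/cassandra/cassandra.yaml
```

The same function works for cassandra-env.sh and log4j-server.properties, since both use # for comments as well.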
Thank you for reading. Share your experience with Cassandra.