NoSQL: Apache Cassandra 101

Cassandra is a complex, highly scalable NoSQL database system that was initially developed by Facebook and is now maintained by the Apache Software Foundation.  Apache Cassandra has risen as a welcome alternative to HBase, another highly scalable and highly available key-value store.

Cassandra is different from most relational databases in its data model.  It is a wide-column store: each row is a collection of named columns, and rows do not all have to share the same set of columns.  This makes dealing with the database somewhat different as well.  Instead of creating tables, you create column families.  And I shouldn’t have to mention this, but Cassandra is a NoSQL solution, so don’t expect any “joins”!

Cassandra is able to scale read/write throughput linearly with machine count thanks to its architecture.  Each Cassandra node does exactly the same thing; there is no master node.  A cluster of Cassandra nodes is called a ring.  To make all this scaling happen seamlessly, Cassandra nodes employ a gossip protocol, which lets the nodes talk to each other, learn the state of the ring, and figure out where to send reads and writes.
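As a rough illustration of how gossip spreads knowledge around a ring, here is a toy simulation in Python.  It is not Cassandra's actual implementation: the function names, the two-way exchange, and the idea that each node tracks only a set of member names are all simplifying assumptions made for this sketch.

```python
import random


def gossip_round(views, rng):
    """Each node swaps its membership view with one randomly chosen peer."""
    nodes = list(views)
    if len(nodes) < 2:
        return  # nothing to gossip with
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = views[node] | views[peer]
        # Both sides end up with the union of what they knew.
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_until_converged(node_names, seed=0, max_rounds=100):
    """Return how many rounds pass before every node knows every other node.

    Returns None if the ring has not converged after max_rounds.
    """
    rng = random.Random(seed)
    # Hypothetical starting point: every node knows only about itself.
    views = {name: {name} for name in node_names}
    everyone = set(node_names)
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_no
    return None


if __name__ == "__main__":
    names = ["node-%d" % i for i in range(8)]  # made-up node names
    print("converged after", rounds_until_converged(names), "rounds")
```

The point of the sketch is that no node needs a central directory: a handful of random pairwise exchanges is usually enough for information to reach the whole ring.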

Cassandra, like many highly scalable database systems, uses data partitioning to achieve some of its speed.  Cassandra’s default behavior is to partition your data randomly across your ring by hashing each row key.  Even with your data partitioned, you can still add and remove nodes from your Cassandra ring without compromising availability!
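To make the partitioning idea concrete, here is a minimal Python sketch of a token ring.  It assumes keys are placed by an MD5 hash reduced to a 2**127 token space, and that each node owns the keys up to and including its own token, wrapping around at the end of the ring.  The `Ring` class, its methods, and the node names are hypothetical; the three tokens are the ones shown in the sample nodetool output in this article.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key):
    """Hash a row key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self):
        self._tokens = []  # node tokens, kept sorted
        self._owners = {}  # token -> node name

    def add_node(self, name, token):
        bisect.insort(self._tokens, token)
        self._owners[token] = name

    def remove_node(self, name):
        for token, owner in list(self._owners.items()):
            if owner == name:
                self._tokens.remove(token)
                del self._owners[token]

    def owner_of(self, key):
        """First node whose token is >= the key's token, wrapping around."""
        if not self._tokens:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._owners[self._tokens[idx % len(self._tokens)]]


if __name__ == "__main__":
    ring = Ring()
    ring.add_node("node-a", 75603446264197340449435394672681112420)
    ring.add_node("node-b", 137462771597874153173150284137310597304)
    ring.add_node("node-c", 63538518574533451921556363897953848387)
    for key in ("alice", "bob", "carol"):
        print(key, "->", ring.owner_of(key))
```

In this model, adding a node only takes over the slice of keys between the previous token and the new one; every other key stays where it was, which is why the ring can grow or shrink without reshuffling everything.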

So, with a multitude of nodes in your Cassandra ring, how do you monitor this massive system?  Do you even need to monitor it?  Can it be monitored?  Yes, yes, and yes.

Cassandra comes with a very nifty command line tool called nodetool. Nodetool can help you monitor your ring extremely well by plugging into JMX, the Java Management Extensions.  This enables you to gather very detailed information about your ring.  You can get details about individual column families or about the entire ring itself!

One very cool thing about Cassandra is that you only have to ask one node for information about the entire ring.  For example, you can try:

bin/nodetool -host ring
Address       Status  Load        Range                                    Ring
              Up      459.27 MB   75603446264197340449435394672681112420   |<--|
              Up      382.53 MB   137462771597874153173150284137310597304  |   |
              Up      511.34 MB   63538518574533451921556363897953848387   |-->|

And suddenly you have the status of all of the nodes in your ring!  That is a good first step to understanding the monitoring and performance of your Cassandra rings.  In our next article, we will look at how a good monitoring service like Monitis can leverage the handy “nodetool” to help you keep an eye on your rings.
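If you want to feed that output into your own scripts, a small parser is enough to get started.  The Python sketch below assumes the ring output has the columns shown above (address, status, load, token, ring drawing); the exact layout can differ between Cassandra versions, so treat the regular expression as a starting point.  The function names and the addresses in `SAMPLE` are placeholders invented for this example.

```python
import re

# One data row: address, Up/Down, load value and unit, then the token.
ROW = re.compile(
    r"^(?P<address>\S+)\s+(?P<status>Up|Down)\s+"
    r"(?P<load>\d+(?:\.\d+)?)\s+(?P<unit>\S+)\s+(?P<token>\d+)"
)

# Placeholder addresses (documentation range); the other values come from the article.
SAMPLE = """
Address       Status  Load        Range                                    Ring
192.0.2.1     Up      459.27 MB   75603446264197340449435394672681112420   |<--|
192.0.2.2     Up      382.53 MB   137462771597874153173150284137310597304  |   |
192.0.2.3     Up      511.34 MB   63538518574533451921556363897953848387   |-->|
"""


def parse_ring(text):
    """Turn ring output into a list of dicts, skipping header and blank lines."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, blank line, or an unrecognized layout
        nodes.append({
            "address": match.group("address"),
            "status": match.group("status"),
            "load": float(match.group("load")),
            "unit": match.group("unit"),
            "token": int(match.group("token")),
        })
    return nodes


def down_nodes(nodes):
    """Addresses of nodes that are not reported as Up."""
    return [n["address"] for n in nodes if n["status"] != "Up"]


if __name__ == "__main__":
    for node in parse_ring(SAMPLE):
        print(node["address"], node["status"], node["load"], node["unit"])
```

From here it is a short step to alerting whenever `down_nodes` returns a non-empty list.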

See also:  Picking the Right NoSQL Database Tool and  NoSQL Databases – A Look at Apache Cassandra
