Perhaps you’ve caught our series of blog posts about NoSQL database storage tools. Monitis has begun providing guidance on picking the right tool to match your company’s IT computing needs. In our previous post on NoSQL, we offered a comprehensive overview of Apache Cassandra, one of the many NoSQL tools available (there are currently more than 100 popular solutions out there).
Today, we’ll take a look at Apache HBase – originally created for use with Apache’s Hadoop, a software framework that supports data-intensive distributed applications under a free license.
Our mission in these posts is simply to help you choose the best NoSQL database system, most of which are open-source and cost-free. After all, you want to make sure that your data is being stored safely. Aren’t there enough worries out there about data security, whether the data is being stored in the cloud or behind your internal, private firewall?
HBase is really a clone (or a very close relative) of Google’s Bigtable, and, as mentioned above, it was originally created for use with Hadoop. In fact, HBase started out as a subproject of the Apache Hadoop project and has since become a top-level Apache project in its own right.
HBase offers database capabilities for Hadoop, which means you can use it as a source or sink for MapReduce jobs. HBase is a column-oriented database, and it is built to serve low-latency requests on top of Hadoop HDFS. Unlike some other columnar databases that provide only eventual consistency, HBase is strongly consistent: once a write succeeds, every later read of that row sees it.
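To make "column-oriented" a little more concrete, here is a minimal sketch of the Bigtable-style layout in plain Python. It is a toy in-memory model, not the HBase client API: the class name ToyTable, its methods, and the sample row and column names are all made up for illustration. It assumes the usual Bigtable conventions of column families declared up front, cells addressed as "family:qualifier", and multiple timestamped versions per cell.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory stand-in for an HBase-style table (NOT the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Return row keys in sorted order within [start, stop)."""
        return [key for key in sorted(self.rows) if start <= key < stop]


# Hypothetical sample data.
table = ToyTable(["info"])
table.put("user2", "info:email", "carol@example.com", ts=1)
table.put("user1", "info:email", "alice@example.com", ts=1)
table.put("user1", "info:email", "alice@example.org", ts=2)

latest = table.get("user1", "info:email")  # newest version wins
keys = table.scan("user1", "user3")        # rows come back in key order
```

Rows can have completely different sets of qualifiers inside a family, and keeping rows sorted by key is what makes range scans cheap.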
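An HBase cluster uses several kinds of servers. For one, HDFS needs at least one namenode and several datanodes. Plus, HBase needs a ZooKeeper cluster, a master and several region servers. Clients use ZooKeeper to find the region server that holds their data and then read and write directly against that region server; the master(s) handle administrative work such as assigning regions to region servers.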
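At the HDFS level, existing data is not redistributed automatically (for example, when you add datanodes); only newly written data is spread across the cluster. At the HBase level, data is divided into regions, each covering a contiguous range of row keys, and those regions are distributed automatically across the region servers.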
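The region idea can be sketched in a few lines of Python. This is only an illustration of range-based sharding, not how the HBase client is actually implemented: the start keys, the server names, and the function locate_region are hypothetical.

```python
import bisect

# Hypothetical layout: each region begins at its start key and runs up to
# (but not including) the next start key. "" means "from the first row".
REGION_START_KEYS = ["", "g", "n", "t"]
# Hypothetical assignment of each region to a region server.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region(row_key):
    """Return (region index, region server) for the region owning row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the row.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]


apple = locate_region("apple")
grape = locate_region("grape")
zebra = locate_region("zebra")
```

Because every region covers a sorted key range, finding the right server is a binary search rather than a broadcast to the whole cluster. When a region grows too large it can be split into two key ranges, which in this sketch would simply mean adding one more start key.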
If you choose, HBase will allow you to use Google’s Protobuf (Protocol Buffers) API as an alternative to XML through its REST gateway. Protobuf is a very efficient way of serializing data. It encodes the same data two to three times smaller than XML, and it is 20 to 100 times faster to parse. Why? Because protocol buffers encode data on the wire in a compact binary format instead of as tagged text. So, what this means is that working with HBase can be very fast. Protobuf is used extensively within Google, which defines nearly 50,000 different message types across a wide variety of systems.
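To see where the size savings come from, here is a small sketch of the varint encoding that protocol buffers use for integers on the wire, written in plain Python without the protobuf library. The helper names encode_varint and encode_int_field are our own, and the XML string is just a made-up point of comparison.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F              # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number, value):
    """Encode one varint field: a tag (field number plus wire type 0), then the value."""
    tag = (field_number << 3) | 0    # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


binary = encode_int_field(1, 150)            # field 1 holding the value 150
xml = "<a>150</a>".encode("utf-8")           # hypothetical XML equivalent
sizes = (len(binary), len(xml))
```

Here the binary form takes 3 bytes against 10 for the XML snippet. A single tiny field exaggerates the gap, so treat it as an illustration of the mechanism rather than a benchmark.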
The database comes with a web console user interface to monitor and manage region servers and master servers.
Interestingly, Facebook recently passed over Cassandra and adopted HBase for its new Messaging platform, because the new messaging model requires more flexible replication, which HBase provides.
If you’re a sysadmin, we hope this information about Apache HBase comes in handy. And we hope you’ll take heed that, no matter how hardy your NoSQL database is, it needs monitoring! That’s why Monitis offers 24/7 independent remote monitoring of servers, all done via the cloud (so your firewalls don’t get in the way).
In fact, if you’re just dipping your big toe in the cloud, you can try our free server monitoring – the first such service in the industry!
Stay tuned for our next post on NoSQLs – this time a review of three other brands: