Perhaps you’ve caught our series of blog posts about NoSQL database storage tools? Monitis has begun providing guidance on picking the right tool to match your company’s IT computing needs. In our previous blog post on NoSQL, we offered a comprehensive overview of Apache Cassandra, one of the many NoSQL tools available (there are currently more than 100 popular solutions out there).
Today, we’ll take a look at Apache HBase – originally created for use with Apache’s Hadoop, a software framework that supports data-intensive distributed applications under a free license.
Our mission in these posts is simply to help you choose the best NoSQL database systems, most of which are open-source and cost-free. After all, you want to make sure that your data is being stored safely. Aren’t there enough worries out there about data security, whether the data is being stored in the cloud or behind your internal, private firewall?
The Lay of the Land: Apache HBase
So, here is what you get from Apache HBase!
HBase is really a clone (or a very close relative) of Google’s Bigtable, and, like I said, it was originally created for use with Hadoop. Actually, HBase is a subproject of the Apache Hadoop project.
HBase offers database capabilities for Hadoop, which means you can use it as a source or sink for MapReduce jobs. HBase is a column-oriented database, and it is built to serve low-latency requests on top of Hadoop HDFS. Unlike some other columnar databases that provide only eventual consistency, HBase is strongly consistent.
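An HBase cluster uses several kinds of servers. For one, HDFS needs at least one namenode and several datanodes. Plus, HBase needs a ZooKeeper cluster, a master and several region servers. Clients locate region servers through ZooKeeper and send reads and writes to them directly; administrative requests, such as creating a table, go to the master(s).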
On the HDFS level, existing data is not resharded automatically; only new data is sharded as it is written. On the HBase level, data is divided into regions that are sharded automatically across region servers.
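To make the idea of regions a bit more concrete, here is a minimal Python sketch of how a sorted row-key space could be split into ranges and mapped to region servers. It is a toy model, not HBase's actual lookup code: the class name `RegionMap`, the split keys and the server names `rs1` to `rs3` are all made up for illustration.

```python
import bisect


class RegionMap:
    """Toy model: contiguous row-key ranges ("regions") mapped to servers."""

    def __init__(self, split_keys, servers):
        # split_keys must already be sorted. N split keys define N + 1
        # regions: (-inf, k0), [k0, k1), ..., [kN-1, +inf).
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        # bisect_right counts the split keys that are <= row_key, which is
        # the index of the region whose range contains the key.
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]


# Hypothetical cluster: three regions split at "g" and "p".
regions = RegionMap(split_keys=["g", "p"], servers=["rs1", "rs2", "rs3"])
print(regions.locate("apple"))  # rs1
print(regions.locate("g"))      # rs2
print(regions.locate("zebra"))  # rs3
```

Because rows are kept sorted by key, a region is just a contiguous key range. When a region grows too large, it can be split in two by adding one more split key, and one of the halves can be handed to another server.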
- Orientation: Columnar
- Created: HBase was created at Powerset in 2007 and later donated to Apache.
- Implementation language: Java
- Distributed: Affirmative. You can run HBase in standalone, pseudo-distributed (all of the HBase daemons run on the same host), or fully distributed mode.
- Storage: HBase provides Bigtable-like capabilities on top of the Hadoop File System.
- Schema: HBase supports unstructured and partially structured data. To do so, data is organized into column families (a term we addressed in our last post about Apache Cassandra). You address an individual record, called a “cell” in HBase, with a combination of row key, column family, column qualifier, and timestamp. As opposed to an RDBMS (relational database management system), in which you must define your columns well in advance, with HBase you can simply name a column family and then allow the column qualifiers to be determined at runtime. This lets you be very flexible and supports an agile approach to development.
- Client: You can interact with HBase via Thrift, a RESTful service gateway, Protobuf (see “Additional Features” below), or an extensible JRuby shell.
- Open source: Affirmative (Apache License)
- Production use: HBase has been used at Adobe since 2008. It is also used at Twitter, Mahalo, StumbleUpon, Ning, Hulu, World Lingo, Indonesia-based Detikcom and at Yahoo!.
- Additional features: Because HBase is part of the Hadoop project, it features tight integration with Hadoop. There is a set of convenience classes that allow you to easily execute MapReduce jobs using HBase as the backing data store.
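The cell addressing described in the Schema entry above can be sketched in a few lines of Python. This is only an in-memory illustration of the data model, not the HBase client API; the class `ToyTable`, the row key `user42` and the column names are invented for the example.

```python
import time


class ToyTable:
    """In-memory illustration of HBase-style cell addressing."""

    def __init__(self, column_families):
        # Column families are fixed up front; qualifiers are not.
        self.families = set(column_families)
        # (row key, family, qualifier) -> {timestamp: value}
        self.cells = {}

    def put(self, row, family, qualifier, value, timestamp=None):
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        # Default to the current time in milliseconds if none is given.
        ts = timestamp if timestamp is not None else int(time.time() * 1000)
        self.cells.setdefault((row, family, qualifier), {})[ts] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self.cells.get((row, family, qualifier), {})
        if not versions:
            return None
        if timestamp is None:
            # No timestamp requested: return the newest version.
            return versions[max(versions)]
        return versions.get(timestamp)


table = ToyTable(column_families=["info"])
table.put("user42", "info", "email", "old@example.com", timestamp=1)
table.put("user42", "info", "email", "new@example.com", timestamp=2)
# A brand-new qualifier needs no schema change, only an existing family.
table.put("user42", "info", "nickname", "neo", timestamp=2)

print(table.get("user42", "info", "email"))               # new@example.com
print(table.get("user42", "info", "email", timestamp=1))  # old@example.com
```

Note that only the column family has to exist beforehand. The `nickname` qualifier is created simply by writing to it, which is the runtime flexibility the Schema entry describes.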
If you choose, HBase will allow you to use Google’s Protobuf (Protocol Buffers) API as an alternative to XML. Protobuf is a very efficient way of serializing data. It has the advantage of being compact (the same data comes out two to three times smaller than in XML) and of being 20 to 100 times faster to parse than XML. Why? Because of the way Protocol Buffers encode data as bytes on the wire. So, what this means is that working with HBase can be very fast. Protobuf is used extensively within Google, which has defined nearly 50,000 different message types across a wide variety of systems.
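One reason for that compactness is the way Protocol Buffers write integers as base-128 "varints": each byte carries seven bits of the number, and the high bit says whether another byte follows. The Python sketch below hand-rolls that one encoding for illustration. It is not the Protobuf library itself, and the XML element name `value` is just an assumed example for the size comparison.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest seven bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit stays clear
            return bytes(out)


def decode_varint(data):
    """Decode a varint from the start of a bytes object."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


encoded = encode_varint(300)
print(encoded.hex())                 # ac02
print(decode_varint(encoded))        # 300
print(len(encoded))                  # 2
print(len("<value>300</value>"))     # 18
```

A real Protobuf message adds a small tag in front of each field, so the actual saving depends on the data, but the example shows why small numbers cost only a byte or two instead of a run of text characters plus markup.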
The database comes with a web console user interface to monitor and manage region servers and master servers.
Interestingly, Facebook recently declined to use Cassandra and has adopted HBase instead, because its new Messaging model requires more flexible replication, which HBase provides.
If you’re a sysadmin, we hope this information about Apache HBase comes in handy. And we hope you’ll take heed that, no matter how hardy your NoSQL database is, it needs monitoring! That’s why Monitis offers 24/7 independent remote monitoring of servers, all done via the cloud (so your firewalls don’t get in the way).
In fact, if you’re just dipping your big toe in the cloud, you can try our free server monitoring – the first such service in the industry!
Stay tuned for our next post on NoSQLs – this time a review of three other brands: