The Big Data explosion of recent years has produced a number of new data storage and processing technologies. Terms like NoSQL, YARN, and Hadoop are now familiar within this growing ecosystem. In addition, we’re hearing a lot more about alternative ways of managing data, such as graph databases and triplestore databases, which are especially relevant for querying the semantic web. The flood of information generated by the Internet of Things, not to mention M2M (machine-to-machine) communications and the emerging wearables market, will require organizations to leverage the latest technologies to collect, prepare, analyze, and visualize various types of complex data.
So the question becomes this: do relational database management systems (RDBMS), which date back to the 1970s, have much of a future? Can they still handle the data and information loads of today’s fast-paced digital landscape? Well, as is often the case, the answer is not a simple “yes” or “no”. There are good reasons why the value proposition of relational DBs is tenuous at best in the Big Data era. But, at the same time, you don’t want to “throw the baby out with the bath-water” just yet.
Read on for an overview of how to think about relational DBs in relation to your Big Data strategy.
Relational just isn’t cut out for Big Data
Granted, many business applications today rely on relational DBs, and these systems are great at processing structured data. The reality, however, is that they are ill-equipped to deal with the multivariate and unstructured data types that form the crux of much of today’s Big Data. Storing data in tables, joining those tables on keys, and querying them in SQL isn’t going to deliver the fast, agile data processing that most competitive organizations need today.
Relational databases translate to high overhead
Maintaining a relational DB can be costly for an organization. As the complexity of the DB increases, so do the requirements for the RDBMS administrators, developers, and other personnel needed to keep the system running efficiently.
Relational offers slow extraction of meaning from data
The age of semantic web technologies means that customers and stakeholders expect faster and “smarter” insights from data, insights elicited through sophisticated machine learning algorithms and new data modeling techniques. Imagine trying to write a query like this one: “find all meetings that happened in November 2010 within 5 miles of Berkeley that were attended by the three most influential people among Joe’s friends and friends-of-friends.” You see the point: relational DBs are not going to compete very well in a market that demands new and exciting extractions of meaning from data.
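To make the point concrete, here is a minimal sketch of just one piece of that query, the “Joe’s friends and friends-of-friends” part, written against a relational table. It uses Python’s built-in sqlite3 module; the friendship table, its columns, and the sample names are all made up for illustration, and friendships are treated as one-directional to keep it short. Notice that each extra hop in the network costs another self-join, and this still says nothing about influence, meetings, dates, or distance.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendship (person TEXT, friend TEXT);
    INSERT INTO friendship VALUES
        ('Joe', 'Ann'), ('Joe', 'Raj'),
        ('Ann', 'Mei'), ('Raj', 'Mei'), ('Raj', 'Tom'),
        ('Mei', 'Zed');
""")

# First branch: Joe's direct friends.
# Second branch: friends of those friends, which needs a self-join.
rows = conn.execute("""
    SELECT friend FROM friendship WHERE person = 'Joe'
    UNION
    SELECT f2.friend
    FROM friendship AS f1
    JOIN friendship AS f2 ON f2.person = f1.friend
    WHERE f1.person = 'Joe' AND f2.friend <> 'Joe'
    ORDER BY 1
""").fetchall()

network = [name for (name,) in rows]
print(network)  # ['Ann', 'Mei', 'Raj', 'Tom']
conn.close()
```

Zed is three hops away from Joe and so does not appear. Reaching him would take a third join, which is the kind of growth in query complexity that graph databases are designed to avoid.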
Relational database performance is hard to predict
Relational DBs don’t scale up well to very large data sizes or to data in shared environments. For example, a legacy application using a relational database may require sporadic updates by a human operator throughout the month. But what happens if your organization wants to combine that data with batch updates from partner companies, and the batch loads take several minutes? These processes can put strain on your online application and lead to interruptions in service.
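The contention described above can be sketched in a few lines. This is only an illustration using Python’s built-in sqlite3 module, whose file-level locking is much coarser than what a server-based RDBMS does, so treat it as a toy model rather than a benchmark. The orders table and the “partner” and “operator” labels are hypothetical. One connection plays the batch load and holds a write transaction open; the other plays the online application and tries to write at the same time.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders.db")  # hypothetical database file

    # isolation_level=None means autocommit, so the explicit BEGIN and
    # COMMIT statements below are what open and close the transaction.
    batch = sqlite3.connect(path, isolation_level=None)
    batch.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, source TEXT)")

    # timeout=0 makes the online connection fail at once instead of waiting.
    online = sqlite3.connect(path, timeout=0, isolation_level=None)

    # The batch load takes the write lock and keeps it while loading rows.
    batch.execute("BEGIN IMMEDIATE")
    batch.executemany(
        "INSERT INTO orders (source) VALUES (?)", [("partner",)] * 100
    )

    # The online application tries to write while the batch is in flight.
    try:
        online.execute("INSERT INTO orders (source) VALUES ('operator')")
        blocked = False
    except sqlite3.OperationalError:
        blocked = True  # SQLite reports that the database is locked

    # Once the batch commits, the same online write goes through.
    batch.execute("COMMIT")
    online.execute("INSERT INTO orders (source) VALUES ('operator')")
    total = online.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    batch.close()
    online.close()

print(blocked, total)  # True 101
```

A production RDBMS would usually queue or retry the operator’s write rather than reject it outright, but the user still waits for as long as the batch holds its locks, which is where the unpredictability comes from.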
But relational databases do still have advantages
There are still notable advantages of the relational model that shouldn’t be overlooked. For example, the relative ease of accessing and managing data in rows and columns is a plus. Data security is also more easily maintained in a relational environment, where certain sensitive tables can be “hidden” behind their own authorization controls. Relational DBs are ubiquitous in organizations today, and the skill sets, though specialized, are still readily available. Many folks in business operations have probably done some work in the table environment of a relational DB. And more and more alternative DBs are finding value in providing SQL interfaces. Hadoop is a case in point. Though it represents a new paradigm in data management, more and more tools have emerged to make Hadoop data accessible through the ever-familiar SQL language. In fact, SQL-on-Hadoop is becoming a standard feature of many platforms and is expected to continue experiencing strong growth in the Big Data market.