In this age of rapid growth in Big Data, predictive analytics, real-time processing, and platforms like Hadoop, a fair question may arise . . . what value is there in the traditional data warehouse? It’s a fair question because before the iPhone, Facebook, Twitter, and Xbox, there was, well . . . the data warehouse. For the last 30-odd years the data warehouse has been what one article describes as “the business-insights workhorse of enterprise computing.” And despite many transformations over the past five years in cloud, mobile, and information technologies, data warehousing has stayed relevant. Yes, there are more options on the table today for data storage, analysis, and indexing, but data warehouses remain as timely as ever.
To be sure we’re clear on definitions, a data warehouse is “a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.”
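To make the definition concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a real warehouse. The table name `sales_history`, the sample rows, and both function names are assumptions made up for illustration. The idea is only that historical transaction data is loaded into a separate store and then queried for analysis rather than updated row by row.

```python
import sqlite3

# Hypothetical historical sales rows derived from a transaction system.
SAMPLE = [
    ("2014-01-05", "east", 120.0),
    ("2014-01-06", "west", 80.0),
    ("2014-02-01", "east", 30.0),
]


def build_warehouse(transactions):
    """Load transaction rows into an in-memory analysis table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_history (sale_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", transactions)
    conn.commit()
    return conn


def revenue_by_region(conn):
    """Run an analytic (aggregate) query instead of a per-row transaction."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_history "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    warehouse = build_warehouse(SAMPLE)
    print(revenue_by_region(warehouse))
```

A production warehouse would add scheduled loads from several source systems, but the split between loading history and querying it in aggregate is the same.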
Oracle, a well-known player in the market, last year identified the top 10 trends in data warehousing, including real-time analytics, better customer experience capabilities, in-memory technologies, and more. In the words of one analysis, the data warehousing landscape now consists of “a new generation of data warehouses that are bigger, better, and faster than ever before, transforming data into information and information into actionable insights, enabling businesses to forge ahead with unprecedented speed and agility.”
So with these points in mind, let’s review in more detail the state of the data warehouse market by surveying the top 5 vendors. Here are the major players you’ll want to pay attention to if you’re looking to get started with or upgrade a data warehouse in 2015.
1. Teradata
Teradata is a market leader in data warehousing with more than 30 years of history. It appears as the leader in Gartner’s 2014 Magic Quadrant for Data Warehouse Database Management Systems and has held that position consistently for the past 15 years. The company is leading the charge with new tools, innovations, and capabilities, including the latest Hadoop-based technologies. Teradata’s EDW (enterprise data warehouse) platform provides businesses with robust, scalable hybrid-storage capabilities and analytics over large volumes of structured and unstructured data, leading to real-time business intelligence insights, trends, and opportunities. Teradata also offers a cloud-based DBMS solution via its Aster Database platform. Gartner reports that Teradata counts more than 1,200 customers.
2. Oracle
Oracle is practically a household name in relational databases and data warehousing and has been for decades. Oracle Database 12c is an industry standard for high-performance, scalable, optimized data warehousing. The company’s specialized platform for data warehousing is the Oracle Exadata Database Machine. There are an estimated 390,000 Oracle DBMS customers worldwide, and Gartner estimates that about 4,000 Exadata data warehousing appliances have been sold. This state-of-the-art platform provides advanced features such as flash storage for low I/O overhead and Hybrid Columnar Compression (HCC), which compresses data heavily to reduce I/O, especially for analytic workloads.
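To see why organizing data by column can help compression, here is a small Python sketch. It is not Oracle's HCC algorithm; it uses plain run-length encoding as a stand-in, and the sample rows and function names are assumptions made up for illustration. The point is only that values within one column tend to repeat, so storing them next to each other produces longer runs to compress.

```python
from itertools import groupby

# Hypothetical (region, status) rows from an orders table.
ROWS = [("east", "shipped")] * 3 + [("west", "shipped")] * 3


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def row_major(rows):
    """Flatten rows one after another, as a row-oriented store would."""
    return [value for row in rows for value in row]


def column_major(rows):
    """Flatten column by column, so similar values sit side by side."""
    return [value for column in zip(*rows) for value in column]


if __name__ == "__main__":
    print(len(run_length_encode(row_major(ROWS))))     # runs in row order
    print(len(run_length_encode(column_major(ROWS))))  # runs in column order
```

With this sample, the row-ordered layout never repeats a value twice in a row, while the column-ordered layout collapses into three runs. Fewer, longer runs mean fewer bytes to read from disk when an analytic query scans a column.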
3. Amazon Web Services (AWS)
The shift in data storage and warehousing to the cloud over the last several years has been momentous, and Amazon has been a market leader in that change. Amazon offers an ecosystem of data storage tools and resources that complement its cloud services platform. For example, there is Amazon Redshift, a fast, fully managed, petabyte-scale cloud data warehouse; AWS Data Pipeline, a web service designed for moving data between existing AWS data services; and Elastic MapReduce, which provides an easily managed Hadoop solution on top of the AWS platform. According to Gartner, Amazon was the overall leader in data warehousing customer satisfaction and experience in last year’s survey.
4. Cloudera
Cloudera has emerged in recent years as a major enterprise provider of Hadoop-based data storage and processing solutions. Cloudera offers an Enterprise Data Hub (EDH) as its take on the operational data store, or data warehouse. The EDH is Cloudera’s proprietary framework for the “information-driven enterprise” and focuses on “batch processing, interactive SQL, enterprise search, and advanced analytics—together with the robust security, governance, data protection, and management that enterprises require.” Cloudera’s data warehouse is based on CDH, Cloudera’s distribution of Apache Hadoop and the world’s largest Hadoop distribution. The company offers several bundles of its Hadoop-based services, including Cloudera Express and Cloudera Enterprise. Gartner reports high customer satisfaction and confidence in Cloudera’s personnel and their skills in deploying Hadoop as a data processing and management system.
5. MarkLogic
MarkLogic is a privately held, Silicon Valley-based software firm founded in 2001 that offers an enterprise NoSQL database platform. MarkLogic was included in Gartner’s Magic Quadrant for Data Warehouse Database Management Systems for the first time in 2014. This inclusion also reflects a broader shift in the data warehousing market, as organizations come to see NoSQL and other alternative forms of storage and processing as the new reality for architecting their datacenter infrastructures and minimizing data complexity. In 2013 MarkLogic released a new semantics platform that can store billions of RDF triples and query them with SPARQL (the query language for RDF data), providing richer, deeper insights into data in ways not possible within relational models. The inclusion of semantics-based technologies, along with what we’ve already seen of cloud and Hadoop, represents yet another level of innovation that will continue to keep data warehouses scalable and adaptable to the fast-paced needs of the digital era.
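For readers new to RDF, here is a rough Python sketch of what storing triples and querying them by pattern looks like. It does not use MarkLogic's API or a real SPARQL engine; the sample triples and the helper names `match` and `suppliers_located_in` are assumptions made up for illustration. Each fact is a (subject, predicate, object) triple, and a query is a pattern with wildcards that can be chained across triples.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
TRIPLES = [
    ("acme", "supplies", "widgetco"),
    ("widgetco", "located_in", "ohio"),
    ("acme", "located_in", "texas"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def suppliers_located_in(triples, place):
    """Two-hop query: who supplies a company located in `place`?"""
    return [
        supplier
        for (supplier, _, customer) in match(triples, predicate="supplies")
        if match(triples, subject=customer, predicate="located_in", obj=place)
    ]


if __name__ == "__main__":
    print(suppliers_located_in(TRIPLES, "ohio"))
```

In SPARQL the same question would be written roughly as `SELECT ?s WHERE { ?s :supplies ?c . ?c :located_in :ohio }`. Following links between facts this way needs no predefined table joins, which is the kind of flexibility the semantic approach aims for.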