Data Analytics in the Cloud: Two Cool NoSQL ‘Big Data’ Options for the SMB
Some estimates suggest that by 2015 the digital universe will grow to 8 zettabytes of data (1 zettabyte = 1,000,000,000,000,000,000,000 bytes). Much has been written in recent years about “Big Data” and its implications for information management and data analytics. Simply put, Big Data is data that is too large to process using traditional methods.
By ‘traditional methods’ we mean relational database management systems (RDBMS), in which data is organized into a set of formally described tables and usually accessed using the Structured Query Language (SQL). These systems were designed decades ago, when data was much more structured and less accessible. With the development of web technologies and open-source architectures, database management systems have also evolved. The most notable example is MySQL, which is open source, easily accessible to the beginner, and often bundled into software packages as part of some variation of the LAMP stack. By contrast, more than half of today’s digital data is unstructured data from social networks, mobile devices, web applications and similar sources.
While Big Data has become a “big” buzzword in the IT industry today – similar to and, in many ways, a consequence of the Cloud computing phenomenon – and has spun off many kinds of definitions, the essence of the phenomenon can be summed up in the following O’Reilly definition: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
The need to understand and manage Big Data has become the bread and butter of IT and engineering teams at major tech companies like Google, Amazon, Facebook and Twitter, as well as other organizations that serve millions of users. But what solutions are available to the SMB, the average-sized business? According to a study released in April 2012 by Techaisle, which surveyed over 800 SMBs, 34 percent of US mid-market businesses that currently use business intelligence are also interested in big data analytics.
In its recent “Hype Cycle for Big Data 2012” emerging technologies report, the major research firm Gartner states that Column-Store DBMS, Cloud Computing and In-Memory Database Management Systems will be the three most transformational technologies in the next five years. The same report lists Complex Event Processing, Content Analytics, Context-Enriched Services, Hybrid Cloud Computing, Information Capabilities Framework and Telematics among the emerging technologies that Gartner also considers to be transformational. The Hype Cycle for Big Data is shown below:
The time has arrived for SMBs to seriously start thinking about Big Data solutions. As one source has well stated, “It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form. It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either modernising their data warehouses or data marts if they are lucky enough to have any, or come up with a plan to install a business analytics platforms if they don’t.”
With this in mind, here are two important “Big Data” solutions for the SMB to keep an eye on.
Google BigQuery
BigQuery was introduced in limited preview in November 2011 and made publicly available on May 1, 2012, fulfilling Google’s desire to “bring Big Data analytics to all businesses via the cloud.” With BigQuery, Google has developed a data analytics solution that offers an easy-to-use and quickly scalable way of looking at massive amounts of data in the cloud using familiar SQL-style queries. As its tagline suggests, BigQuery allows one to “analyze terabytes of data with just a click of a button.”
The setup process for BigQuery takes less than five minutes. Log in to the Google APIs Console and either create a new project or use an existing one. Click on Services in the left-hand sidebar to bring up the table of API services, and then enable BigQuery.
Once BigQuery is enabled, click on the “BigQuery” link and choose to manage data through the “web interface” tool.
You’ll then be presented with a screen that resembles a traditional MySQL environment, but is much simpler. Google has provided a set of sample tables under publicdata:samples. Click the drop-down and you’ll be presented with a list of these samples. Click on “natality” and then “details”. This brings up the Centers for Disease Control and Prevention (CDC) birth vital statistics: all birth data available for the United States, covering the 50 states, the District of Columbia, and New York City, from 1969 to 2008. This data set contains over 137M rows!
In order to run a sample query, go back to the homepage for the “Big Query Browser Tool Tutorial” and select “Run a Query”. You’ll now be presented with a series of sample SQL queries. Choose the one that will select the 10 heaviest children by birth weight that were born in the United States between 1969 and 2008:
SELECT weight_pounds, state, year, gestation_weeks FROM publicdata:samples.natality
ORDER BY weight_pounds DESC LIMIT 10;
Copy and paste the query into the Compose Query text box and select “Run Query”. Within seconds, the query extracts the 10 largest birth weights from 137M records spanning roughly 40 years of data (1969 to 2008)!
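For readers who want to see what the ORDER BY ... DESC LIMIT pattern does without a BigQuery account, here is a minimal local sketch in Python. It uses the standard library's sqlite3 module as a stand-in for BigQuery, and the rows are made-up placeholders rather than CDC data. The function name heaviest_births and the added filter on missing weights are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows (weight_pounds, state, year, gestation_weeks).
# These values are invented for illustration and are not CDC data.
SAMPLE_ROWS = [
    (7.5, "TX", 1985, 39),
    (9.1, "WA", 1990, 41),
    (6.2, "AL", 1972, 37),
    (8.4, "ND", 2001, 40),
    (10.3, "SC", 1999, 42),
    (None, "TX", 2005, 38),  # a record with no weight recorded
]


def heaviest_births(rows, limit=10):
    """Return up to `limit` rows with the largest weight_pounds values."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE natality ("
            "weight_pounds REAL, state TEXT, year INTEGER, "
            "gestation_weeks INTEGER)"
        )
        conn.executemany("INSERT INTO natality VALUES (?, ?, ?, ?)", rows)
        # Same shape as the sample query; the IS NOT NULL filter is an
        # addition here so rows without a weight never appear in the result.
        cursor = conn.execute(
            "SELECT weight_pounds, state, year, gestation_weeks "
            "FROM natality "
            "WHERE weight_pounds IS NOT NULL "
            "ORDER BY weight_pounds DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in heaviest_births(SAMPLE_ROWS, limit=3):
        print(row)
```

The point of the comparison is scale: the same sort-and-limit logic that is trivial on six local rows is what BigQuery runs over 137M rows in seconds.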
What is amazing about the BigQuery interface is the scale of data that it presents to the user in almost no time. Users can of course create their own tables by importing data from their local environment or from Google Cloud Storage. The opportunities for slicing and dicing large data sets are now almost limitless with Google’s BigQuery solution to data analytics.
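Before importing your own data, it usually has to be written out in a flat file format. The sketch below assumes the data will be uploaded as CSV, one of the formats BigQuery accepts, and shows one way to serialize records with Python's standard csv module. The column names simply mirror the natality sample, and the function name rows_to_csv is hypothetical; the upload step itself is not shown.

```python
import csv
import io

# Column names borrowed from the natality sample table for illustration.
FIELDS = ["weight_pounds", "state", "year", "gestation_weeks"]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Every dict must use only the keys listed in FIELDS;
        # None values are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"weight_pounds": 7.5, "state": "TX", "year": 1985, "gestation_weeks": 39},
        {"weight_pounds": None, "state": "WA", "year": 1990, "gestation_weeks": 41},
    ]
    print(rows_to_csv(sample), end="")
```

The resulting text can be saved to a file and then imported from the local machine or staged in Google Cloud Storage first.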
BIME
BIME (pronounced “beam”) is a French startup that has partnered with Google to create a front-end application for BigQuery that can be used as a business analytics tool. The application runs on Amazon’s Web Services compute cloud and can import data from BigQuery or from a variety of other cloud and non-cloud sources. Its clever tagline is “Mine Your Own Business.” In its own words, BIME “is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.”
BIME can be used to import and slice and dice the CDC Birth statistics discussed above.
BIME offers a free 10-day trial with no obligation, and signing up is very easy. Once you have signed up for a free account, go to “Create a Connection”.
You’ll then need to define the data source from which you wish to import your data set. For very large data sets, you will need to select BimeDB.
For more conventional data sets, you can import your data sets directly from the desktop. BIME offers an Excel-like environment in which data sets of any size can be sliced and diced and pivoted to derive the desired analytics.
We ran a sample analysis against the CDC birth statistics table in Google’s BigQuery, extracting the top 500 birth weights from 1969 to 2008 and then deriving the average birth weight for a sampling of five states: Alabama, North Dakota, South Carolina, Texas, and Washington.
Following the 10-day free trial period, BIME users can upgrade to a scaled price plan depending on the data analysis needs of their business.
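The logic of that analysis can be sketched in a few lines of Python. This is only an illustration of the two steps (take the top N weights, then average them per state), not how BIME computes it internally. The records are made-up placeholders, the function name average_weight_by_state is hypothetical, and the sketch assumes the averages are taken over the top-N rows rather than over the full table.

```python
import heapq
from collections import defaultdict


def average_weight_by_state(records, states, top_n=500):
    """Average the weights of the top_n heaviest records, per chosen state.

    `records` is an iterable of (state, weight_pounds) tuples.
    States with no record among the top_n are left out of the result.
    """
    wanted = set(states)
    # Skip records with no recorded weight before ranking.
    valid = [r for r in records if r[1] is not None]
    # Step 1: keep only the top_n heaviest records.
    top = heapq.nlargest(top_n, valid, key=lambda r: r[1])
    # Step 2: accumulate [sum, count] per state of interest.
    totals = defaultdict(lambda: [0.0, 0])
    for state, weight in top:
        if state in wanted:
            totals[state][0] += weight
            totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


if __name__ == "__main__":
    # Invented placeholder data, not CDC figures.
    sample = [("TX", 10.0), ("TX", 8.0), ("WA", 9.0),
              ("AL", 7.0), ("CA", 12.0), ("ND", None)]
    print(average_weight_by_state(sample, ["TX", "WA", "AL"], top_n=4))
```

In BIME the same result comes from dragging fields into a pivot view; the sketch only makes the underlying arithmetic explicit.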
In conclusion, it bears mentioning that “Big Data” is big business not only for large corporations but for SMBs as well. The discussion above has outlined two major data analytics solutions that are easily accessible and scalable for the everyday small or medium-sized business. Within the emerging technology spectrum, Big Data is critically important. Companies that can easily and efficiently slice and dice this data to identify accurate consumer trends, produce market forecasts, and offer stakeholders the most up-to-date analysis and metrics will immediately set themselves apart from other players in the industry. Consider BigQuery and BIME today for your SMB data analytics solutions!