Amazon’s Big Data Suite – Part 1

Amazon boasts a world-class suite of products for handling, processing, storing and analyzing Big Data. In this short two-part series, we will look at some of the key products that Amazon offers. But before we dive into the details of Amazon's products, let's talk a little about Big Data.

Big Data is a very common buzzword these days. But what really is Big Data? There is no single definition, but a simple answer is that it is data that arrives in large volumes from multiple sources, often without any clear structure. This is where Big Data analytics comes into play, and where Amazon can become your new best friend.


Big Data analytics is the process of uncovering the patterns hidden in the petabytes of data you may have been holding onto for years. It is the science of gaining deeper insight into the huge amounts of data generated by different processes. In today's world, almost everything we do generates massive data sets, whether it is customers producing data through social networks or sensors writing long log files.


Understanding this data is important because it lets you change your strategy quickly and gain an advantage over your competitors. The challenge with Big Data lies in the scale at which it grows and its lack of structure. To use this data to its full potential, we needed new technologies and techniques. Fortunately, as the data revolution gathered pace, geeks came up with new frameworks such as Hadoop, which made storing and processing this data much easier.

Hadoop lets you store large data sets across many machines and harness the combined processing power of those machines when running computations on the data. This has made it possible to process data sets that would otherwise have been very difficult to handle. Alongside Hadoop, another factor behind the growth of Big Data analytics applications is the rise of third-party infrastructure for storing and processing this data. Amazon Web Services (AWS) provides such services, making it easier and cheaper for people to play around with their data.
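To make the idea concrete, here is a minimal, single-process sketch of the MapReduce model that Hadoop popularized, using the classic word-count example. It assumes a hypothetical cluster: the input lines are spread round-robin across a few simulated nodes (standing in for distributed storage), each node maps its own block locally, and the intermediate pairs are then grouped by key and reduced. All function names are illustrative and are not part of Hadoop's API; a real job would run the map tasks in parallel on separate machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_blocks(lines: List[str], num_nodes: int) -> List[List[str]]:
    """Spread input lines round-robin across hypothetical nodes (a stand-in for distributed storage)."""
    blocks: List[List[str]] = [[] for _ in range(num_nodes)]
    for i, line in enumerate(lines):
        blocks[i % num_nodes].append(line)
    return blocks


def map_phase(block: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one node's local block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the intermediate values from all nodes by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], num_nodes: int = 3) -> Dict[str, int]:
    blocks = split_into_blocks(lines, num_nodes)
    # On a real cluster each map task would run on the node holding its block.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    data = ["big data on AWS", "Hadoop processes big data", "data data data"]
    print(word_count(data))
```

The point of the split is that the map step only ever touches local data, so it scales out with the number of machines; only the much smaller intermediate pairs have to move between nodes during the shuffle.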


We will now look at some of the services provided by Amazon that are heavily used in industry for building Big Data applications. Amazon provides numerous services such as Amazon EC2 clusters, Amazon Elastic MapReduce (EMR), Amazon S3, Amazon DynamoDB, Amazon Kinesis, Amazon RDS, and the list goes on. We will look at a few of them in more detail.


1. Amazon EC2 Cluster


Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable computing capacity to developers. Its web service interface makes it easy for developers to launch new instances and configure them to their needs. Amazon also provides pre-configured templates called Amazon Machine Images (AMIs), which come with an operating system and basic applications and libraries already installed, so instances launched from them are ready to use immediately.
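As a rough illustration of launching instances programmatically, the sketch below assembles the arguments for the `run_instances` call of the boto3 EC2 client (the AWS SDK for Python) and then sends them. The AMI ID is a hypothetical placeholder, the `t3.micro` instance type and `us-east-1` region are arbitrary choices, and actually calling `launch` assumes boto3 is installed and AWS credentials are configured; the request builder itself runs without either.

```python
from typing import Any, Dict, List, Optional

# Hypothetical placeholder: substitute a real AMI ID available in your region.
EXAMPLE_AMI_ID = "ami-0123456789abcdef0"


def build_run_instances_request(ami_id: str, instance_type: str = "t3.micro",
                                count: int = 1,
                                key_name: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for the boto3 EC2 client's run_instances call."""
    if count < 1:
        raise ValueError("count must be at least 1")
    request: Dict[str, Any] = {
        "ImageId": ami_id,              # the AMI the instances boot from
        "InstanceType": instance_type,  # the hardware profile (vCPUs, memory)
        "MinCount": count,              # if EC2 cannot start at least this many, it starts none
        "MaxCount": count,              # upper bound on instances launched by this call
    }
    if key_name is not None:
        request["KeyName"] = key_name   # an existing EC2 key pair, used for SSH access
    return request


def launch(request: Dict[str, Any], region: str = "us-east-1") -> List[str]:
    """Send the request to AWS and return the new instance IDs (needs boto3 and credentials)."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    print(build_run_instances_request(EXAMPLE_AMI_ID, count=2))
```

Keeping the request construction separate from the network call makes it easy to review or test what would be launched before spending any money.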


These instances are available in many instance types, so you can rent systems with just enough capacity to support your application. You don't have to buy and maintain processing power you don't need. EC2 also gives you the flexibility to add new machines to a cluster on demand, and it comes with good security and reliability provisions.
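Adding machines on demand usually comes down to a sizing rule. The sketch below uses a simple proportional rule, assuming load is spread evenly across instances: if the fleet is running hotter than a target CPU utilization it grows in proportion, and if it is running cooler it shrinks, always staying within minimum and maximum bounds. The function and its default values are hypothetical; this illustrates the idea and is not the exact algorithm used by AWS Auto Scaling.

```python
import math


def desired_instance_count(current: int, avg_cpu_percent: float,
                           target_cpu_percent: float = 60.0,
                           min_size: int = 1, max_size: int = 10) -> int:
    """Return how many instances to run so average CPU moves toward the target.

    Assumes the total work (current * avg_cpu_percent) stays roughly constant
    while the number of instances sharing it changes.
    """
    if current < 1 or target_cpu_percent <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Round up so the fleet is never sized below what the load requires.
    needed = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp to the allowed fleet size.
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    print(desired_instance_count(4, 90.0))
```

For example, four instances averaging 90% CPU against a 60% target would scale out to six, while the same four at 15% would shrink to one.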



We will continue our study of Amazon’s Big Data suite in Part 2.
