Amazon’s Big Data Suite – Part 2


In Part 1 we started our study of Amazon's services and looked at Amazon EC2. In this part, we will look at other Amazon services: EMR, DynamoDB, Kinesis and RDS.


1. Amazon Elastic MapReduce (EMR)

Amazon EMR is a web service that makes it easy to process large amounts of data in the cloud. An EMR cluster comes preconfigured with Hadoop which, as mentioned earlier, is a framework for storing and processing data. Because of this preconfiguration, you can start analysing your data almost immediately. EMR is used in areas such as machine learning, financial analysis and bioinformatics.


Just like EC2, you can launch as many EMR instances as you need, and you are charged only for the computing power you use. EMR also comes with data analysis tools such as Pig and Hive already set up, which saves time and gets you running quickly. Amazon provides simple command line interfaces (CLIs) for launching and monitoring these clusters. If you wish, you can read more about EMR here.
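To make the idea of launching a preconfigured cluster more concrete, here is a minimal sketch using boto3, the AWS SDK for Python, rather than the CLI mentioned above. The cluster name, release label, instance type and the helper function names are assumptions chosen for illustration; the two role names are the defaults that AWS creates for EMR and may differ in your account.

```python
def build_emr_cluster_request(name, instance_count, instance_type="m5.xlarge"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The release label and instance type are illustrative assumptions;
    choose values that are available in your region.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        # Ask EMR to install Hadoop plus the Pig and Hive tools.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after its steps finish.
            # It is billed until you terminate it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


def launch_cluster(request):
    """Send the request to AWS and return the new cluster's id."""
    import boto3  # imported here so the builder above works without AWS access

    emr = boto3.client("emr")
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Calling `launch_cluster(build_emr_cluster_request("demo-cluster", 3))` would request a three-node cluster, assuming AWS credentials are configured. Since billing follows usage, remember to terminate the cluster when you are done.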


2. Amazon DynamoDB

As mentioned in the earlier article, big data changes quickly and has no fixed structure. Because of this lack of structure, traditional relational databases are often a poor fit for storing it. This has led to the development of NoSQL databases, which do not require a fixed schema. DynamoDB is one such NoSQL database provided by Amazon.


DynamoDB provides seamless, scalable performance by distributing data across multiple locations. All data in DynamoDB is stored on solid state drives (SSDs), which improves performance even further. Launching DynamoDB is a walk in the park, as the console gets you online in no time. You can read more about DynamoDB here.
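The lack of a fixed structure is easier to see with an example. The sketch below assumes a hypothetical table named `Users` whose primary key is `user_id`; the items and helper names are made up for illustration. Apart from the key, each item carries whatever attributes it happens to have.

```python
def make_items():
    """Return two items that share only the key attribute 'user_id'."""
    return [
        {"user_id": "u1", "name": "Asha", "email": "asha@example.com"},
        {"user_id": "u2", "name": "Ben", "last_login": "2014-01-15",
         "tags": ["beta", "mobile"]},
    ]


def items_missing_key(items, key_name="user_id"):
    """Return the items that lack the primary key and would be rejected."""
    return [item for item in items if key_name not in item]


def store_items(items, table_name="Users"):
    """Write the items to DynamoDB (table name is a hypothetical example)."""
    import boto3  # imported here so the helpers above work without AWS access

    table = boto3.resource("dynamodb").Table(table_name)
    for item in items:
        table.put_item(Item=item)
```

A relational table would need a column for every attribute up front. Here the second item simply adds `last_login` and `tags` without any schema change.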


3. Amazon Kinesis

Many analytics applications require real-time processing of streaming data, because decisions have to be made quickly to get the highest ROI. Amazon Kinesis is a newly launched service for collecting and processing such large volumes of real-time data. This data can come from sources such as the operational logs of systems or clickstream data from e-commerce websites. With Kinesis, you can process this real-time data in a jiffy and immediately make recommendations. Kinesis is very easy to use and integrates efficiently with existing Amazon services such as Amazon S3 and Amazon DynamoDB. More information about Amazon Kinesis can be found here.
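As a rough illustration of how clickstream data might be pushed into Kinesis, the sketch below turns one click event into a record and sends it with boto3. The stream name, the event fields and the helper names are assumptions made for this example; the stream must already exist.

```python
import json


def build_click_record(stream_name, event):
    """Turn one click event (a dict) into arguments for Kinesis put_record."""
    return {
        "StreamName": stream_name,
        # Kinesis stores the payload as raw bytes, so serialise the event.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key go to the same shard,
        # which keeps one session's clicks in order.
        "PartitionKey": event["session_id"],
    }


def send_record(record):
    """Send a single record to the stream."""
    import boto3  # imported here so the builder above works without AWS access

    kinesis = boto3.client("kinesis")
    return kinesis.put_record(**record)
```

For example, `send_record(build_click_record("clickstream", {"session_id": "s1", "page": "/cart"}))` would publish one click, which a consumer application could then read and act on within moments.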


4. Amazon Relational Database Service (RDS)

Amazon RDS is a simple relational database service that is very easy to set up and operate. It provides the functionality of traditional RDBMS systems such as Oracle and MySQL. Like other Amazon services, RDS is resizable and very cost effective. It comes with a console through which it can be administered. More information about RDS can be found here.
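The console is the simplest way to manage RDS, but the same operations can be scripted. The following sketch, again using boto3, shows what creating a small MySQL instance and later resizing it might look like. The instance classes, storage size, user name and helper names are assumptions for illustration only.

```python
def build_create_request(identifier, password, instance_class="db.t3.micro"):
    """Arguments for create_db_instance: a small MySQL database."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": instance_class,  # assumed starting size
        "AllocatedStorage": 20,             # storage in GiB (assumed)
        "MasterUsername": "admin",          # assumed user name
        "MasterUserPassword": password,     # pass in, never hard-code
    }


def build_resize_request(identifier, new_instance_class):
    """Arguments for modify_db_instance: move to a different instance class."""
    return {
        "DBInstanceIdentifier": identifier,
        "DBInstanceClass": new_instance_class,
        "ApplyImmediately": True,  # do not wait for the maintenance window
    }


def create_then_resize(create_request, resize_request):
    """Create the instance, wait until it is available, then resize it."""
    import boto3  # imported here so the builders above work without AWS access

    rds = boto3.client("rds")
    rds.create_db_instance(**create_request)
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=create_request["DBInstanceIdentifier"])
    rds.modify_db_instance(**resize_request)
```

Resizing is just a change of `DBInstanceClass` on an existing instance, which is what makes it practical to start small and grow as the application needs it.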



Amazon provides detailed documentation for all of its services, which makes them even easier to use. It includes complete guides on how to launch an EC2 cluster or an EMR cluster. There is also a Free Tier, which gives you access to a limited set of Amazon resources at no cost, so that you can try a service out and then decide whether to use it for your application.

This link provides a very comprehensive guide to start using Amazon Web Services for your application. The services I have listed here are only a few of those Amazon offers. Amazon also provides many application services, such as Amazon SNS (Simple Notification Service) and Amazon SQS (Simple Queue Service), which can be used to develop a complete application.