Introduction to the Hortonworks Virtual Machine (VM)

Hortonworks is a company that provides a virtual environment pre-configured with Hadoop. The Hortonworks Sandbox includes the latest developments from the HDP (Hortonworks Data Platform) distribution, so using it saves you a great deal of installation and configuration time. Hortonworks also provides tutorials you can use to start learning Hadoop. With the Sandbox it is very easy to learn Hadoop, and no extra PC is required. It is also safe to experiment inside the virtual environment, since your original system remains untouched.

In this article I am going to show you how to install the Hortonworks Sandbox and get started quickly and efficiently.


Step 1. Download the Hortonworks Sandbox from the Hortonworks website, choosing the image that matches your virtualization software (this article assumes Oracle VirtualBox).

Step 2. Double-click the downloaded sandbox image and the import will begin. Import the sandbox with its default settings.

Step 3. When the import is finished, you should see the Hortonworks VM in the left panel of VirtualBox.

Step 4. Double-click the Hortonworks entry to start the VM. Once the VM has loaded completely, it displays the URL through which you can access the Hortonworks Sandbox. With this VM you can easily run the MapReduce jobs you have developed for Hadoop.


Running an Example Program

Step 1. Create a jar file of the MapReduce program you have written. You can export a jar directly from the Eclipse IDE.
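The Rainfall program used in this walkthrough is not listed in the article, so as a stand-in, here is a minimal sketch of what the mapper and reducer of such a job might look like, assuming input lines of the form "<year> <rainfall>". Both classes are hypothetical and are shown in a single listing only for brevity:

```java
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical mapper: emits (year, rainfall) for each line of the form "<year> <rainfall>".
public class RainfallMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().trim().split("\\s+");
        if (fields.length == 2) {
            context.write(new Text(fields[0]),
                          new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }
}

// Hypothetical reducer: sums all rainfall readings recorded for each year.
class RainfallReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double total = 0;
        for (DoubleWritable v : values) {
            total += v.get();
        }
        context.write(key, new DoubleWritable(total));
    }
}
```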

Step 2. Open the Hortonworks Sandbox web page using the link the VM console provides.

Go to the File Browser and upload the jar file as well as the input file. This completes the process of copying the data into HDFS. Note the path of the jar file as well as the input file; here it is /user/hue.
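As an alternative to the File Browser, the same upload can be done programmatically. Here is a minimal sketch using Hadoop's FileSystem API; the file names and the NameNode address are assumptions, not values from the article:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: the sandbox's NameNode is reachable at this address.
        conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
        FileSystem fs = FileSystem.get(conf);
        // Copy the jar and the input file into /user/hue (file names are illustrative).
        fs.copyFromLocalFile(new Path("Rainfall.jar"), new Path("/user/hue/Rainfall.jar"));
        fs.copyFromLocalFile(new Path("rainfall.txt"), new Path("/user/hue/rainfall.txt"));
        fs.close();
    }
}
```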

Step 3. Go to the Job Designer page and create a new Java job.


[Screenshot: creating a new Java job in the Job Designer]

Step 4. Fill in the form and save the configuration. As you can see, we have entered a name for the job and provided the path of the jar file along with the two arguments the main class expects: the first argument is the path of the input file and the second is the path of the output directory.


[Screenshot: the completed job design]
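To make those two arguments concrete, here is a sketch of what the job's driver class might look like. It is an assumption that pairs with the hypothetical mapper and reducer sketched earlier, reading the input and output paths from args[0] and args[1]:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RainfallDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(RainfallDriver.class);
        job.setJobName("Rainfall");
        job.setMapperClass(RainfallMapper.class);
        job.setReducerClass(RainfallReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        // These are the two arguments entered in the Job Designer form.
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input file path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```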

Step 5. Then select the newly created job, in this case Rainfall, and submit it.

Step 6. The status of the job can be tracked in the Job Browser window as well. It shows how far the job has progressed, along with its logs.


[Screenshot: job status in the Job Browser]

Step 7. Once the job succeeds, you can go to the output folder you specified and check the output file.


[Screenshot: the output file in the File Browser]
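If you want to inspect the result outside the File Browser, here is a minimal sketch that reads the output back from HDFS. The output path is illustrative; part-r-00000 is the file name a single-reducer job writes:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // A single-reducer job writes its results to part-r-00000 in the output directory.
        Path result = new Path("/user/hue/rainfall-output/part-r-00000"); // illustrative path
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```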

In this way Hortonworks makes it really easy to test the MapReduce jobs you have written. If you want to make any changes to the code, you can simply recreate the jar and test it again. Hortonworks also provides similar web-based UIs for other tools such as Apache Hive and Apache Pig. The introduction of such tools has helped drive the wide acceptance of Hadoop for heavy data analysis.