Introduction to Hortonworks Virtual Machine (VM)

Hortonworks is a company that provides a virtual environment pre-configured with Hadoop. The Hortonworks Sandbox includes the latest developments from the Hortonworks Data Platform (HDP) distribution, so it saves you a great deal of installation and configuration time. Hortonworks also provides tutorials you can use to start learning Hadoop. With the Sandbox, learning Hadoop is easy and no extra PC is required. It is also safe to experiment inside the virtual environment, since your host system remains untouched.

In this article I am going to show you how to install the Hortonworks Sandbox and get started quickly and efficiently.

Installation

Step 1.

Download the Hortonworks Sandbox from the site: http://hortonworks.com/products/hortonworks-sandbox/#install

Step 2. Double-click the downloaded sandbox appliance and the import will begin. Accept the appliance settings to import the sandbox into VirtualBox.

Step 3. When the import is finished, you should see the Hortonworks VM in the left panel of VirtualBox.

Step 4. Double-click the Hortonworks icon to start the VM. Once the VM has fully booted, it displays the URL through which you can access the Hortonworks Sandbox. With this VM you can easily run MapReduce jobs developed in Hadoop.

Running an Example Program

Step 1. Create a jar file of the MapReduce program you have written. You can export a jar directly from the Eclipse IDE.
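The article never shows the program inside the jar, so here is a minimal plain-Java sketch (no Hadoop dependencies) of the logic a hypothetical "Rainfall"-style job might perform: the map phase parses "station,amount" records into key/value pairs, and the reduce phase keeps the maximum amount per station. Class and record format are assumptions for illustration, not the actual tutorial code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the logic a hypothetical "Rainfall" MapReduce job
// performs. A real job would express the same two phases as a Mapper and
// a Reducer class and bundle them into the jar exported from Eclipse.
public class RainfallSketch {

    // Map phase: emit one (station, amount) pair per input line.
    public static List<Map.Entry<String, Double>> mapRecords(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            pairs.add(new AbstractMap.SimpleEntry<>(parts[0], Double.parseDouble(parts[1])));
        }
        return pairs;
    }

    // Reduce phase: keep the maximum rainfall observed for each station key.
    public static Map<String, Double> reduceMax(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> maxima = new HashMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            maxima.merge(pair.getKey(), pair.getValue(), Math::max);
        }
        return maxima;
    }

    public static void main(String[] args) {
        // In the real job, args[0] would be the HDFS input path and args[1]
        // the output path -- the two arguments the job form asks for later.
        Map<String, Double> out = reduceMax(mapRecords(List.of("S1,10.5", "S2,3.0", "S1,20.0")));
        System.out.println(out);
    }
}
```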

Step 2. Open the Hortonworks Sandbox web page using the link provided; generally, it is http://127.0.0.1:8888.

Go to the File Browser and upload the jar file as well as the input file. This copies the data into HDFS. Note the path of the jar file and the input file; here it is /user/hue.

Step 3. Go to the Job Designer page and create a new Java job.

Step 4. Fill in the form and save the configuration. Enter the name of the job and provide the path of the jar file along with the two arguments the main class expects: the first is the path of the input file, and the second is the path of the output directory.

Step 5. Then select the newly created job, in this case Rainfall, and submit it.

Step 6. The status of the job can also be checked in the Job Browser window, which shows the completion percentage along with the logs.

Step 7. Once the job succeeds, go to the output folder you specified and check the output file.
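Hadoop's default text output format writes each reducer record as one tab-separated "key&lt;TAB&gt;value" line, in files typically named part-00000 or part-r-00000 inside the output folder. As a small sketch, here is how such lines could be parsed back into a map (the sample contents are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of parsing reducer output as written by Hadoop's default text
// output format: one "key<TAB>value" record per line.
public class OutputParser {

    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> records = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            records.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical contents of a part-r-00000 file from the Rainfall job.
        Map<String, String> out = parse(List.of("S1\t20.0", "S2\t3.0"));
        System.out.println(out);
    }
}
```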

In this way, Hortonworks makes it very easy to test the MapReduce jobs you have written. If you want to change the code, simply rebuild the jar and test it again. Hortonworks also provides similar web-based UIs for other tools such as Apache Hive and Apache Pig. The introduction of such tools has helped drive the wide adoption of Hadoop for heavy data analysis.