Getting started with Hadoop on Ubuntu in VirtualBox


In this post, we are going to set up a single-node Hadoop cluster.

Step 1: Install VirtualBox 6.0 or higher (by launching the .EXE file as administrator)

Step 2: Download the .ISO file for the latest "Ubuntu Desktop" (version used for this post: Ubuntu 18.04.2 LTS) from here "https://ubuntu.com/download/desktop"

Step 3: Install Ubuntu as shown in this post "https://survival8.blogspot.com/p/demonstrating-shared-folder-feature-for.html"

Step 4. Installing Java

To get started, we'll update our package list:

sudo apt-get update

Next, we'll install OpenJDK, the default Java Development Kit on Ubuntu.

sudo apt-get install default-jdk

Once the installation is complete, let's check the version.

java -version

Output

openjdk version "1.8.0_91"

OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)

OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

This output verifies that OpenJDK has been successfully installed. The exact version depends on your Ubuntu release; on Ubuntu 18.04, "default-jdk" installs OpenJDK 11 rather than the OpenJDK 8 shown above.
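The major version can also be pulled out of that line programmatically, which is handy in setup scripts. A minimal sketch, using the sample line above as input (in practice you would feed it `java -version 2>&1 | head -n 1`):

```shell
# Sample line from the output above; in practice:
#   line=$(java -version 2>&1 | head -n 1)
line='openjdk version "1.8.0_91"'
# Strip everything but the major version; the optional "1." prefix
# handles the old Java 8-style numbering ("1.8.0_91" -> 8, "11.0.3" -> 11).
major=$(printf '%s\n' "$line" | sed -E 's/.*"(1\.)?([0-9]+).*/\2/')
echo "$major"   # prints 8
```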

=============================================

Step 5. Retrieve the Hadoop archive from here "http://hadoop.apache.org/releases.html"

ashish:~/Desktop$ wget http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz

--2019-06-27 14:55:52-- http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz

Resolving mirrors.estointernet.in (mirrors.estointernet.in)... 103.123.234.254, 2403:8940:2::f

Connecting to mirrors.estointernet.in (mirrors.estointernet.in)|103.123.234.254|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 332433589 (317M) [application/octet-stream]

Saving to: ‘hadoop-3.1.2.tar.gz’

2019-06-27 15:11:32 (345 KB/s) - ‘hadoop-3.1.2.tar.gz’ saved [332433589/332433589]
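Before extracting, it is worth verifying the archive against the .sha512 checksum file that Apache publishes alongside each release. A minimal sketch of the check, demonstrated on a stand-in file (for the real archive you would download hadoop-3.1.2.tar.gz.sha512 from the release page and run the same check against hadoop-3.1.2.tar.gz):

```shell
# Stand-in for the real hadoop-3.1.2.tar.gz download.
echo "sample data" > archive.tar.gz
# Stand-in for the published .sha512 file.
sha512sum archive.tar.gz > archive.tar.gz.sha512
# Verification: prints "archive.tar.gz: OK" if the hash matches,
# and exits non-zero on a corrupted or tampered download.
sha512sum -c archive.tar.gz.sha512
```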

=============================================

Step 6. Extract the archive.

ashish:~/Desktop$ tar -xzf hadoop-3.1.2.tar.gz

ashish:~/Desktop$ ls

Anaconda3-2019.03-Linux-x86_64.sh hadoop-3.1.2 hadoop-3.1.2.tar.gz
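If you want to see what an archive contains before unpacking it, tar's -t flag lists the entries without extracting anything. A sketch on a small stand-in archive (the real file here is hadoop-3.1.2.tar.gz):

```shell
# Build a tiny stand-in archive to demonstrate on.
mkdir -p hadoop-demo/bin hadoop-demo/etc
tar -czf hadoop-demo.tar.gz hadoop-demo
# -t lists entries instead of extracting them; the top-level
# directory name is what `tar -xzf` will create.
tar -tzf hadoop-demo.tar.gz | head -n 3
```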

=============================================

Step 7. Move the extracted files into /usr/local, the appropriate place for locally installed software.

ashish:~/Desktop$ sudo mv hadoop-3.1.2 /usr/local/hadoop

=============================================

Step 8. Next, we have to update the "JAVA_HOME" path in the "hadoop-env.sh" file.

To find the default Java path, run this command:

readlink -f /usr/bin/java | sed "s:bin/java::"

Output

/usr/lib/jvm/java-8-openjdk-amd64/jre/

ashish:/usr/local/hadoop$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

We appended the line "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/" at the end of the file (or replaced it if it was already present). In general, use the path that the readlink command above printed, minus the trailing "bin/java"; it will be the java-11 directory if "default-jdk" installed OpenJDK 11, or the java-8 directory shown above for OpenJDK 8.

ashish:/usr/local/hadoop$ cat /usr/local/hadoop/etc/hadoop/hadoop-env.sh
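The same edit can be scripted instead of done interactively in nano. A sketch on a stand-in copy of the file (the real one is /usr/local/hadoop/etc/hadoop/hadoop-env.sh, and the path below should be whatever the readlink command above printed):

```shell
# Stand-in for /usr/local/hadoop/etc/hadoop/hadoop-env.sh.
touch hadoop-env.sh
# In practice: java_home=$(readlink -f /usr/bin/java | sed "s:bin/java::")
java_home=/usr/lib/jvm/java-8-openjdk-amd64/jre/
echo "export JAVA_HOME=$java_home" >> hadoop-env.sh
# Confirm the line landed in the file.
grep '^export JAVA_HOME=' hadoop-env.sh
```

For the real file, which is root-owned after the move into /usr/local, the append would need elevated privileges, e.g. piping the echo through "sudo tee -a" instead of ">>".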

=============================================

Step 9. Running Hadoop.

Step 9.1. Prepare the input files and folders.

ashish:~$ cd Desktop/

ashish:~/Desktop$ mkdir input

ashish:~/Desktop$ cp /usr/local/hadoop/etc/hadoop/*.xml input

Step 9.2. Run Hadoop.

ashish:~/Desktop$ ls /usr/local/hadoop/share/hadoop/mapreduce

hadoop-mapreduce-client-app-3.1.2.jar

...

hadoop-mapreduce-examples-3.1.2.jar

...

Here, we are running the bundled "grep" example program to count occurrences of the words "xml" and "configuration" in the input files (the last argument is a regular expression).

ashish:~/Desktop$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input grep_example 'xml|configuration'

Step 9.3. Viewing the output.

ashish:~/Desktop$ cat grep_example/*

25 configuration

12 xml
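As a sanity check, the counts can be approximated with ordinary grep over the same kind of input. A sketch on a stand-in input file (the real input was the *.xml files copied in Step 9.1, so your numbers will differ):

```shell
# Stand-in input; the real run used /usr/local/hadoop/etc/hadoop/*.xml.
mkdir -p input-demo
printf '<configuration>\n</configuration>\n' > input-demo/sample.xml
# -o prints each match on its own line, -h drops filenames;
# sort | uniq -c then counts matches per word, roughly what
# the Hadoop grep example computed above.
grep -ohE 'xml|configuration' input-demo/*.xml | sort | uniq -c
```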
