We are going to create a single-node Hadoop cluster in this post.
Step 1: Install VirtualBox 6.0 or higher (by launching the .EXE file as administrator)
Step 2: Download the .ISO file for the latest "Ubuntu Desktop" (version used for this post: Ubuntu 18.04.2 LTS) from here "https://ubuntu.com/download/desktop"
Step 3: Install Ubuntu as shown in this post "https://survival8.blogspot.com/p/demonstrating-shared-folder-feature-for.html"
Step 4. Installing Java
To get started, we'll update our package list:
sudo apt-get update
Next, we'll install OpenJDK, the default Java Development Kit on Ubuntu 18.04.
sudo apt-get install default-jdk
Once the installation is complete, let's check the version.
java -version
Output
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed. (On Ubuntu 18.04 the reported version may be newer, e.g. OpenJDK 10 or 11; any of these is fine as long as JAVA_HOME in Step 8 points to the JDK that is actually installed.)
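As an optional sanity check, you can also confirm that the Java compiler was installed and see which JDK paths are registered on the system; both commands are standard on Ubuntu:
javac -version
update-alternatives --list java
The second command prints the full path of every installed "java" binary, which comes in handy again in Step 8 when we set JAVA_HOME.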
=============================================
Step 5. Retrieve the Hadoop archive from here "http://hadoop.apache.org/releases.html"
ashish:~/Desktop$ wget http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
--2019-06-27 14:55:52-- http://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
Resolving mirrors.estointernet.in (mirrors.estointernet.in)... 103.123.234.254, 2403:8940:2::f
Connecting to mirrors.estointernet.in (mirrors.estointernet.in)|103.123.234.254|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 332433589 (317M) [application/octet-stream]
Saving to: ‘hadoop-3.1.2.tar.gz’
2019-06-27 15:11:32 (345 KB/s) - ‘hadoop-3.1.2.tar.gz’ saved [332433589/332433589]
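Optionally, verify the integrity of the download. Apache publishes a SHA-512 checksum next to each release; the URL below assumes the standard Apache archive layout, so adjust it if your mirror differs.
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz.sha512
sha512sum hadoop-3.1.2.tar.gz
cat hadoop-3.1.2.tar.gz.sha512
The digests printed by the last two commands should match.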
=============================================
Step 6. Extract the archive.
ashish:~/Desktop$ tar -xzf hadoop-3.1.2.tar.gz
ashish:~/Desktop$ ls
Anaconda3-2019.03-Linux-x86_64.sh hadoop-3.1.2 hadoop-3.1.2.tar.gz
=============================================
Step 7. Move the extracted files into /usr/local, the appropriate place for locally installed software.
ashish:~/Desktop$ sudo mv hadoop-3.1.2 /usr/local/hadoop
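Two optional conveniences at this point, assuming you want to manage the Hadoop directory as your own user and call the hadoop command without typing its full path:
sudo chown -R $USER:$USER /usr/local/hadoop
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> ~/.bashrc
source ~/.bashrc
Neither is required for the rest of this post; the commands below keep using the full /usr/local/hadoop/bin/hadoop path.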
=============================================
Step 8. Next, we have to update the "JAVA_HOME" path in the "hadoop-env.sh" file.
To find the default Java path, run this command:
readlink -f /usr/bin/java | sed "s:bin/java::"
Output
/usr/lib/jvm/java-8-openjdk-amd64/jre/
ashish:/usr/local/hadoop$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
We appended the line "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/" at the end of the file (or replaced it if it was already present). The path must match the JDK installed on your machine: with the OpenJDK 8 path returned by readlink above, it would instead be "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/".
ashish:/usr/local/hadoop$ cat /usr/local/hadoop/etc/hadoop/hadoop-env.sh
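If you prefer not to edit the file by hand, the same change can be appended non-interactively, and "hadoop version" then confirms that Hadoop picks up the configured JDK. The JAVA_HOME value here is the OpenJDK 8 path from the readlink output above; substitute the path of the JDK you actually installed.
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/bin/hadoop version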
=============================================
Step 9. Running Hadoop.
Step 9.1. Prepare the input files and folders.
ashish:~$ cd Desktop/
ashish:~/Desktop$ mkdir input
ashish:~/Desktop$ cp /usr/local/hadoop/etc/hadoop/*.xml input
Step 9.2. Run Hadoop.
ashish:~/Desktop$ ls /usr/local/hadoop/share/hadoop/mapreduce
hadoop-mapreduce-client-app-3.1.2.jar
...
hadoop-mapreduce-examples-3.1.2.jar
...
Here, we run the "grep" example program to count occurrences of the words "xml" or "configuration" in the input files (the last argument is a regular expression).
ashish:~/Desktop$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input grep_example 'xml|configuration'
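Note: the job will fail with an "output directory already exists" error if grep_example is already present (we are running in local standalone mode, so it is an ordinary directory on the Desktop). If you need to re-run the command, remove it first:
rm -r grep_example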
Step 9.3. Viewing the output.
ashish:~/Desktop$ cat grep_example/*
25 configuration
12 xml
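As a further check, you can run the classic wordcount example from the same examples jar against the same input directory; the output directory name wordcount_example is arbitrary and must not exist yet.
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount input wordcount_example
cat wordcount_example/*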