Dated: 28 Apr 2020

Note about the setup: We are running the Ubuntu guest OSs on top of Windows via VirtualBox.

1. Setting the hostname in the three Guest OSs
$ sudo gedit /etc/hostname

The hostnames for the three machines are master, slave1, and slave2.

ON MASTER (Host OS IP: 192.168.1.12)
$ cat /etc/hosts
192.168.1.12 master
192.168.1.3  slave1
192.168.1.4  slave2

2. ON SLAVE2 (Host OS IP: 192.168.1.4)
$ cat /etc/hostname
slave2

$ cat /etc/hosts
192.168.1.12 master
192.168.1.3  slave1
192.168.1.4  slave2

3. REPEAT THE STEPS SHOWN FOR SLAVE2 ON SLAVE1 (Host OS IP: 192.168.1.3)
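Once all three machines are set up, a quick sanity check (not part of the original write-up) is to confirm from each machine that the three hostnames resolve to the addresses listed in /etc/hosts, for example:

$ ping -c 1 master
$ ping -c 1 slave1
$ ping -c 1 slave2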
4. Configuring Key-Based Login
Set up SSH on every node so that the nodes can communicate with one another without being prompted for a password. Check this link: Steps of Doing SSH Setup
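The linked article has the full steps; as a rough sketch, one common way (assuming the same user, e.g. ashish as used later for scp, exists on all three machines) is to generate a key pair on master and copy the public key to every node, including master itself:

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ ssh-copy-id ashish@master
$ ssh-copy-id ashish@slave1
$ ssh-copy-id ashish@slave2

After this, running ssh slave1 from master should log in without a password prompt.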
5. Setting up ".bashrc" on each system (master, slave1, slave2)

$ sudo gedit ~/.bashrc

Add the lines below at the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export YARN_HOME=/usr/local/hadoop
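As an optional check (not in the original steps), you can reload the file in the current shell and confirm that the hadoop binary is on the PATH; this assumes Hadoop has already been extracted to /usr/local/hadoop, as done in the next step:

$ source ~/.bashrc
$ hadoop version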
6. Follow all nine steps from the article below to set up Hadoop on the "master" machine

Getting started with Hadoop on Ubuntu in VirtualBox

On "master":
7. Set NameNode Location
Update your $HADOOP_HOME/etc/hadoop/core-site.xml file to set the NameNode location to master on port 9000 ($HADOOP_HOME is /usr/local/hadoop):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
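As an optional check (an addition, not part of the original article), you can ask Hadoop which filesystem URI it picked up; fs.default.name is the older name of this property, so a deprecation warning here is harmless:

$ hdfs getconf -confKey fs.default.name

This should print hdfs://master:9000.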
8. Set path for HDFS

Edit the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file to resemble the following configuration:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
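One assumption worth calling out: the directories above must exist and be writable by the user that runs the Hadoop daemons. If they are not created automatically in your setup, something along these lines should do (adjust the owner to your own user; ashish is only an example, taken from the scp commands later on):

$ sudo mkdir -p /home/hadoop/data/nameNode /home/hadoop/data/dataNode
$ sudo chown -R ashish:ashish /home/hadoop/data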
9. Set YARN as Job Scheduler

Edit the mapred-site.xml file, setting YARN as the default framework for MapReduce operations.

$HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
10. Configure YARN

Edit yarn-site.xml, which contains the configuration options for YARN. In the value field for yarn.resourcemanager.hostname, use the IP address of "master" (192.168.1.12 in this setup).

$HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.12</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
11. Configure Workers

The workers file is used by the startup scripts to start the required daemons on all nodes. Edit this file:

$HADOOP_HOME/etc/hadoop/workers

to include both of the worker nodes:

slave1
slave2
12. Configure Memory Allocation (Two steps)

A) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml and add the following lines:

$ sudo gedit $HADOOP_HOME/etc/hadoop/yarn-site.xml

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1536</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1536</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

B) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml and add the following lines:

$ sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>
13. Duplicate Config Files on Each Node

Copy the Hadoop configuration files to the worker nodes:

$ scp -r /usr/local/hadoop/etc/* ashish@slave1:/usr/local/hadoop/etc/
$ scp -r /usr/local/hadoop/etc/* ashish@slave2:/usr/local/hadoop/etc/

After copying, the following file should be modified on each destination node so that it contains the correct JAVA_HOME for that node (see the check below):

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
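One way to check this on each node is to look at the JAVA_HOME line in hadoop-env.sh:

$ grep JAVA_HOME /usr/local/hadoop/etc/hadoop/hadoop-env.sh

and make sure it points at that node's JDK, for example export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 (this path is only illustrative; it depends on how Java was installed on each machine).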
14. Format HDFS

HDFS needs to be formatted like any classical file system. On "master", run the following command:

$ hdfs namenode -format

Your Hadoop installation is now configured and ready to run.
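As an optional check (an addition to the original steps), the format command should populate the NameNode directory configured in hdfs-site.xml; listing it should show a VERSION file and an initial fsimage:

$ ls /home/hadoop/data/nameNode/current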
15. Start and Stop HDFS
Start the HDFS by running the following script from master:
/usr/local/hadoop/sbin/start-dfs.sh
This will start NameNode and SecondaryNameNode on master, and DataNode on slave1 and slave2, according to the configuration in the workers config file.
Check that every process is running with the jps command on each node. On master, you should see the following (the PID numbers will differ):
21922 Jps
21603 NameNode
21787 SecondaryNameNode
And on slave1 and slave2 you should see the following:
19728 DataNode
19819 Jps
To stop HDFS on the master and worker nodes, run the following command from master:
stop-dfs.sh
16. Monitor your HDFS Cluster
Point your browser to http://master:9870/dfshealth.html, where "master" resolves to the IP address of your master node, and you will get a user-friendly monitoring console.
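If you prefer the command line (this check is an addition, not part of the original post), similar information is available from the HDFS admin report on master, which should list both DataNodes as live:

$ hdfs dfsadmin -report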