Setup Hadoop 2.7.7 in multi node cluster

Prerequisites

First, edit /etc/hosts on every node and define your nodes like this:

192.168.1.1 master
192.168.1.2 slave-1
192.168.1.3 slave-2

Then make SSH between the servers passwordless, as described in this documentation.
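A minimal sketch of that setup, assuming OpenSSH and that you log in as root (matching the user variables configured later):

ssh-keygen -t rsa           # on the master: generate a key pair, accept the defaults
ssh-copy-id root@slave-1    # copy the public key to each slave
ssh-copy-id root@slave-2
ssh root@slave-1 hostname   # verify: should print the hostname without asking for a password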

Install JDK version 1.8 on all nodes and add JAVA_HOME to the environment variables.
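One way to do this on a Debian-based node, for example (the package name and JDK path are assumptions; adjust them for your distribution, and keep the path consistent with the JAVA_HOME used in hadoop-env.sh below):

apt-get install -y openjdk-8-jdk                        # install JDK 1.8
ln -s /usr/lib/jvm/java-8-openjdk-amd64 /var/local/jdk  # expose it at the path this guide uses
echo 'export JAVA_HOME=/var/local/jdk' >> ~/.bashrc     # add JAVA_HOME to the environment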

Download

Then download Hadoop from this link.

Untar the downloaded file and copy it to /var/local/hadoop.
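For example, assuming the Apache archive still serves the 2.7.7 release at this URL:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -xzf hadoop-2.7.7.tar.gz
mv hadoop-2.7.7 /var/local/hadoop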

Environment Variables

Edit the ~/.bashrc file and add these lines.

export HADOOP_HOME=/var/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

Reload it with the command source ~/.bashrc

Configure

Go to the etc/hadoop directory inside HADOOP_HOME and edit the following files, adding the given lines inside the <configuration> tag.

Edit core-site.xml and add these lines to the configuration

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://master:9000</value>
</property>

Edit hdfs-site.xml and add these lines

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/local/hadoop/data/nameNode</value>
</property>

<property>
     <name>dfs.datanode.data.dir</name>
     <value>/var/local/hadoop/data/dataNode</value>
</property>

<property>
      <name>dfs.replication</name>
      <value>2</value> <!-- two replicas in total: the original block plus one copy -->
</property>
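Hadoop normally creates these directories itself, but if the NameNode or DataNode fails to start with a permission error, you can create them up front (an optional step, assuming the paths above):

mkdir -p /var/local/hadoop/data/nameNode /var/local/hadoop/data/dataNode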

Rename mapred-site.xml.template to mapred-site.xml and add these lines to it.

<property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
</property>

Edit yarn-site.xml and add these lines

<property>
    <name>yarn.acl.enable</name>
    <value>0</value>
</property>

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Edit the slaves file as below:

slave-1
slave-2

Edit hadoop-env.sh and set the Java path explicitly.

export JAVA_HOME=/var/local/jdk

If your servers use a custom SSH port, also set:

export HADOOP_SSH_OPTS="-p <your custom port>"

Propagate all configuration files to all servers with the scp command.
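For example, from the master (a sketch assuming the same /var/local/hadoop layout on every node and the passwordless SSH set up earlier):

for node in slave-1 slave-2; do
    scp -r /var/local/hadoop $node:/var/local/   # copy the whole Hadoop tree, including etc/hadoop
    scp ~/.bashrc $node:~/                       # copy the environment variables as well
done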

Start Hadoop

The first time only, format HDFS on the master node with the command below:

hdfs namenode -format

After that, start Hadoop from the master node with the command start-all.sh (deprecated in Hadoop 2.x; running start-dfs.sh followed by start-yarn.sh does the same thing).
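To verify that the cluster is up, run jps on each node: the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave should show DataNode and NodeManager. A standard HDFS report also lists the live DataNodes:

jps                     # list the running Hadoop daemons on this node
hdfs dfsadmin -report   # should report both DataNodes as live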