
Hadoop Multinode Cluster Configuration

In this blog, we describe the steps and configurations required to set up a distributed multi-node Apache Hadoop cluster.

Prerequisites

1. Single-node Hadoop cluster

If you have not configured a single-node Hadoop cluster yet, use the guide below to configure one first.

How to install single node hadoop cluster

After configuring the single-node Hadoop cluster, clone it to set up the multi-node Hadoop cluster.

Cloning steps-

a) Right-click your Masternode (single-node cluster) virtual machine.

b) Select the Clone option.

c) Give the clone machine a new name, and make sure you tick Reinitialize the MAC address of all network cards.

d) Select Full clone.

Now click Clone; it will take some time to create the new virtual machine (Datanode).

Repeat the same process to create the second Datanode.

Note- [ Reinitialize the MAC address while cloning. ]

2. Networking

Networking plays an important role here. Before merging the single-node clusters into a multi-node cluster, we need to make sure that all the nodes can ping each other (they need to be connected to the same network/hub so that all the machines can talk to each other).

The network configuration for this Hadoop cluster is as follows-

IP address of Masternode (Namenode) – 192.168.10.100

IP address of Datanode 1 (slave node) – 192.168.10.101

IP address of Datanode 2 (slave node) – 192.168.10.102
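Each node also needs a hostname that matches these addresses. A minimal sketch, assuming CentOS 6-style guests (the service iptables commands later in this post suggest that); run on each node with its own name-

sudo hostname namenode.mycluster.com
sudo vi /etc/sysconfig/network          # set HOSTNAME=namenode.mycluster.com

On a systemd-based distribution, the equivalent one-liner would be sudo hostnamectl set-hostname namenode.mycluster.com.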

Check the communication between the master and the slaves-

a) Ping by IP address-

ping 192.168.10.101
ping 192.168.10.102

b) If the pings succeed, ping by hostname-

ping dn1.mycluster.com
ping dn2.mycluster.com

Note- Verify pinging from the slave nodes as well, to check whether they can communicate with the Master node. If you get replies, the nodes can communicate.

c) Verify passwordless SSH login-

ssh dn1.mycluster.com
ssh dn2.mycluster.com
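If these logins still ask for a password, here is a minimal sketch of setting up key-based SSH from the Masternode (assuming the same hadoop user exists on every node)-

ssh-keygen -t rsa                            # accept the defaults
ssh-copy-id hadoop@namenode.mycluster.com    # authorize the key on the master itself
ssh-copy-id hadoop@dn1.mycluster.com
ssh-copy-id hadoop@dn2.mycluster.com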

d) Stop iptables on each node (Namenode, Datanode1, Datanode2)-

sudo service iptables stop

or

service iptables stop
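Note that this only stops the firewall until the next reboot. On CentOS 6 (an assumption based on the service commands used here), you can also keep it disabled permanently-

sudo chkconfig iptables off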

Now come back to your Master node (Namenode)-

Namenode Configuration –

Before configuring the Master node (Namenode), make sure you have configured its /etc/hosts file.

To configure the /etc/hosts file-

sudo vi /etc/hosts
192.168.10.100 namenode.mycluster.com
192.168.10.101 dn1.mycluster.com
192.168.10.102 dn2.mycluster.com

Now follow the steps to make the changes on each machine (node)-

These are the changes that have to be made on the Master node (Namenode)-

1) Log in to your Master node (Namenode) and change to the Hadoop configuration directory-

cd hadoop-2.6.0/etc/hadoop/

2) Open core-site.xml and add the following (fs.default.name is the older alias of fs.defaultFS and still works on Hadoop 2.6)-

vi core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode.mycluster.com:9000</value>
</property>
</configuration>

3) Open hdfs-site.xml-

vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop/namenode</value>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>
</configuration>

Note- In <value>/home/hadoop/hadoop/namenode</value>, /home/hadoop is the home directory of the hadoop user; replace it with your own user's home directory. The rest of the path is the namenode directory that we create later in this post. The dfs.block.size value of 67108864 bytes sets a 64 MB HDFS block size.


4) Open mapred-site.xml-

vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

5) Open yarn-site.xml and add these entries-

vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode.mycluster.com:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode.mycluster.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode.mycluster.com:8050</value>
</property>
</configuration>
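Also on the Masternode: in Hadoop 2.x, the start-dfs.sh and start-yarn.sh scripts read the slaves file in this same configuration directory to learn which hosts should run the DataNode and NodeManager daemons. A minimal slaves file for this cluster-

vi slaves
dn1.mycluster.com
dn2.mycluster.com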


6) Start (or restart) the ssh service with the command below-

sudo service sshd start

DataNode Configuration-

Before configuring the Datanode, make sure you have configured its /etc/hosts file.

To configure the /etc/hosts file-

sudo vi /etc/hosts
192.168.10.100 namenode.mycluster.com
192.168.10.101 dn1.mycluster.com
192.168.10.102 dn2.mycluster.com


Follow the steps below to update the Datanode-

1) Log in to your Datanode and change to the Hadoop configuration directory-

cd hadoop-2.6.0/etc/hadoop/

2) Open core-site.xml and add the following-

 vi core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode.mycluster.com:9000</value>
</property>
</configuration>

3) Open hdfs-site.xml-

vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop/datanode</value>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>
</configuration>

Note- In <value>/home/hadoop/hadoop/datanode</value>, /home/hadoop is the home directory of the hadoop user; replace it with your own user's home directory. The rest of the path is the datanode directory that we create later in this post.

4) Open yarn-site.xml-

vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode.mycluster.com:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode.mycluster.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode.mycluster.com:8050</value>
</property>
</configuration>

5) Open mapred-site.xml-

vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

6) Start (or restart) the ssh service with the command below-

sudo service sshd start

Note- Repeat the same steps on every additional DataNode.

Create the /home/hadoop/hadoop/namenode directory on the Master node (Namenode) and the /home/hadoop/hadoop/datanode directory on both Datanodes (slave nodes)-

mkdir -p /home/hadoop/hadoop/namenode                        (on the Master node only)
mkdir -p /home/hadoop/hadoop/datanode                        (on the slave nodes only)

Note- If these directories already exist, remove them and recreate them with the commands above.

Log in to your Masternode (Namenode) and follow these steps to start your Hadoop cluster-

To start all the daemons follow the below steps:

1) Format the NameNode first:

 hadoop namenode -format
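hadoop namenode -format still works on Hadoop 2.6 but is deprecated; the equivalent command with the newer hdfs entry point is-

hdfs namenode -format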

2) Start the dfs daemons on the Namenode.

Type the command below to start the dfs daemons-

./start-dfs.sh
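The start scripts live in Hadoop's sbin directory, so the ./ form above assumes you are already there; with the layout used in this post that would be-

cd ~/hadoop-2.6.0/sbin
./start-dfs.sh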

3) Type jps to see the running daemons-

jps

4) Start the yarn and historyserver daemons-

start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
jps
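On the Masternode, jps should now list daemons like the following (the process IDs are illustrative; SecondaryNameNode also runs here because this setup does not move it to a separate host)-

2473 NameNode
2648 SecondaryNameNode
2795 ResourceManager
3042 JobHistoryServer
3101 Jps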

You can also use start-all.sh to start all the daemons (it is deprecated in Hadoop 2.x but still works)-

start-all.sh

5) Log in to your Datanode and verify the running daemons-

jps
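On a Datanode you should see the DataNode and NodeManager daemons (process IDs again illustrative)-

2210 DataNode
2318 NodeManager
2398 Jps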

You can check the other Datanode in the same way.

6) Verify the live slave nodes with a hadoop dfsadmin report-

hadoop dfsadmin -report
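(On Hadoop 2.x, hdfs dfsadmin -report is the non-deprecated equivalent.) If both Datanodes have registered, the report should include a section like this illustrative excerpt; the capacities depend on your VMs-

Live datanodes (2):

Name: 192.168.10.101:50010 (dn1.mycluster.com)
...
Name: 192.168.10.102:50010 (dn2.mycluster.com)
...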

Now open your browser and enter the address below in the URL bar-

192.168.10.100:50070
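The YARN ResourceManager also serves a web UI, by default on port 8088-

192.168.10.100:8088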

You should see the NameNode web UI.

This is the GUI (a web server built into Hadoop) for your Hadoop cluster.

Through the GUI you can easily monitor your cluster. Keep visiting our website Acadgild for more updates on Big Data and other technologies. Click here to learn Big Data Hadoop Development.

