Big Data Hadoop & Spark

Commissioning and Decommissioning of Datanode in Hadoop

Commissioning a node means adding a new node to the existing cluster running your Hadoop framework. Decommissioning, in contrast, means removing a node from the cluster. This is a very useful facility for handling node failures during the operation of a Hadoop cluster without stopping all the nodes in the cluster.

Why do we need decommissioning and commissioning?


You cannot directly remove a datanode from a large or real-time cluster, as doing so causes a lot of disturbance. If you want to take a machine away for a hardware upgrade, or to bring down one or more nodes, decommissioning is required because you cannot suddenly shut down datanodes/slave nodes. Similarly, if you want to scale your cluster by adding new datanodes without shutting it down, you need commissioning.

Factors affecting the Commissioning and Decommissioning process:

The first step is to contact the YARN ResourceManager. Why? Because it holds the records of all running processes. So first you tell YARN that you are going to remove a datanode, and then you tell the Namenode that you are going to remove that particular node. Next, let's add the decommissioning and commissioning properties to the core-site.xml file of the Master node (Namenode).
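On the HDFS side, the exclude file is commonly wired up on the Namenode through the dfs.hosts.exclude property (placed in hdfs-site.xml in many setups; the file path below is an assumption matching the rest of this article). A minimal sketch:

```xml
<!-- Namenode configuration: points HDFS at the file listing the
     datanodes to be decommissioned (one hostname per line). -->
<property>
   <name>dfs.hosts.exclude</name>
   <value>/home/hadoop/excludes</value>
</property>
```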

There are some prerequisites: you should have a working Hadoop multi-node cluster (obviously you need a cluster, because you are going to remove one or more datanodes, whether temporarily or permanently).

We will start by adding the decommissioning property to the Hadoop cluster. You need to add it the first time only; after that, you only need to update the exclude file. If the decommissioning property is already in place, just update the exclude file for decommissioning.
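Updating the exclude file itself is just a matter of listing hostnames, one per line. A minimal sketch (the path and hostname below are assumptions; on the cluster described in this article the file would be /home/hadoop/excludes):

```shell
# Path to the exclude file; /tmp is used here only for illustration.
EXCLUDES=/tmp/excludes

# Each line of the file names one datanode to decommission.
echo "datanode2.example.com" >> "$EXCLUDES"   # hypothetical slave hostname
sort -u "$EXCLUDES" -o "$EXCLUDES"            # keep entries unique

cat "$EXCLUDES"
```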

Hadoop Cluster Configuration Note – In my case the Resource Manager and Namenode are on different machines, so run all commands accordingly.

Steps for Decommissioning:

1) Before adding any property, stop your cluster; otherwise the change will affect a running cluster. You can do this using the command stop-dfs.sh:

stop-dfs.sh

Next, go to your Resource Manager node to edit yarn-site.xml.

2) Add this property to your yarn-site.xml:

<property>
   <name>yarn.resourcemanager.nodes.exclude-path</name>
   <value>/home/hadoop/excludes</value>
</property>
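Once the exclude files are in place and updated, the daemons can usually be told to re-read them rather than being restarted. A sketch of the refresh commands, to be run against a live cluster (run each on the node indicated):

```shell
# Run on the ResourceManager node: re-read the YARN exclude file.
yarn rmadmin -refreshNodes

# Run on the Namenode: re-read the HDFS exclude file (dfs.hosts.exclude).
hdfs dfsadmin -refreshNodes
```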