
Commissioning and Decommissioning of Datanode in Hadoop Cluster

What is Commissioning and Decommissioning?

Commissioning of nodes stands for adding new nodes to the existing cluster that runs your Hadoop framework. In contrast, decommissioning of nodes stands for removing nodes from your cluster. This is a very useful way to handle node failures during the operation of a Hadoop cluster without stopping all of the nodes in your cluster.

Why do we need decommissioning and commissioning?

You cannot directly remove a datanode from a large or real-time cluster, as it will cause a lot of disturbance. If you want to take a machine away for a hardware upgrade, or if you want to bring down one or more nodes, decommissioning is required because you cannot suddenly shut down datanodes/slave nodes. Similarly, if you want to scale your cluster by adding new datanodes without shutting down the cluster, you need commissioning.

How do Commissioning and Decommissioning work?

The first step is to contact the YARN Resource Manager. Why? Because it keeps the records of all the running processes. So, the first step is to tell YARN that you are going to remove a datanode, and then you need to tell your Namenode that you are going to remove that particular node. To do this, we add the decommissioning and commissioning properties to the configuration files on the Resource Manager (yarn-site.xml) and on the Master node/Namenode (hdfs-site.xml).

There are some prerequisites: you should have a working Hadoop multi-node cluster (obviously you need a cluster, because you are going to remove one or more datanodes, whether temporarily or permanently).

We will start by adding the decommissioning property to the Hadoop cluster. You need to add this property only once; later on, you only need to update the exclude file. If the decommissioning property is already in place, then just update the exclude file for decommissioning.

Hadoop Cluster Configuration Note: In my case, the Resource Manager and the Namenode are on different machines, so run all commands accordingly.

Steps for Decommissioning:

1) Before adding any property, stop your cluster; otherwise, the change will not take effect properly. Assuming the standard Hadoop sbin scripts, you can do this using the commands:

stop-dfs.sh                                    (on the Namenode)
stop-yarn.sh                                   (on the Resource Manager)

Next, go to your Resource Manager node to edit yarn-site.xml.

2) Add the following property to your yarn-site.xml:


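The property snippet itself is missing above; the standard YARN setting for an exclude file is the following (the path is an example and should point to your own excludes file):

```xml
<property>
  <!-- Path of the file listing hosts to be excluded from YARN -->
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/home/hadoop/excludes</value>
</property>
```

The value must be the absolute path of the excludes file on the Resource Manager machine.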
Note: In the value section, mention the path of your excludes file.

Now, go to your master node (Namenode) and edit the hdfs-site.xml file.

3) Add this property to hdfs-site.xml:


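The snippet is missing above; the standard HDFS property for an exclude file is the following (the path is an example and should point to your own excludes file):

```xml
<property>
  <!-- Path of the file listing datanodes not permitted to connect to the Namenode -->
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/excludes</value>
</property>
```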
Note: If the Resource Manager and the Namenode (Master Node) are on the same machine, then simply edit the yarn-site.xml and hdfs-site.xml of the Namenode (Master Node).

4) Next, start your cluster using the following commands (assuming the standard Hadoop sbin scripts):

start-dfs.sh                                   #(Run this command on the Masternode/Namenode only)
start-yarn.sh                                  #(Run this command on the Resource Manager)

Note: If the Resource Manager and the Namenode are running on the same machine, then run the above commands on the Namenode (Master Node) only.

5) We need to update the exclude file on both machines, the Resource Manager and the Namenode (Master Node). If it's not there, then we can create an exclude file on both machines:

vi excludes

Add the address of the Datanode/slave node to be decommissioned:

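For example, the excludes file is just a plain list of hostnames (or IP addresses), one per line; the hostnames below are hypothetical:

```shell
# Append the hostnames of the nodes to be decommissioned
# (the hostnames are hypothetical examples)
echo "datanode3.example.com" >> excludes
echo "datanode4.example.com" >> excludes
cat excludes
```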

6) Run the following command on the Resource Manager:

yarn rmadmin -refreshNodes                     (on Resource Manager)

This command makes YARN re-read yarn-site.xml, process the exclude property, and decommission the listed node from YARN. It means the Resource Manager will no longer assign any jobs to this node.

7) Run the following command on the Namenode to make HDFS re-read hdfs-site.xml, process the exclude property, and decommission the specified datanode:

hdfs dfsadmin -refreshNodes                    (on Namenode)

This command decommissions the listed node from HDFS: the Namenode stops allocating new blocks to it and re-replicates its existing blocks to the remaining datanodes. The node's status changes to "Decommission in progress" and finally to "Decommissioned".

8) Check the status with the admin report using the command hadoop dfsadmin -report (on newer Hadoop versions, hdfs dfsadmin -report):

hadoop dfsadmin -report

In the report, the removed node will be listed with the status "Decommissioned".

Commissioning of Datanodes:

The commissioning process is just the opposite of decommissioning, but the configuration part is almost the same for both.

Follow these steps for the commissioning configuration:

Before starting the commissioning steps, simply remove the exclude file on both machines, or delete all the entries of the exclude file (make it blank).

Stop all daemons before adding any property to the Hadoop cluster.

1) Open the Resource Manager machine and add this property to yarn-site.xml:

vi yarn-site.xml                           (on Resource Manager)
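The property snippet is missing above; the standard YARN setting for an include file is the following (the path is an example and should point to your own includes file):

```xml
<property>
  <!-- Path of the file listing hosts allowed to join YARN -->
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value>/home/hadoop/includes</value>
</property>
```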

Next, go to your Namenode (Master Node).

2) Add this property to hdfs-site.xml:

vi hdfs-site.xml                           (on Namenode )
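The snippet is missing above; the standard HDFS property for an include file is the following (the path is an example and should point to your own includes file):

```xml
<property>
  <!-- Path of the file listing datanodes permitted to connect to the Namenode -->
  <name>dfs.hosts</name>
  <value>/home/hadoop/includes</value>
</property>
```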

3) Now, start your cluster using the following commands (assuming the standard Hadoop sbin scripts):

start-dfs.sh                            (Run this command on the Namenode only)
start-yarn.sh                           (Run this command on the Resource Manager)

Note: If the Resource Manager and the Namenode are running on the same machine, then run these commands on the Namenode (Master Node) only.

4) We need to update the include file on both the Resource Manager and the Namenode (Master Node). If it's not present, then create an include file on both nodes:

vi includes

Add the Datanodes'/slave nodes' IP addresses or hostnames:


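As with the excludes file, the includes file is a plain list with one entry per line, covering every datanode that should be part of the cluster; the hostnames below are hypothetical:

```shell
# List every datanode allowed to join the cluster,
# including the newly added one (hostnames are hypothetical)
echo "datanode1.example.com" >> includes
echo "datanode2.example.com" >> includes
echo "datanode3.example.com" >> includes
cat includes
```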
Note: If you are going to add a new datanode, or if you are scaling up your cluster by adding a new node, you need to add its IP address and hostname to the /etc/hosts file of all nodes (Namenode, Datanodes, Resource Manager).

Whenever you perform commissioning, mention the addresses of all datanodes in the include file, not just the newly added ones.

5) Run the following command on the Resource Manager:

yarn rmadmin -refreshNodes                 (on Resource Manager)

6) Next, go to the Master Node (Namenode) and run the following command to refresh all nodes:

hdfs dfsadmin -refreshNodes                (on Namenode)

7) Check the Hadoop admin report using the command:

hadoop dfsadmin -report

Here, you can see that the datanode, which was in the Decommissioned state, is now back in the Normal state (commissioned).


  • The most important thing when you do commissioning is to make sure that the datanode you are going to add has everything it needs (it should be fully configured as a Hadoop datanode).
  • The second thing to keep in mind is that you must mention the addresses of all necessary datanodes in the include files.
  • Run the cluster Balancer, as the Balancer attempts to balance the data, up to a certain threshold, among the datanodes by copying block data from older nodes to the newly commissioned nodes.

How to run the Hadoop Balancer?

hadoop balancer

The Hadoop Balancer is a built-in utility which makes sure that no datanode is over-utilized. When you run it, it checks whether some datanodes are under-utilized or over-utilized and moves block data around until disk usage on every datanode is within a given threshold of the cluster average (10% by default; you can change it, e.g. hadoop balancer -threshold 5). Make sure to run the Balancer only during off-peak hours in a real cluster, because running it during peak hours will put a heavy load on the network, as it transfers large amounts of data.

So, this is how Commissioning is done!

Hope this post was helpful in understanding the commissioning and decommissioning of datanodes in Hadoop.



  1. Good info. I have one quick question: why do you need to run '' in step 1? Won't it bring HDFS down, causing running jobs to fail?

  2. There is a problem here: if we stop all daemons before commissioning, there will be downtime for sure, but Hadoop is meant for high availability. In that case, you have to add the datanode without affecting the cluster's performance in real time.

  3. There is a problem here. Commissioning/decommissioning means adding/removing datanodes to/from an active cluster, i.e., all daemons should be running while we perform commissioning and decommissioning. Hadoop is known for its high availability (HA) and performance, so if you stop all daemons, there will definitely be downtime, which will affect HA and performance. There is a way to perform this without any downtime; in my opinion, the above steps cannot be implemented in real-time production.

  4. Hi Zubair,
    The above steps are the standard steps. We need to stop the daemons only once in the cluster; after that, no matter how many times you need to add/remove nodes, there is no need to stop/start the cluster.
    I don't think anyone can explain this better; it is the best explanation. If you have a better explanation, then you can put your points here rather than criticize anyone.
    Thanks for the wonderful blog and sharing.
    Atul Markan

