
Hadoop 3.X Installation Guide

Hadoop 3.x is the latest release of Hadoop and is still in the alpha phase. Developers who are interested in Hadoop can install it and report any issues or bugs they find to Apache. Many new features have been introduced in Hadoop 3.x.

In this blog, we will discuss how to install Hadoop 3.x in pseudo-distributed mode and explore the new HDFS features.


Here is the list of changes and features introduced in Hadoop 3.x:

  • Minimum required Java version increased from Java 7 to Java 8
  • Support for erasure coding in HDFS
  • YARN Timeline Service v.2
  • Shell script rewrite
  • MapReduce task-level native optimization
  • Support for more than 2 NameNodes
  • Default ports of multiple services have been changed
  • Support for Microsoft Azure Data Lake filesystem connector
  • Intra-datanode balancer
  • Reworked daemon and task heap management

We also recommend that our users read our blog on the 10 differences between Hadoop 2.x and Hadoop 3.x.

Hadoop 3.x Installation Procedure

Let’s get started with the Hadoop 3.x installation.

Download the latest version of the Hadoop release from here.

We have downloaded hadoop-3.0.0-alpha2.tar.gz

After downloading, move into the download folder and extract the archive using the command

tar -xzf hadoop-3.0.0-alpha2.tar.gz

Note: This guide assumes that Java is already installed on your system. The minimum JDK required for Hadoop 3.x is JDK 8.

Setting JAVA_HOME path

Now move into the etc/hadoop/ directory of the extracted hadoop-3.0.0-alpha2 folder and set the JAVA_HOME path in the hadoop-env.sh file.

To get the JAVA_HOME path on your machine, open your terminal and type echo $JAVA_HOME

In our case, the path is /usr/lib/jvm/java-8-oracle, and that is what we have set in the hadoop-env.sh file, as shown in the below screenshot.
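For reference, the line in hadoop-env.sh would look like the following; adjust the path to match the JDK location on your own machine.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle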

After setting the Java path, save and close the file.

Configuring core-site.xml file

Now open the core-site.xml file, which is present in the etc/hadoop/ directory, and set the below property, which defines the default file system URI.

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Configuring hdfs-site.xml file

Open the hdfs-site.xml file in the same location and set the below property for replication.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Also in hdfs-site.xml, you need to create two folders, one for the NameNode metadata and one for the DataNode block storage, and set their paths using the properties below.

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/kiran/Downloads/Hadoop/Hadoop3_data/NameNode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/kiran/Downloads/Hadoop/Hadoop3_data/DataNode</value>
</property>

We have created a folder called Hadoop3_data and, inside it, two directories named NameNode and DataNode.

You can see the same in the below screenshot.
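If you prefer the terminal, both directories can be created in one go with mkdir; adjust the base path to wherever you keep your Hadoop data.

mkdir -p /home/kiran/Downloads/Hadoop/Hadoop3_data/NameNode /home/kiran/Downloads/Hadoop/Hadoop3_data/DataNode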

Configuring ssh & pdsh

Install and setup ssh

If you are using a Debian-based OS, install ssh with the below command

sudo apt-get install ssh

If you are using an RPM-based OS such as CentOS or RHEL, install ssh with the below command

yum install openssh-server

After the installation, generate an ssh key for the hadoop user using the below command:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Next, append the public key to the authorized_keys file. Change into the .ssh directory and run the below command to copy the key into authorized_keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

To check whether the key has been copied, type the command:

cat authorized_keys

Finally, restrict the permissions of the authorized_keys file.

chmod 600 ~/.ssh/authorized_keys
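To verify that passwordless ssh works, connect to localhost once; it should log you in without asking for a password (type yes if prompted to confirm the host key).

ssh localhost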

Install and setup pdsh

If you are using a Debian-based OS, install pdsh using the below command

sudo apt-get install pdsh

If you are using an RPM-based OS, install pdsh using the below commands

yum update
rpm -Uvh http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari-1.x-1.el6.noarch.rpm
yum install pdsh

After installing, set ssh as the default remote command (rcmd) for pdsh. This step needs root privileges, so switch to the root user using sudo and then type the below command.

echo "ssh" > /etc/pdsh/rcmd_default

Now let’s configure YARN

Configuring mapred-site.xml file

First, if your copy of Hadoop ships only mapred-site.xml.template, rename it to mapred-site.xml; a command for this is shown below. Then open the mapred-site.xml file in etc/hadoop/ and set the below parameters. The two HADOOP_MAPRED_HOME properties point the MapReduce ApplicationMaster and tasks at the MapReduce libraries; without them, jobs can fail because YARN cannot find org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
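The rename can be done from inside the hadoop-3.0.0-alpha2 directory, assuming the template file is present:

mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml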

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.admin.user.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
    </property>
</configuration>

Configuring yarn-site.xml file

Open the yarn-site.xml file in the etc/hadoop/ directory and set the below parameters

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Configuring bashrc

After setting the properties in the Hadoop configuration files, save and close them. Now open the .bashrc file, which is in your home directory.

cd
gedit .bashrc

In the .bashrc file, set the Hadoop 3 path as shown below

export HADOOP_HOME=/home/kiran/Downloads/Hadoop/hadoop-3.0.0-alpha2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

After setting these variables, save and close the file, and then reload it using the command source .bashrc
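To confirm that the PATH update has taken effect, you can run the version command from any directory; it should report the 3.0.0-alpha2 build.

hadoop version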

NameNode Format


That’s it; your Hadoop 3.x setup is ready. Let’s now format the NameNode.

Use the command ./hdfs namenode -format in the $HADOOP_HOME/bin directory

After a successful format, you will get a message as shown in the below screenshot.

We have successfully formatted the NameNode. Let’s now start the Hadoop daemons one by one. Move into the $HADOOP_HOME/sbin directory and type the below commands.

Starting Hadoop daemons

Starting HDFS daemons

Starting NameNode

./hadoop-daemon.sh start namenode

Starting DataNode

./hadoop-daemon.sh start datanode

Starting Secondary NameNode

./hadoop-daemon.sh start secondarynamenode

Starting YARN daemons

Starting Resource Manager

./yarn-daemon.sh start resourcemanager

Starting Node Manager

./yarn-daemon.sh start nodemanager

We have successfully started all the Hadoop daemons. You can check their status using the jps command.
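On a healthy pseudo-distributed setup, the jps listing shows one entry per daemon, roughly like the following (the process IDs will differ on your machine):

4850 NameNode
4975 DataNode
5120 SecondaryNameNode
5301 ResourceManager
5433 NodeManager
5678 Jps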

You can also start all of these daemons with a single command, start-all.sh, as shown in the below screenshot.

Exploring HDFS

In Hadoop 3.x, HDFS comes with some new features for working with files; you can perform all kinds of storage operations from the web UI itself. Let’s see how to do that.

In Hadoop 2.x, the HDFS web UI port is 50070, but in Hadoop 3.x it has moved to 9870. You can access the HDFS web UI at localhost:9870, as shown in the below screenshot.

You can see all the HDFS configurations on this page. To browse the file system through WebHDFS, click on Utilities -> Browse the file system.
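If you prefer the command line, the same file system can also be queried through the WebHDFS REST API; for example, the below request lists the root directory, assuming the default NameNode HTTP port of 9870.

curl "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"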

You can see a few options added here, i.e., creating a new folder, uploading files, and cutting and pasting files from one directory to another.

Before creating a folder, make sure that the user has the correct permissions to perform operations on those directories. If not, you can change the permissions using the command

hadoop fs -chmod -R 777 /

Let’s create a new folder and upload some data into it. To create a new folder, click on the folder icon and give the directory a name, as shown below.

You can see that the folder has been created successfully in the below screenshot

To upload files, click on the upload symbol and browse your file system to select the file that you want to upload.

You can also delete the files by clicking on the delete symbol beside the directory or file as shown in the below screenshot.

You can also cut and paste the files from one directory to another directory.

Select the files you want to cut, click on the Cut option, and then click OK as shown below.

Now move into the folder wherever you want to paste the file and just click on the Paste option. After clicking Paste, you can see that the file has been pasted in that directory, as shown in the below screenshot.

This is how you can perform operations on files using HDFS web UI in Hadoop 3.x.

We hope this blog helped you understand how to install Hadoop 3.x on a single-node cluster and how to perform operations on HDFS files using the HDFS web UI.

Enroll for Hadoop Training conducted by Acadgild and become a successful big data developer.



3 Comments

  1. Hi, Mr. Kiran Krishna:
    May I know why the two properties {mapreduce.admin.user.env} and {yarn.app.mapreduce.am.env} need to be added to mapred-site.xml, and what they mean?
    Without these two properties, the error “could not find org.apache.hadoop.mapreduce.v2.app.MRAppMaster” pops up when running the wordcount application on Hadoop.
    Thanks.

  2. Thank you for sharing this great article on Hadoop. I am interested in reading Hadoop-related blogs. Thank you.

  3. Pingback: installing hadoop from scratch singlenode and cluster – A Cloud Database Blog – A DBDude Inc. Production

