Integrating Apache Tez with Hadoop

July 13

In this post, we will discuss how to integrate Tez with Hadoop. We will also see how to run a Hadoop job using the Tez engine.

Let’s begin with the basics first.

What is Tez?

Tez is an application framework built on Hadoop YARN that can execute complex directed acyclic graphs (DAGs) of general data processing tasks. In many ways, it can be considered a more flexible and powerful successor to the MapReduce framework.
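To get a feel for what "directed acyclic graph of tasks" means, here is a minimal sketch that uses the standard tsort utility to order the stages of a small job graph. The stage names are made up for illustration and have nothing to do with the Tez API. It only shows the idea that every stage runs after the stages it depends on.

```shell
# Each line is one edge "upstream downstream" in a hypothetical job graph.
# The stage names are illustrative only and are not part of any Tez API.
dag_edges() {
  cat <<'EOF'
extract filter
extract aggregate
filter join
aggregate join
EOF
}

# tsort prints the stages in an order that respects every edge.
stage_order() {
  dag_edges | tsort
}
```

Calling stage_order prints extract first and join last. The two middle stages do not depend on each other, so they may appear in either order.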

Tez provides developers an API framework to write native YARN applications on Hadoop that bridges the spectrum of interactive and batch workloads. It allows those data access applications to work with petabytes of data over thousands of nodes.

In simple terms, Tez is a processing engine on top of the Hadoop ecosystem, which runs on YARN and performs well within mixed workload clusters.

In the Hadoop stack, Tez sits directly on top of YARN as an execution engine. YARN allocates the cluster resources, and Tez plans and runs the tasks of each job on those resources, aiming to reduce the job completion time.

Now, let’s look at how to install Tez on your cluster and integrate it with Hadoop. Before installing Tez, make sure that Java and Maven are installed on your system.

Maven Installation

Step 1: Before setting up Tez, let’s install Maven on the system. The Tez build uses Maven to download the dependencies that match your Hadoop version.

If you are using CentOS, type the below command to install Maven:

sudo yum install maven

If you are using Ubuntu, type the below command to install Maven:

sudo apt-get install maven
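If you set up several machines, the choice between the two commands can be scripted. The sketch below is only an illustration: the function names are hypothetical, and it assumes the distribution can be identified from the ID field of /etc/os-release.

```shell
# Print the Maven install command for a distribution ID as found in the
# ID= field of /etc/os-release (for example "centos" or "ubuntu").
# Only the two distributions covered in this post are handled.
maven_install_cmd() {
  case "$1" in
    centos) echo "sudo yum install -y maven" ;;
    ubuntu) echo "sudo apt-get install -y maven" ;;
    *) echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Read the distribution ID of the current machine, if the file exists.
detect_distro() {
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release && echo "$ID" )
  else
    echo "unknown"
  fi
}
```

Running maven_install_cmd "$(detect_distro)" prints the command for the current machine without executing it, so you can review it first.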

Step 2: After installing Maven, check it by using the below command:

mvn -version

The command should print the installed Maven version along with the Java version and the Java home directory that Maven is using.

Step 3: After installing Maven, install Protocol Buffers 2.5.0. This post uses version 2.5.0 throughout, so stay on that version unless you know that your Tez and Hadoop builds accept a newer one. You can download protobuf-2.5.0 from the following link:

Protocol buffer-2.5.0

By default, it will be downloaded into your ‘Downloads’ folder.

Step 4: Now, untar the file using the below command:

tar -xvf protobuf-2.5.0.tar.gz

Now, you can view the extracted protobuf-2.5.0 folder in the ‘Downloads’ folder itself.

Step 5: Next, let’s move the protobuf-2.5.0 folder into your desired location. Here, we are moving it into the home directory.

mv protobuf-2.5.0 $HOME/

Step 6: Now, open the protobuf-2.5.0 folder using the command cd $HOME/protobuf-2.5.0

Step 7: Now type the below commands to configure protocol buffer.

./autogen.sh
./configure --prefix=/usr

 

Step 8: Execute the make command

make

Once configure has done its job, we can invoke make to build the software. This runs a series of tasks defined in a Makefile to build the finished program from its source code.

Step 9: Type the make install command. Because the prefix is /usr, run it as root or prefix it with sudo.

make install

Now that the software is built and ready to run, the files can be copied to their final destinations. The make install command will copy the built program, along with its libraries and documentation, to the correct locations.

It will take some time for the process to be completed.
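Steps 4 to 9 can also be collected into one small script. The following is a sketch under a few assumptions: the tarball is in your Downloads folder, it is unpacked straight into the home directory instead of being moved afterwards, and sudo is available for the install step. The run helper and the PROTOBUF_TARBALL and PROTOBUF_PARENT variables are hypothetical names introduced here for illustration.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Unpack, configure, build and install protobuf 2.5.0.
# PROTOBUF_TARBALL and PROTOBUF_PARENT are hypothetical variable names
# that default to the locations used in this post.
# The function body runs in a subshell, so the cd does not leak out.
build_protobuf() (
  tarball="${PROTOBUF_TARBALL:-$HOME/Downloads/protobuf-2.5.0.tar.gz}"
  parent="${PROTOBUF_PARENT:-$HOME}"
  run tar -xvf "$tarball" -C "$parent" &&
  run cd "$parent/protobuf-2.5.0" &&
  run ./autogen.sh &&
  run ./configure --prefix=/usr &&
  run make &&
  run sudo make install
)
```

Calling DRY_RUN=1 build_protobuf only prints the six commands, which is a safe way to check the paths. Calling build_protobuf without DRY_RUN runs them and stops at the first command that fails.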

Step 10: Now, check whether Protocol Buffers is installed with the below command:

protoc --version
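If you want a script to stop early when the wrong version is on the PATH, a check along these lines may help. It is a sketch that assumes protoc 2.5.0 reports itself as "libprotoc 2.5.0". The function names are hypothetical.

```shell
# Succeed only if the given `protoc --version` output reports 2.5.0.
is_protoc_2_5_0() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Check the protoc found on PATH, if any.
check_protoc() {
  if ! command -v protoc >/dev/null 2>&1; then
    echo "protoc not found on PATH" >&2
    return 1
  fi
  found="$(protoc --version)"
  if is_protoc_2_5_0 "$found"; then
    echo "OK: $found"
  else
    echo "unexpected protoc version: $found" >&2
    return 1
  fi
}
```

For example, check_protoc && echo "ready to build Tez" continues only when the expected version is installed.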

We have successfully installed the protocol buffer! Let’s install Apache Tez now.

Step 11: You can download Apache Tez from the following link:

Apache Tez-0.8.1

By default, it will be downloaded into your ‘Downloads’ folder. You can move the file to your desired location. Here, we are moving it into the home directory.

Step 12: Now, untar the file using the below command:

tar -xvf apache-tez-0.8.1-alpha-src.tar.gz

 

Step 13: Now, change into the extracted Tez folder.

 

Step 14: You can see a file named pom.xml. Open the file and change the Hadoop, Pig and Java versions to match the versions installed on your cluster.

 

Step 15: Now save and close the file. From inside the extracted Tez folder, type the below command:

mvn clean package -DskipTests

The above command will download all the dependencies that Tez needs to be compatible with your Hadoop version and then build Tez. When the process finishes, you will see the ‘BUILD SUCCESS’ message.
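As a possible alternative to editing pom.xml by hand, Maven properties can be overridden on the command line with -D. The sketch below assumes that the top-level pom.xml exposes the Hadoop version as a property named hadoop.version. Please verify this in your copy of the file before relying on it. The function name and the version 2.7.1 in the usage note are only examples.

```shell
# Print the Maven build command for a given Hadoop version.
# Assumption: the top-level pom.xml exposes the Hadoop version as the
# hadoop.version property, so it can be overridden with -D.
tez_build_cmd() {
  echo "mvn clean package -DskipTests -Dhadoop.version=$1"
}
```

For example, eval "$(tez_build_cmd 2.7.1)" would run the build against Hadoop 2.7.1 from inside the extracted Tez folder. The Pig and Java versions are not covered by this sketch.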

Step 16: Now, in the tez-dist/target directory you can see that two tarballs have been created:

tez-0.8.1-alpha.tar.gz

tez-0.8.1-alpha-minimal.tar.gz

Next, create a new folder with the name tez in your desired location. We are creating this folder in the home directory and then moving the two tarballs into it.

Step 17: Create the directory using the command mkdir tez


Step 18: Now, create a folder with name tez in your HDFS, using the below command:

hadoop fs -mkdir /tez

Step 19: Copy the tez-0.8.1-alpha.tar.gz tarball into the HDFS directory using the below command:

hadoop fs -put tez-0.8.1-alpha.tar.gz /tez/

You can check whether the file has been copied to HDFS by using the below command:

hadoop fs -ls /tez/

Step 20: Now, within the tez folder in your local file system, create two folders named conf and tez using the below commands:

mkdir conf
mkdir tez

Step 21: Extract the contents of the file tez-0.8.1-alpha-minimal.tar.gz into the newly created tez folder.

Step 22: Now, create a file named tez-site.xml in your newly created conf directory and add the below configuration to it:

<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://localhost:9000/tez/tez-0.8.1-alpha.tar.gz</value>
  </property>
</configuration>

 


Note: In the value tag, you need to provide the HDFS location of the tarball that you copied into HDFS. Replace localhost:9000 with the host and port of your NameNode if they are different.
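The HDFS upload and the tez-site.xml file can also be produced from a script, which helps when the NameNode address differs between clusters. This is a sketch only: the function names and the run helper are hypothetical, and the -p and -f options are added here so the upload can be repeated without errors, which the commands in this post do not do.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upload the full Tez tarball ($1) to /tez in HDFS and list the result.
# -p skips the error when /tez exists, -f overwrites an older copy.
upload_tez_tarball() {
  run hadoop fs -mkdir -p /tez &&
  run hadoop fs -put -f "$1" /tez/ &&
  run hadoop fs -ls /tez/
}

# Write tez-site.xml into the directory $1, pointing tez.lib.uris at $2.
write_tez_site() {
  conf_dir="$1"
  lib_uri="$2"
  mkdir -p "$conf_dir" &&
  cat > "$conf_dir/tez-site.xml" <<EOF
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${lib_uri}</value>
  </property>
</configuration>
EOF
}
```

For the setup in this post, the calls would be upload_tez_tarball tez-0.8.1-alpha.tar.gz followed by write_tez_site "$HOME/tez/conf" "hdfs://localhost:9000/tez/tez-0.8.1-alpha.tar.gz".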


Now save and close the file. Your Tez is ready! You can now integrate it with Hadoop by adding the following lines to the hadoop-env.sh file of Hadoop.

export TEZ_CONF_DIR=/home/kiran/tez/conf/
export TEZ_JARS=/home/kiran/tez/tez/
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}

Note: Set TEZ_CONF_DIR to the path of the conf directory, and set TEZ_JARS to the folder into which you extracted tez-0.8.1-alpha-minimal.tar.gz.
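The classpath value is easy to get wrong by hand, so here is a small sketch that assembles it from the two directories. The function name tez_classpath is hypothetical. It leaves out JAVA_JDBC_LIBS and MAPREDUCE_LIBS and only appends the existing classpath when it is not empty, which avoids empty entries when a variable is unset.

```shell
# Build the Tez part of HADOOP_CLASSPATH.
#   $1 = conf directory, $2 = directory with the extracted minimal tarball,
#   $3 = existing classpath (may be empty)
tez_classpath() {
  conf_dir="$1"
  jars_dir="$2"
  existing="$3"
  cp="${conf_dir}:${jars_dir}/*:${jars_dir}/lib/*"
  if [ -n "$existing" ]; then
    cp="${cp}:${existing}"
  fi
  # printf keeps the literal * characters; quoting prevents globbing.
  printf '%s\n' "$cp"
}
```

In hadoop-env.sh this could be used as export HADOOP_CLASSPATH="$(tez_classpath /home/kiran/tez/conf /home/kiran/tez/tez "$HADOOP_CLASSPATH")".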


Now save and close the file. Tez is now integrated with Hadoop.

To check this, run a sample Pig program on Tez. The example used here is a Pig script that performs Twitter sentiment analysis.

The script is stored as twitter.pig and writes its output to /twee_out/2/ in HDFS. Now let’s run the program with the Tez engine using the below command:

pig -x tez twitter.pig

After the process completes, Pig prints a success message saying that the job ran on Tez.

In HDFS, you can see that the part file has been created in the output directory.

This confirms that the Tez job has run successfully with Hadoop.
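If you prefer to confirm the output from a script, you can inspect the HDFS listing. The sketch below assumes that the output files have names starting with "part-". The function name has_part_file is hypothetical.

```shell
# Read an `hadoop fs -ls` listing on stdin and succeed if at least one
# listed path ends in a file whose name starts with "part-".
has_part_file() {
  grep -Eq '/part-[^/]*$'
}
```

For example, hadoop fs -ls /twee_out/2/ | has_part_file && echo "output found" prints the message only when a part file exists in the output directory.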

Hope you found this post helpful. Keep visiting our website Acadgild for more updates on Big Data and other technologies.