Integrating Apache Tez with Hadoop

In this post, we will discuss how to integrate Tez with Hadoop. We will also see how to run a Hadoop job using the Tez engine.

Let’s begin with the basics first.

What is Tez?

Tez is an application framework built on Hadoop YARN that can execute complex directed acyclic graphs (DAGs) of general data processing tasks. In many ways, it can be considered a much more flexible and powerful successor to the MapReduce framework.

Tez provides developers an API framework to write native YARN applications on Hadoop that bridges the spectrum of interactive and batch workloads. It allows those data access applications to work with petabytes of data over thousands of nodes.

In simple terms, Tez is a processing engine on top of the Hadoop ecosystem, which runs on YARN and performs well within mixed workload clusters.

Tez sits on top of YARN as an execution engine. It manages the workloads in the cluster and aims to complete jobs in as little time as possible.

Now, let’s look at how to install Tez on your cluster and integrate it with Hadoop. Before installing Tez, make sure that Java and Maven are installed on your system.

Maven Installation

Step 1: Before setting things up for Tez, let’s install Maven on the system. The Tez build uses Maven to download the dependencies it needs, based on your Hadoop version.

If you are using CentOS, type the below command to install Maven:

Command: yum install maven

If you are using Ubuntu, type the below command to install Maven:

Command: sudo apt-get install maven

Step 2: After installing Maven, check it by using the below command:

mvn -version

The output should show the installed Maven version along with the Java version it is using.
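Before going further, it can help to confirm that both prerequisites are on the PATH. The following is a minimal sketch, not part of the original steps; the function name missing_tools is made up for illustration.

```shell
#!/bin/sh
# Print the name of every given tool that cannot be found on the PATH.
# Prints nothing when all tools are available.
# (missing_tools is a hypothetical helper name, not a standard command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the two prerequisites named in this post.
missing=$(missing_tools java mvn)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "Java and Maven are both available."
fi
```

If either tool is reported as missing, install it before continuing with the next step.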

Step 3: After installing Maven, install Protocol Buffers 2.5.0, the version used in the rest of this post. You can download protobuf-2.5.0 from the following link:

Protocol buffer-2.5.0

By default, it will be downloaded into your ‘Downloads’ folder.

Step 4: Now, untar the file using the below command:

tar -xvf protobuf-2.5.0.tar.gz

Now, you can view the extracted folder in the ‘Downloads’ folder itself.

Step 5: Next, let’s move the protobuf-2.5.0 folder into your desired location. Here, we are moving it into the home directory.

mv protobuf-2.5.0 $HOME/

Step 6: Now, open the protobuf-2.5.0 folder using the command cd $HOME/protobuf-2.5.0

Step 7: Now type the below commands to configure protocol buffer.

./autogen.sh
./configure --prefix=/usr

Step 8: Execute the make command

make

Once configure has done its job, we can invoke make to build the software. This runs a series of tasks defined in a Makefile to build the finished program from its source code.

Step 9: Type the make install command

make install

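Now that the software is built and ready to run, the files can be copied to their final destinations. The make install command will copy the built program, along with its libraries and documentation, to the correct location. Because the prefix is /usr, this step usually needs root privileges (for example, sudo make install).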

It will take some time for the process to be completed.

Step 10: Now, check if the protocol buffer is installed or not with the below command:

protoc --version
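The version check can also be scripted. This is a small sketch that assumes protoc prints a line of the form "libprotoc 2.5.0"; the helper name protoc_version_matches is made up for illustration.

```shell
#!/bin/sh
# Succeed if a `protoc --version` line reports the expected version.
# $1: version line such as "libprotoc 2.5.0"; $2: expected version.
# (protoc_version_matches is a hypothetical helper name.)
protoc_version_matches() {
    [ "${1#libprotoc }" = "$2" ]
}

if command -v protoc >/dev/null 2>&1; then
    if protoc_version_matches "$(protoc --version)" "2.5.0"; then
        echo "protoc 2.5.0 found"
    else
        echo "Unexpected protoc version: $(protoc --version)"
    fi
else
    echo "protoc is not on the PATH"
fi
```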

We have successfully installed the protocol buffer! Let’s install Apache Tez now.

Step 11: You can download Apache Tez from the following link:

Apache Tez-0.8.1

By default, it will be downloaded into your ‘Downloads’ folder. You can move it to your desired location. Here, we are moving it into the ‘Home’ directory.

Step 12: Now, untar the file using the below command:

tar -xvf apache-tez-0.8.1-alpha-src.tar.gz

Step 13: Now, move into the extracted Tez folder.

Step 14: You can see a file named pom.xml. Open the file and change the versions of Hadoop, Pig and Java to match the ones installed on your system.

We updated these versions to match our own installation.

Step 15: Now save and close the file. Open the extracted Tez folder and then type the below command:

mvn clean package -DskipTests

The above command will download all the dependencies that Tez needs to be compatible with your Hadoop version. When the process finishes, you will see the ‘Build Success’ message.
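Readers' comments on this post report two causes of a tez-ui build failure at the Bower install step: git missing from the PATH, and running the build as root. The sketch below checks for both before starting Maven. It is an illustration only; the function name tez_ui_preflight is made up.

```shell
#!/bin/sh
# Print a warning for each condition that readers reported as breaking the tez-ui build.
# $1: numeric user id to check (defaults to the current user's id).
# Returns 0 when no problem is found, 1 otherwise.
# (tez_ui_preflight is a hypothetical helper name.)
tez_ui_preflight() {
    uid=${1:-$(id -u)}
    status=0
    if ! command -v git >/dev/null 2>&1; then
        echo "git not found on PATH (Bower needs it to fetch packages)"
        status=1
    fi
    if [ "$uid" -eq 0 ]; then
        echo "running as root (Bower refuses to run as root unless --allow-root is set)"
        status=1
    fi
    return $status
}

if tez_ui_preflight; then
    echo "Pre-flight checks passed; run: mvn clean package -DskipTests"
fi
```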

Step 16: Now, in the tez-dist/target directory, you can see that two tarballs have been created:

tez-0.8.1-alpha.tar.gz

tez-0.8.1-alpha-minimal.tar.gz

Next, create a new folder named tez in your desired location. We are creating this folder in the home directory and then moving the two tarballs into it.

Step 17: Create the directory using the command mkdir tez

Step 18: Now, create a folder named tez in HDFS, using the below command:

hadoop fs -mkdir /tez

Step 19: Copy the tez-0.8.1-alpha.tar.gz tarball into the HDFS directory using the below command:

hadoop fs -put tez-0.8.1-alpha.tar.gz /tez/

You can check whether the file has been copied to HDFS by using the below command:

hadoop fs -ls /tez/

Step 20: Now, within the tez folder in your local file system, create two folders named conf and tez using the below commands:

mkdir conf
mkdir tez

Step 21: Extract the contents of the file tez-0.8.1-alpha-minimal.tar.gz into the newly created tez folder:

tar -xzf tez-0.8.1-alpha-minimal.tar.gz -C tez

Step 22: Now, create a file named tez-site.xml in your newly created conf directory and add the below configuration to it:

<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs://localhost:9000/tez/tez-0.8.1-alpha.tar.gz</value>
</property>
</configuration>


Note: In the value tag, you need to provide the HDFS location of the tarball that you copied into HDFS. Adjust the host and port (localhost:9000 here) to match your NameNode address.
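If you prefer to generate the file from the command line, the following sketch writes the same configuration. The function name write_tez_site is made up for illustration. If you are unsure of your NameNode address, hdfs getconf -confKey fs.defaultFS prints the configured value.

```shell
#!/bin/sh
# Write a minimal tez-site.xml that points tez.lib.uris at the uploaded tarball.
# $1: conf directory; $2: HDFS URI of the full Tez tarball.
# (write_tez_site is a hypothetical helper name.)
write_tez_site() {
    mkdir -p "$1" || return 1
    cat > "$1/tez-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Usage, following the layout used in this post:
# write_tez_site "$HOME/tez/conf" "hdfs://localhost:9000/tez/tez-0.8.1-alpha.tar.gz"
```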


Now save and close the file. Tez is ready! You can now integrate it with Hadoop by adding the following lines to the hadoop-env.sh file of your Hadoop installation.

export TEZ_CONF_DIR=/home/kiran/tez/conf/
export TEZ_JARS=/home/kiran/tez/tez/
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}

Note: Set TEZ_CONF_DIR to the path of your conf directory, and set TEZ_JARS to the folder into which you extracted tez-0.8.1-alpha-minimal.tar.gz.
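As a variation, the same settings can be written without a hard-coded user name. This sketch assumes the $HOME/tez layout used in this post and leaves out JAVA_JDBC_LIBS and MAPREDUCE_LIBS; the helper name tez_classpath is made up for illustration.

```shell
#!/bin/sh
# Build the classpath prefix that makes the Tez configuration and jars visible to Hadoop.
# $1: Tez conf directory; $2: directory holding the extracted minimal tarball.
# (tez_classpath is a hypothetical helper name.)
tez_classpath() {
    printf '%s:%s/*:%s/lib/*' "$1" "$2" "$2"
}

TEZ_CONF_DIR="$HOME/tez/conf"
TEZ_JARS="$HOME/tez/tez"
# Append the existing HADOOP_CLASSPATH only when it is set, to avoid a trailing colon.
HADOOP_CLASSPATH="$(tez_classpath "$TEZ_CONF_DIR" "$TEZ_JARS")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
export TEZ_CONF_DIR TEZ_JARS HADOOP_CLASSPATH

echo "$HADOOP_CLASSPATH"
```

After restarting your shell or re-sourcing hadoop-env.sh, the hadoop classpath command should list the Tez conf and jar directories.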


Now save and close the file. Tez is now integrated with Hadoop.

To check this, run a sample Pig program on Tez. We use a sample Pig script that performs Twitter sentiment analysis.

The script is stored as twitter.pig, and its output will be stored in /twee_out/2/ in HDFS. Now let’s run the program with the Tez engine using the below command:

pig -x tez twitter.pig

When the process finishes, you will see a success message saying that the job ran successfully using Tez.

In HDFS, you can see that the part file has been created in the output directory, which confirms that the Tez job ran successfully with Hadoop.

Hope you found this post helpful.

7 Comments

  1. Hi, I got the following error while following the steps:
    [INFO] Running 'npm install --color=false' in /home/vasanth/Desktop/software/apache-tez-0.8.1-alpha-src/tez-ui/src/main/webapp
    [INFO]
    [INFO] --- exec-maven-plugin:1.3.2:exec (Bower install) @ tez-ui ---
    bower FileSaver.js#24b303f49213b905ec9062b708f7cd43d56a5dde ENOGIT git is not installed or not in the PATH
    [INFO] ————————————————————————
    [INFO] Reactor Summary:
    [INFO]
    [INFO] tez ………………………………………… SUCCESS [ 1.591 s]
    [INFO] tez-api …………………………………….. SUCCESS [ 10.188 s]
    [INFO] tez-common ………………………………….. SUCCESS [ 0.688 s]
    [INFO] tez-runtime-internals ………………………… SUCCESS [ 1.209 s]
    [INFO] tez-runtime-library ………………………….. SUCCESS [ 3.258 s]
    [INFO] tez-mapreduce ……………………………….. SUCCESS [ 1.814 s]
    [INFO] tez-examples ………………………………… SUCCESS [ 0.343 s]
    [INFO] tez-dag …………………………………….. SUCCESS [ 6.750 s]
    [INFO] tez-tests …………………………………… SUCCESS [ 1.443 s]
    [INFO] tez-ext-service-tests ………………………… SUCCESS [ 1.386 s]
    [INFO] tez-ui ……………………………………… FAILURE [ 8.437 s]
    [INFO] tez-plugins …………………………………. SKIPPED
    [INFO] tez-yarn-timeline-history …………………….. SKIPPED
    [INFO] tez-yarn-timeline-history-with-acls ……………. SKIPPED
    [INFO] tez-history-parser …………………………… SKIPPED
    [INFO] tez-tools …………………………………… SKIPPED
    [INFO] tez-perf-analyzer ……………………………. SKIPPED
    [INFO] tez-job-analyzer …………………………….. SKIPPED
    [INFO] tez-javadoc-tools ……………………………. SKIPPED
    [INFO] tez-dist ……………………………………. SKIPPED
    [INFO] Tez ………………………………………… SKIPPED
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 37.771 s
    [INFO] Finished at: 2016-06-07T19:41:06+05:30
    [INFO] Final Memory: 68M/418M
    [INFO] ————————————————————————
    [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) on project tez-ui: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) on project tez-ui: Command execution failed.
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
    Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed.
    at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    … 20 more
    Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:746)
    at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292)
    … 22 more
    [ERROR]
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :tez-ui
    Any suggestions for this problem?

  3. Before typing ./autogen.sh,
    make sure you install libtool:
    sudo apt-get install libtool
    This will fix the "autoreconf not found" error.

  4. How do I configure Tez for a multi-node cluster? I followed the steps mentioned above and it is working perfectly with Apache Pig.

  5. [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) on project tez-ui: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
    Reason: You are running as root or with superuser permissions.
    Solutions:
    Recommended: Run without superuser permission, or as a non-root user.
    Alternate: If you want to continue as root, add --allow-root to the arguments tag of exec-maven-plugin in tez-ui/pom.xml.
