Beginner’s Guide For Oozie Installation

In this blog, we will discuss how to install Oozie 4.1.0 on a Hadoop 2.x cluster.

First, download the oozie-4.1.0 tar file from the link below:

Oozie-4.1.0 tar file

By default, the file is saved in the Downloads folder.

Move into the Downloads folder using the commands below:

cd

cd Downloads

Extract the tar file using the command below:

tar -xzvf oozie-4.1.0.tar.gz

The archive is extracted into a directory named oozie-4.1.0.

Maven Installation

Before setting up Oozie, install Maven on your system. The Oozie build uses Maven to download the dependencies that match your Hadoop version.

If you are using CentOS, install Maven with the command below:

sudo yum install maven

If you are using Ubuntu, install Maven with the command below:

sudo apt-get install maven

After the installation, verify it using the command below:

mvn -version

The command should print the installed Maven version along with the Java version it is using.

Oozie distro creation

Now open the untarred oozie-4.1.0 directory and open the pom.xml file.

In the pom.xml file, set the target Java version to the Java version you are using. Here we are using Java 7, so we have set the target version to 1.7.

If you are using Hadoop 2.x, set the Hadoop version to 2.3.0. Maven will then pull in the dependencies that Oozie needs to run on a Hadoop 2.x cluster. Hadoop 2.3.0 is the latest Hadoop version for which Oozie 4.1.0 has added dependencies.

Now comment out the Codehaus repository entry. Codehaus has shut down its services, so dependencies can no longer be downloaded from that repository.

After making the above changes, save and close the file.
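
To find the entries mentioned above in a large pom.xml, it can help to list them with their line numbers. The sketch below is a hypothetical helper (the function name show_pom_settings is not part of Oozie or Maven), and the search terms are assumptions about how the entries are named in your copy of the file. Adjust them if nothing is printed.

```shell
# Hypothetical helper: print the pom.xml lines that mention the Java target
# version, the Hadoop version, or the Codehaus repository, with line numbers.
# The search terms are assumptions; adjust them to match your pom.xml.
show_pom_settings() {
    pom_file="${1:-pom.xml}"
    grep -n -i -E 'targetJavaVersion|hadoop\.version|codehaus' "$pom_file"
}
```

Run show_pom_settings pom.xml from the untarred oozie-4.1.0 directory, then open the file at the reported line numbers to check your edits.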

Now move into the bin folder of the untarred oozie-4.1.0 directory and run the command below:

./mkdistro.sh -DskipTests -X

The above command builds the Oozie distribution (distro). The -DskipTests option skips the tests, and -X makes Maven print debug output.

Note: The distro build downloads from Maven the dependencies that Oozie needs for a Hadoop 2.x cluster.

The process takes some time because it downloads all the dependencies required for the build.

While the distro is being built, the console may print long runs of dots. This is normal, so don't panic.

When the build finishes, you will get a success message.

A target directory will be created in the distro folder of your Oozie directory.

Open the target directory inside the distro folder. Inside it you will see the oozie-4.1.0-distro folder.

Open the oozie-4.1.0-distro folder. Inside it you will find the oozie-4.1.0 folder.

This oozie-4.1.0 folder contains all the dependencies that Oozie needs to run on a Hadoop cluster.

Copy this oozie-4.1.0 folder into the home directory of your Hadoop user. In our case, we created a directory named oozie in the home folder ($HOME) and copied the oozie-4.1.0 folder into $HOME/oozie.

Now change into the newly copied oozie-4.1.0 directory and create a directory named libext (library extensions) using the command mkdir libext.

The libext directory is now present in the path $HOME/oozie/oozie-4.1.0.

Move into the libext directory using the command cd libext.

Now copy the Hadoop 2.3.0 jar files into the newly created libext folder. You can find them in the following path of the Oozie source tree that you built:

oozie-4.1.0/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/

Copy the jar files inside hadooplib-2.3.0.oozie-4.1.0 to the newly created libext folder.

Now download the ext-2.2.zip file from the link below:

ext-2.2.zip

Copy the downloaded ext-2.2.zip file into the newly created libext folder.

The ext-2.2.zip file is required for the Oozie web UI.

The libext folder should now contain the hadooplib-2.3.0.oozie-4.1.0 jar files and the ext-2.2.zip file.
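
The libext steps above can be sketched as a small shell function. The function name populate_libext and its three arguments are hypothetical, and the directory layout is assumed to match the paths used in this blog.

```shell
# Hypothetical helper: create libext inside the Oozie directory, then copy in
# the Hadoop jars from the built source tree and the ext-2.2.zip for the web UI.
#   $1 = Oozie directory (for example $HOME/oozie/oozie-4.1.0)
#   $2 = untarred Oozie source tree (for example $HOME/Downloads/oozie-4.1.0)
#   $3 = path to the downloaded ext-2.2.zip
populate_libext() {
    oozie_dir="$1"
    source_tree="$2"
    ext_zip="$3"
    hadoop_libs="$source_tree/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0"

    # Stop at the first step that fails so a partial copy is noticed.
    mkdir -p "$oozie_dir/libext" &&
    cp "$hadoop_libs"/*.jar "$oozie_dir/libext/" &&
    cp "$ext_zip" "$oozie_dir/libext/"
}
```

For the layout in this blog, the call would be populate_libext "$HOME/oozie/oozie-4.1.0" "$HOME/Downloads/oozie-4.1.0" "$HOME/Downloads/ext-2.2.zip".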

After setting this up, move into the bin folder of the newly copied oozie-4.1.0 directory:

cd $HOME/oozie/oozie-4.1.0/bin

Preparing a WAR file

Now prepare a WAR file using the command below:

sudo ./oozie-setup.sh prepare-war

The above command prepares the WAR file for Oozie.

After the WAR file is prepared successfully, you will get a confirmation message.

Now open the core-site.xml file in your Hadoop etc/hadoop folder and add the properties below. Replace hadoop_user_name with the name of your Hadoop user.

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop_user_name.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop_user_name.groups</name>
<value>*</value>
</property>
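
Because hadoop_user_name is a placeholder, it is easy to leave it unreplaced or mistype it. As a minimal sketch, the hypothetical function below (proxyuser_properties is not a Hadoop or Oozie command) prints the two proxy-user properties for a given user name so that they can be pasted into core-site.xml.

```shell
# Hypothetical helper: print the two proxy-user properties for one user.
#   $1 = user name (defaults to the current user reported by id -un)
proxyuser_properties() {
    user_name="${1:-$(id -un)}"
    for suffix in hosts groups; do
        printf '<property>\n'
        printf '<name>hadoop.proxyuser.%s.%s</name>\n' "$user_name" "$suffix"
        printf '<value>*</value>\n'
        printf '</property>\n'
    done
}
```

For example, proxyuser_properties kiran prints the properties named hadoop.proxyuser.kiran.hosts and hadoop.proxyuser.kiran.groups.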

After making the changes, save and close the file. If Hadoop is already running, restart it so that the new proxy-user settings take effect.

Now open the oozie-site.xml file in the conf directory of the newly copied oozie-4.1.0 folder.

In the oozie-site.xml file, edit the properties specified below.

In oozie.service.HadoopAccessorService.hadoop.configurations, specify the path of your Hadoop configuration directory, as shown below:

<property>
        <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
        <value>*=/home/kiran/hadoop-2.7.1/etc/hadoop</value>
        <description>
            Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
            the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
            used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
            the relevant Hadoop *-site.xml files. If the path is relative is looked within
            the Oozie configuration directory; though the path can be absolute (i.e. to point
            to Hadoop client conf/ directories in the local filesystem.
        </description>
    </property>

In oozie.service.WorkflowAppService.system.libpath, set the host and port of your NameNode, as shown below:

 <property>
        <name>oozie.service.WorkflowAppService.system.libpath</name>
        <value>hdfs://localhost:9000/user/${user.name}/share/lib</value>
        <description>
            System library path to use for workflow applications.
            This path is added to workflow application if their job properties sets
            the property 'oozie.use.system.libpath' to true.
        </description>
    </property>

Now give your Hadoop user ownership of the oozie folder using the command below:

sudo chown -R hadoop_user_name $HOME/oozie

Replace hadoop_user_name with your Hadoop user name, and $HOME/oozie with the path of your oozie folder if it differs.

Creating Sharelib directory in HDFS

Note: Make sure that all your Hadoop daemons have started properly.

Move into the bin folder of the newly copied oozie-4.1.0 directory.

Now create a directory named sharelib in HDFS for storing the Oozie libraries, using the command below:

./oozie-setup.sh sharelib create -fs hdfs://localhost:9000

The above command creates the sharelib directory in HDFS.

You will get a message as follows:

the destination path for sharelib is: hdfs://localhost:9000/user/kiran/share/lib/
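
To confirm that the directory was created, you can list it with the HDFS command-line client. The sketch below wraps this in a hypothetical function (list_sharelib is not a Hadoop or Oozie command) and assumes the sharelib lives under /user/<your user>/share/lib, as in the message above.

```shell
# Hypothetical helper: list the Oozie sharelib directory in HDFS.
#   $1 = HDFS user name (defaults to the current user reported by id -un)
list_sharelib() {
    hdfs_user="${1:-$(id -un)}"
    hdfs dfs -ls "/user/$hdfs_user/share/lib"
}
```

In our case, the call would be list_sharelib kiran.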

Creating Oozie DB

Before creating the Oozie DB, make sure that MySQL server is installed on your system.

If you haven't installed MySQL, install it using one of the commands below.

Command to install MySQL server on CentOS:

sudo yum install mysql-server

Command to install MySQL server on Ubuntu:

sudo apt-get install mysql-server

After installing MySQL server, move into the bin folder of the newly copied oozie-4.1.0 directory and run the command below:

./ooziedb.sh create -sqlfile oozie.sql -run

After this command runs successfully, you will get the output below:

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '4.1.0'
The SQL commands have been written to: oozie.sql

With this step, your Oozie installation is complete.

Now add the bin path of the newly installed Oozie to the .bashrc file in your home folder. Open the file using the command below:

gedit .bashrc
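
The lines to add to .bashrc are not listed here, so the following is a sketch that assumes Oozie was copied to $HOME/oozie/oozie-4.1.0 as described earlier. The variable name OOZIE_HOME is a common convention and an assumption on our part; only the PATH entry is needed to run the Oozie scripts by name.

```shell
# Assumed install location from the earlier steps; adjust if yours differs.
export OOZIE_HOME="$HOME/oozie/oozie-4.1.0"
# Put the Oozie scripts in the bin folder (for example oozied.sh) on the PATH.
export PATH="$PATH:$OOZIE_HOME/bin"
```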

After editing the .bashrc file, save and close it. Then reload it using the command below:

source .bashrc

Oozie is now configured with your Hadoop cluster. Start Oozie using the command below:

oozied.sh start


Oozie has now started. You can confirm this through the web UI.

Open your browser and go to http://localhost:11000/oozie. 11000 is the default port for Oozie.

All the active and suspended jobs can be seen in the web UI.
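
The same check can be done from the terminal with the Oozie command-line client. The sketch below wraps it in a hypothetical function (oozie_status is not an Oozie command) and assumes the server is reachable at the default URL http://localhost:11000/oozie.

```shell
# Hypothetical helper: ask the Oozie server for its status.
#   $1 = Oozie server URL (defaults to the local server on port 11000)
oozie_status() {
    oozie_url="${1:-http://localhost:11000/oozie}"
    oozie admin -oozie "$oozie_url" -status
}
```

On a healthy server, oozie_status is expected to report the system mode as NORMAL.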

We have successfully installed Oozie 4.1.0 on a Hadoop 2.x cluster.

Hope this blog helped you install Oozie on your Hadoop cluster. Keep visiting our website Acadgild for more updates on Big Data and other technologies.

4 Comments

  1. I followed your Oozie 4.1.0 installation document and it is working fine.
    The example MapReduce job works, but the Pig example does not.
    It gives a launcher error: pig main exit code [2].
    Can you please help with this error?
