Scheduling Hadoop Jobs Using Jenkins

A scheduler makes our work easier by automating recurring processes, and Jenkins is one such tool. Jenkins is an open-source automation server written in Java, widely used as a Continuous Integration (CI) server. Continuous Integration means automatically running your tests on a non-developer (remote) machine every time someone pushes new code to the source repository.

In this blog, we will discuss how to schedule Hadoop jobs using Jenkins. Earlier, we saw how to schedule and run Hadoop jobs using Rundeck; here we will do the same with Jenkins.

Let’s start by installing Jenkins. Download the Jenkins WAR file from the link below:

https://jenkins.io/download/

After downloading the WAR file, run it using the command:

java -jar jenkins.war

At the end of the startup log, you will see the initial password for the admin user. Copy it, because you need it to unlock the Jenkins web interface.
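If the password has already scrolled out of view, Jenkins also writes it to a file under its home directory. The sketch below assumes Jenkins was started with java -jar as above, in which case the home directory defaults to ~/.jenkins unless the JENKINS_HOME environment variable points elsewhere; print_initial_password is a hypothetical helper name.

```shell
#!/bin/sh
# Print the one-time admin password that Jenkins generates on first start-up.
# Assumption: Jenkins runs from jenkins.war, so its home directory is
# ~/.jenkins unless JENKINS_HOME is set.
print_initial_password() {
  jenkins_home="${JENKINS_HOME:-$HOME/.jenkins}"
  cat "$jenkins_home/secrets/initialAdminPassword"
}
```

Running print_initial_password in a terminal on the Jenkins machine prints the same password that appeared in the log.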

By default, Jenkins listens on port 8080. Open your browser and go to localhost:8080. When prompted, enter the password that was shown in your terminal.
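If port 8080 is already used by another service on the machine, the Jenkins launcher accepts an --httpPort option to listen elsewhere. The small wrapper below is only a sketch: start_jenkins is a hypothetical helper, and it assumes jenkins.war sits in the current directory.

```shell
#!/bin/sh
# Start Jenkins from the downloaded WAR file on a chosen HTTP port.
# Arguments (both optional): path to the WAR file, port number.
start_jenkins() {
  war_file="${1:-jenkins.war}"   # assumed location of the downloaded WAR
  port="${2:-8080}"              # Jenkins' default port
  java -jar "$war_file" --httpPort="$port"
}
```

For example, start_jenkins jenkins.war 9090 would serve Jenkins at localhost:9090 instead of localhost:8080.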

After you click Continue, you will be asked to install plugins. Choose Install suggested plugins and wait until all of them are installed.

After this, you will be asked to create a user as shown below.

To create a job, click on Create new jobs.

Enter the job name and select Multi-configuration project as the project type, as shown below.

You will now see the configuration page of your project, as shown below.

It offers many configuration options; select the ones your project requires.

To submit Hadoop commands, go to the Build section and select the Execute shell option, as shown below.

In the Command box, enter the same hadoop jar command you would use to run the job from a terminal, including all the input and output paths.

If you also want to view the output, add a second shell command that prints it. This command runs after the first one completes.
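A minimal sketch of what the Execute shell box could contain follows. The jar path, driver class, and HDFS directories are hypothetical placeholders, and the helper functions run_job and show_output are illustrative names, not anything Jenkins provides.

```shell
#!/bin/sh
# Sketch of the commands for a Jenkins "Execute shell" build step.
# All paths and the class name below are hypothetical placeholders.
set -e  # stop the build as soon as any command fails

JOB_JAR="/home/hadoop/jobs/wordcount.jar"   # hypothetical jar file
MAIN_CLASS="com.example.WordCount"          # hypothetical driver class
INPUT_DIR="/user/hadoop/input"              # hypothetical HDFS input path
OUTPUT_DIR="/user/hadoop/output"            # hypothetical HDFS output path

# First command: run the job. MapReduce refuses to write into an existing
# output directory, so remove output left over from a previous build.
run_job() {
  hadoop fs -rm -r -f "$OUTPUT_DIR"
  hadoop jar "$JOB_JAR" "$MAIN_CLASS" "$INPUT_DIR" "$OUTPUT_DIR"
}

# Second command: print the part files of the output directory so they
# appear in the Jenkins console output. The glob is quoted so that HDFS,
# not the local shell, expands it.
show_output() {
  hadoop fs -cat "$OUTPUT_DIR/part-*"
}
```

In the Execute shell box, the definitions would be followed by the lines run_job and show_output. Because of set -e, show_output only runs if the job succeeded. If your jar's manifest already declares a Main-Class, drop MAIN_CLASS from the hadoop jar line, since hadoop jar would otherwise pass it to the job as its first argument.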

Now go back to the dashboard, where you can see the project.

Note: To run your Hadoop job automatically at a particular time, you need to configure a schedule in the Build Triggers section. Here, we run the job manually.
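As an illustration of such a schedule (not used in this walkthrough), enabling Build periodically under Build Triggers and entering a line in Jenkins' cron-style syntax (minute, hour, day of month, month, day of week) would run the job once a day. The 2 a.m. slot is an arbitrary example time; H lets Jenkins pick a fixed, hashed minute so that many jobs do not all start at the same instant.

```
# Run the Hadoop job once a day, at some minute between 02:00 and 02:59
H 2 * * *
```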

Now click Schedule now on the right side to run the project. Once it starts running, click Build History to see the console output; this opens a page that lists the status of all your builds.

Click the terminal icon next to your build to see its console output, as shown below.

Below, you can also see the console output of your second command.

In the screenshot above, you can see the contents of the output folder. This is how you can run a Hadoop job using Jenkins.

We hope this blog helped you run Hadoop jobs using Jenkins. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
