Compiling and Running MapReduce Job from Command Line

Wondering how to run MapReduce code in production? This blog will help you move from an academic working style to a production one. We take the classic wordcount program and carry out the whole process (compiling, packaging, and running) from the command line.
I assume that the reader has a basic understanding of MapReduce code.
Before we proceed, let us look at the code.

Driver class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class drive
{
        public static void main(String[] args) throws Exception
        {
                // Job.getInstance() replaces the deprecated Job() constructor
                Job job = Job.getInstance(new Configuration());
                job.setJarByClass(drive.class);

                // Types of the final (reducer) output key and value
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);

                job.setMapperClass(Map.class);
                job.setReducerClass(Red.class);
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);

                // args[0] = input path in HDFS, args[1] = output path in HDFS
                String inputPath = args[0];
                String outputPath = args[1];
                FileInputFormat.addInputPath(job, new Path(inputPath));
                FileOutputFormat.setOutputPath(job, new Path(outputPath));

                // Exit with 0 if the job succeeds, 1 otherwise
                System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
}
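
The driver reads args[0] and args[1] directly, so a missing argument surfaces as an ArrayIndexOutOfBoundsException. A minimal sketch of an up-front check is shown below. The class name DriverArgsSketch, the method validate, and the usage text are hypothetical and are not part of the original driver; the sketch has no Hadoop dependency so it can be tried on its own.

```java
// Illustrative only: a hypothetical argument check, not part of the original driver.
class DriverArgsSketch {
    // Returns null when the arguments look usable, otherwise a usage message.
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: drive <input path> <output path>";
        }
        if (args[0].trim().isEmpty() || args[1].trim().isEmpty()) {
            return "Input and output paths must not be empty";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("input=" + args[0] + ", output=" + args[1]);
    }
}
```

A check along these lines could be called at the top of the driver's main method before the job is configured.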

Mapper Class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
                // Split on runs of whitespace so repeated spaces do not produce empty words
                String[] words = value.toString().trim().split("\\s+");
                for (String token : words) {
                        if (token.isEmpty()) {
                                continue; // blank input line
                        }
                        // Emit (word, 1) for every word in the line
                        word.set(token);
                        context.write(word, one);
                }
        }
}
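
The choice of delimiter in the mapper affects the counts: splitting on a single space yields empty tokens whenever two words are separated by more than one space, and those empty tokens would be counted as words. The plain-Java sketch below compares the two approaches. The class and method names are illustrative only, and it does not depend on Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: compares two ways of tokenizing a line of text.
class TokenizeSketch {
    // Splits on a single space character; repeated spaces yield empty tokens.
    static List<String> splitOnSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Splits on runs of whitespace and drops any empty token.
    static List<String> splitOnWhitespace(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "hello  big   data";
        System.out.println(splitOnSpace(line).size());      // 6 tokens, three of them empty
        System.out.println(splitOnWhitespace(line).size()); // 3 tokens
    }
}
```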

Reducer class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red extends Reducer<Text, IntWritable, Text, IntWritable>
{
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
                // Add up all the 1s emitted by the mappers for this word
                int sum = 0;
                for (IntWritable value : values)
                {
                        sum += value.get();
                }
                context.write(key, new IntWritable(sum));
        }
}
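
To make the data flow between the mapper and the reducer concrete, here is a rough in-memory simulation of the same word count in plain Java. It is only a sketch of the idea and not how Hadoop executes the job: there is no distribution, no HDFS, and the class WordCountFlowSketch is a made-up name for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the word-count data flow in memory, without Hadoop.
class WordCountFlowSketch {
    static Map<String, Integer> wordCount(List<String> lines) {
        // "Map" and "shuffle": record a 1 under each word, with keys kept sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                grouped.computeIfAbsent(token, k -> new ArrayList<>()).add(1);
            }
        }
        // "Reduce": sum the list of values collected for each key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=3, not=1, or=1, quick=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or not to be", "be quick")));
    }
}
```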

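I wrote the code on my local machine and transferred it to the master node using WinSCP.
The three files above are kept in a directory named ‘commandline’. Since each class is public, each file must be named after its class (drive.java, Map.java, and Red.java). You can create a directory of your choice and keep the files in it.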

Inside the directory ‘commandline’, create a sub-directory. It will hold the compiled class files, as explained shortly.
mkdir executables
Now change the current working directory to executables.
cd executables

It’s time to compile your Java code using the command below.
NOTE: You need to change the paths of the jar files according to your environment.
Command:
javac -cp /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:/usr/local/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar -d /home/acadgild/commandline/executables/ /home/acadgild/commandline/*.java
Notice the directory path after -d in the above command. It is the sub-directory we created a moment ago inside the commandline directory.
This sub-directory will store all the compiled .class files.
/home/acadgild/commandline/*.java is the path where your Mapper, Reducer, and Driver source files reside.
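
The effect of -d can also be observed from Java itself through the JDK's compiler API (javax.tools). The sketch below compiles a throwaway class into a separate output directory and checks where the .class file ends up. It is an illustration under stated assumptions: the class names and temporary directories are made up, a JDK (not just a JRE) must be installed, and the Hadoop jars are left off the classpath because the throwaway class has no dependencies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Illustrative only: shows the effect of javac's -d flag using the JDK compiler API.
class CompileToDirSketch {
    // Compiles a throwaway class with "-d outDir" and reports whether
    // Hello.class landed in outDir rather than next to the source file.
    static boolean classFileLandsInOutputDir() {
        try {
            Path srcDir = Files.createTempDirectory("src");
            Path outDir = Files.createTempDirectory("executables");
            Path source = srcDir.resolve("Hello.java");
            Files.write(source, "public class Hello {}".getBytes(StandardCharsets.UTF_8));

            // getSystemJavaCompiler() returns null when only a JRE is available
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No compiler found; a JDK is required");
            }
            int exit = compiler.run(null, null, null, "-d", outDir.toString(), source.toString());
            return exit == 0
                    && Files.exists(outDir.resolve("Hello.class"))
                    && !Files.exists(srcDir.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("class file in output dir: " + classFileLandsInOutputDir());
    }
}
```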

It’s time to create a jar from your class files.
Run the command below from the ‘executables’ directory.
jar -cvf Mycode.jar -C /home/acadgild/commandline/executables/ .
This will create a jar named Mycode.jar that includes all the class files present in the executables sub-directory.
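
A jar is a zip archive with a manifest, and a jar built with the options above gets a default manifest that has no Main-Class entry. The following sketch, using only the standard java.util.jar classes, shows one way to check whether a jar names an entry point. The class JarManifestSketch and its helper methods are hypothetical, and the jar it writes is a small temporary file for demonstration, not Mycode.jar.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Illustrative only: writes a tiny jar and reads back its Main-Class attribute.
class JarManifestSketch {
    // Writes a small temporary jar; pass null to omit the Main-Class entry.
    static Path writeJar(String mainClass) {
        try {
            Path jar = Files.createTempFile("Sketch", ".jar");
            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (mainClass != null) {
                manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            }
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out, manifest)) {
                jos.putNextEntry(new JarEntry("placeholder.txt"));
                jos.write("demo".getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Main-Class attribute of the jar, or null if it has none.
    static String mainClassOf(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mainClassOf(writeJar(null)));    // no entry point recorded
        System.out.println(mainClassOf(writeJar("drive"))); // entry point recorded
    }
}
```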

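Now you are ready to run the Hadoop job using this jar.
I have an input file in HDFS, and I am running a MapReduce job against it that counts the occurrences of each word.
Because the jar was created without a Main-Class entry in its manifest, the driver class name (drive) has to be given right after the jar name. The output directory (/out here) must not already exist in HDFS, otherwise the job fails at startup.
Command: hadoop jar Mycode.jar drive /inp /out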

That’s all!
You can check the result in the output directory that you specified in the hadoop command, for example with hdfs dfs -cat /out/part-r-00000.
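
With TextOutputFormat, each output line holds a key and a value separated by a tab. If you copy an output file to your local machine, a small parser like the sketch below can load it back into a map for a quick sanity check. The class OutputParseSketch is a made-up name, and the sample lines in main are invented for illustration rather than taken from an actual run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: parses "word<TAB>count" lines as written by TextOutputFormat.
class OutputParseSketch {
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not look like key<TAB>value
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not real job output
        System.out.println(parse(Arrays.asList("be\t3", "to\t2")));
    }
}
```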

I hope this blog helped you run your MapReduce job from the CLI. For more updates, keep visiting www.acadgild.com
