
Building a Hadoop Application using Maven

In this post, we discuss how to build a Hadoop application using Maven. We recommend reading our previous post on Maven first to get a clear idea of what Maven is and how it helps in building applications.

Eclipse needs to be installed on your system for this. Assuming that it is already installed, let's create a Maven project.

Create a new Maven project in Eclipse by following the steps below.

Go to File –> New –> Other –> Maven Project

Now, select the location (workspace path) where your Maven project should be created.

Next, configure your project by providing a Group Id and an Artifact Id of your choice.

Now, a Maven project will be created and you can view it in the Package Explorer. Create your Java classes under src/main/java, which is where Maven expects the application sources.

Use the code below for the Word Count program in Hadoop.

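// The package must match the fully qualified class name (hadoop_wordcount.WordCount)
// that is passed to the hadoop jar command when the job is run.
package hadoop_wordcount;

import java.io.IOException;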
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
  // Mapper: splits each input line into tokens and emits (word, 1) for every token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
    // Both objects are reused across map() calls instead of being allocated per token.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // StringTokenizer splits on whitespace by default.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  // Reducer: receives all counts for one word and writes their sum.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  // Driver: args[0] is the input path and args[1] is the output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer doubles as a combiner, pre-summing counts on the map side.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Exit code 0 on success, 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
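
The job above registers IntSumReducer as both the combiner and the reducer. This works because addition gives the same total whether the counts are summed in one pass or pre-summed per map task first. The following is a minimal plain-Java sketch of that idea, not Hadoop code: the class CombinerSketch, its methods, and the two input lines are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class; it uses no Hadoop APIs.
final class CombinerSketch {

    // Map step: emit one (word, 1) pair per token, as TokenizerMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Sum step: add up the values for each key, as IntSumReducer does.
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumption: each line is processed by its own map task.
        List<Map.Entry<String, Integer>> task1 = map("hi hi hello");
        List<Map.Entry<String, Integer>> task2 = map("hello good good");

        // Without a combiner, the reducer sees every raw (word, 1) pair.
        List<Map.Entry<String, Integer>> raw = new ArrayList<>(task1);
        raw.addAll(task2);
        Map<String, Integer> direct = sum(raw);

        // With a combiner, each task pre-sums locally and the reducer sums the partial sums.
        List<Map.Entry<String, Integer>> partials = new ArrayList<>(sum(task1).entrySet());
        partials.addAll(sum(task2).entrySet());
        Map<String, Integer> viaCombiner = sum(partials);

        System.out.println(direct);                     // {good=2, hello=2, hi=2}
        System.out.println(direct.equals(viaCombiner)); // true
    }
}
```

Reusing a reducer as a combiner is only safe when the reduce operation can be applied to partial results in this way. A reducer that computes an average, for example, would not qualify.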

Now, in the project root, listed just below the target folder in the Package Explorer, you can see a file called pom.xml. Here, you need to add the dependencies required for your project.

The dependencies for our project are as follows:

<dependencies>
  <!-- JDK Tools (tools.jar ships with JDK 8 and earlier) -->
  <dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <scope>system</scope>
    <version>1.8.0_73</version>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
  </dependency>
  <!-- Hadoop Core: this is the Hadoop 1.x artifact and supplies Configuration
       and Path here. On a Hadoop 2.x cluster, hadoop-common in the version
       matching the cluster is the usual choice instead. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>
  <!-- Hadoop Mapreduce Client Core -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.1</version>
  </dependency>
</dependencies>


Now, open a terminal and move into the root directory of your Maven project, that is, the directory containing pom.xml. There you can see the directory structure of your project, including pom.xml and the src folder.

Now, run the command mvn clean install to compile the code and package it into a jar file. When the build finishes, Maven prints a BUILD SUCCESS message.

By default, Maven names the jar <artifactId>-<version>.jar and creates it in the target directory. For our project, the file is hadoop_wordcount-0.0.1-SNAPSHOT.jar.

That's it! We have successfully built the Hadoop application using Maven. Now we will run this jar using the hadoop jar command by providing the input and output paths. Here is our input data.

hi hi hello

hello good good

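Now, run the jar file using the hadoop jar command below. The main class is given by its fully qualified name; the two paths that follow it are passed to the program as args[0] and args[1].

hadoop jar <jar_file_path> <main_class> <input_file_path> <output_file_path>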

The command for our project is as follows:

hadoop jar /home/kiran/workspace/hadoop_wordcount/target/hadoop_wordcount-0.0.1-SNAPSHOT.jar hadoop_wordcount.WordCount /inp /output_maven
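
The WordCount driver reads args[0] and args[1] directly, so leaving out a path ends in an ArrayIndexOutOfBoundsException rather than a helpful message. One way to guard against this is to validate the arguments before configuring the job. The sketch below is only an illustration: the class WordCountArgs and its parse method are hypothetical and are not part of Hadoop or of the program above.

```java
// Hypothetical helper that validates the two path arguments of the driver.
final class WordCountArgs {
    final String inputPath;
    final String outputPath;

    private WordCountArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Returns the parsed paths, or throws IllegalArgumentException with a usage hint.
    static WordCountArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: hadoop jar <jar_file_path> <main_class> <input_file_path> <output_file_path>");
        }
        return new WordCountArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        // The same two paths as in the command above.
        WordCountArgs parsed = parse(new String[] {"/inp", "/output_maven"});
        System.out.println(parsed.inputPath + " -> " + parsed.outputPath); // /inp -> /output_maven
    }
}
```

In the driver, the result of such a check would feed the new Path(...) calls in place of args[0] and args[1].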

This command runs your job and creates the output folder in HDFS. The output folder must not exist beforehand, otherwise the job fails. The result is written to the part-r-00000 file inside that folder, which lists each word together with its count. You can print it with hadoop fs -cat /output_maven/part-r-00000.
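
To make the expected result concrete, here is a small plain-Java sketch that counts the words of the sample input and renders them the way the default text output does: one line per word, with the word and its count separated by a tab, and the words in sorted order. It is an approximation under stated assumptions, not Hadoop code: the class ExpectedOutputSketch is hypothetical, a single reducer is assumed, and Java string ordering is used in place of Hadoop's Text ordering, which agrees for ASCII words like these.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class; it uses no Hadoop APIs.
final class ExpectedOutputSketch {

    // Counts the words in the given lines and renders "word<TAB>count" lines, sorted by word.
    static String render(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        StringBuilder out = new StringBuilder();
        counts.forEach((word, count) -> out.append(word).append('\t').append(count).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample input from this post; the empty line contributes no tokens.
        System.out.print(render("hi hi hello", "", "hello good good"));
        // Prints three lines: "good	2", "hello	2", "hi	2" (tab-separated).
    }
}
```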

We hope this post has been helpful in building your first Hadoop application using Maven. If you have any queries, feel free to comment below and we will get back to you at the earliest.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.


One Comment

  1. I followed all the steps you mentioned and the snapshot was created successfully, but I can't find the jar under maven-archiver. My directory tree is below. Looking forward to your help.
    Thanks
    .
    ├── lib
    │   └── webarchive-commons-jar-with-dependencies.jar
    ├── LICENSE
    ├── pom.xml
    ├── README.md
    ├── src
    │   └── org
    │       └── commoncrawl
    │           ├── examples
    │           │   ├── mapreduce
    │           │   │   ├── ServerTypeMap.java
    │           │   │   ├── TagCounterMap.java
    │           │   │   ├── WARCTagCounter.java
    │           │   │   ├── WATServerType.java
    │           │   │   ├── WETWordCount.java
    │           │   │   └── WordCounterMap.java
    │           │   ├── S3ReaderTest.java
    │           │   └── WARCReaderTest.java
    │           └── warc
    │               ├── WARCFileInputFormat.java
    │               └── WARCFileRecordReader.java
    └── target
        ├── apidocs
        │   ├── allclasses-frame.html
        │   ├── allclasses-noframe.html
        │   ├── constant-values.html
        │   ├── deprecated-list.html
        │   ├── help-doc.html
        │   ├── index-all.html
        │   ├── index.html
        │   ├── org
        │   │   └── commoncrawl
        │   │       ├── examples
        │   │       │   ├── class-use
        │   │       │   │   ├── S3ReaderTest.html
        │   │       │   │   └── WARCReaderTest.html
        │   │       │   ├── mapreduce
        │   │       │   │   ├── class-use
        │   │       │   │   │   ├── ServerTypeMap.html
        │   │       │   │   │   ├── ServerTypeMap.MAPPERCOUNTER.html
        │   │       │   │   │   ├── ServerTypeMap.ServerMapper.html
        │   │       │   │   │   ├── TagCounterMap.html
        │   │       │   │   │   ├── TagCounterMap.MAPPERCOUNTER.html
        │   │       │   │   │   ├── TagCounterMap.TagCounterMapper.html
        │   │       │   │   │   ├── WARCTagCounter.html
        │   │       │   │   │   ├── WATServerType.html
        │   │       │   │   │   ├── WETWordCount.html
        │   │       │   │   │   ├── WordCounterMap.html
        │   │       │   │   │   ├── WordCounterMap.MAPPERCOUNTER.html
        │   │       │   │   │   └── WordCounterMap.WordCountMapper.html
        │   │       │   │   ├── package-frame.html
        │   │       │   │   ├── package-summary.html
        │   │       │   │   ├── package-tree.html
        │   │       │   │   ├── package-use.html
        │   │       │   │   ├── ServerTypeMap.html
        │   │       │   │   ├── ServerTypeMap.MAPPERCOUNTER.html
        │   │       │   │   ├── ServerTypeMap.ServerMapper.html
        │   │       │   │   ├── TagCounterMap.html
        │   │       │   │   ├── TagCounterMap.MAPPERCOUNTER.html
        │   │       │   │   ├── TagCounterMap.TagCounterMapper.html
        │   │       │   │   ├── WARCTagCounter.html
        │   │       │   │   ├── WATServerType.html
        │   │       │   │   ├── WETWordCount.html
        │   │       │   │   ├── WordCounterMap.html
        │   │       │   │   ├── WordCounterMap.MAPPERCOUNTER.html
        │   │       │   │   └── WordCounterMap.WordCountMapper.html
        │   │       │   ├── package-frame.html
        │   │       │   ├── package-summary.html
        │   │       │   ├── package-tree.html
        │   │       │   ├── package-use.html
        │   │       │   ├── S3ReaderTest.html
        │   │       │   └── WARCReaderTest.html
        │   │       └── warc
        │   │           ├── class-use
        │   │           │   ├── WARCFileInputFormat.html
        │   │           │   └── WARCFileRecordReader.html
        │   │           ├── package-frame.html
        │   │           ├── package-summary.html
        │   │           ├── package-tree.html
        │   │           ├── package-use.html
        │   │           ├── WARCFileInputFormat.html
        │   │           └── WARCFileRecordReader.html
        │   ├── overview-frame.html
        │   ├── overview-summary.html
        │   ├── overview-tree.html
        │   ├── package-list
        │   ├── script.js
        │   └── stylesheet.css
        ├── archive-tmp
        ├── cc-warc-examples-0.1-SNAPSHOT.jar
        ├── cc-warc-examples-0.1-SNAPSHOT-jar-with-dependencies.jar
        ├── cc-warc-examples-0.1-SNAPSHOT-javadoc.jar
        ├── cc-warc-examples-0.1-SNAPSHOT-sources.jar
        ├── classes
        │   └── org
        │       └── commoncrawl
        │           ├── examples
        │           │   ├── mapreduce
        │           │   │   ├── ServerTypeMap.class
        │           │   │   ├── ServerTypeMap$MAPPERCOUNTER.class
        │           │   │   ├── ServerTypeMap$ServerMapper.class
        │           │   │   ├── TagCounterMap.class
        │           │   │   ├── TagCounterMap$MAPPERCOUNTER.class
        │           │   │   ├── TagCounterMap$TagCounterMapper.class
        │           │   │   ├── WARCTagCounter.class
        │           │   │   ├── WATServerType.class
        │           │   │   ├── WETWordCount.class
        │           │   │   ├── WordCounterMap.class
        │           │   │   ├── WordCounterMap$MAPPERCOUNTER.class
        │           │   │   └── WordCounterMap$WordCountMapper.class
        │           │   ├── S3ReaderTest.class
        │           │   └── WARCReaderTest.class
        │           └── warc
        │               ├── WARCFileInputFormat.class
        │               └── WARCFileRecordReader.class
        ├── javadoc-bundle-options
        │   └── javadoc-options-javadoc-resources.xml
        └── maven-archiver
            └── pom.properties
