
Running Hadoop Application Locally in Windows

Running Hadoop Application

Before proceeding, we recommend learning to run Hadoop on Linux by downloading the document on running Hadoop in a single-node cluster. In this post, we will run a Hadoop application locally in Windows, specifically a Hadoop MapReduce word count program. To do this, you need to download and extract the Hadoop tar file.

In this post, we have used Hadoop 2.6.0. You can use later versions as well, in which case the version numbers in the jar file names below will differ.

You can download the Hadoop 2.6.0 tar file from the link below:

https://drive.google.com/open?id=0B1QaXx7tpw3SQUw5QkpYNTN2UGc

After downloading, extract the tar file. Now you will be able to see a folder called hadoop-2.6.0 in the extracted directory.

Let’s quickly run a program.

Open Eclipse and create a new Java project.

Here, after clicking on New Java Project, you will be asked for a project name, as shown in the screenshot below. Give a project name. We have named the project Word_count.

After you give the project name, a project is created with that name. Click on the project and, inside it, you will find a directory called src. Right-click on it and create a new class, as shown in the screenshot below.

You will now be prompted with another screen to provide the class name, as shown in the screenshot below.

Here, give a class name of your choice. We have named the class WordCount. Inside src, a file named WordCount.java is created. Click on the file and write the MapReduce code for the word count program.


import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
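
As an aside, the logic of the mapper and reducer above can be tried without any Hadoop jars. The following is only a minimal sketch in plain Java, not part of the Hadoop program: the class name WordCountSketch and the sample lines are made up for illustration. It tokenizes each line with StringTokenizer, as TokenizerMapper does, and adds 1 per token, which is the sum that IntSumReducer computes per word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job: same tokenizing and summing, no Hadoop.
class WordCountSketch {

    // Counts whitespace-separated tokens across all lines.
    static Map<String, Integer> count(List<String> lines) {
        // TreeMap keeps the words sorted, so the printed result is stable.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "map" emits (word, 1); "reduce" sums the ones per word.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello windows");
        System.out.println(count(lines)); // prints {hadoop=1, hello=2, windows=1}
    }
}
```

Note that tokens are split on whitespace only, so "Hadoop" and "hadoop," would be counted as different words, in the sketch as in the WordCount class.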

After copying the WordCount code into WordCount.java, save the file. Now you need to add a few dependency files to run this program in Windows.

First, we need to add the jar files that are present in the hadoop-2.6.0/share/hadoop directory. To do so, right-click on src–>Build Path–>Configure Build Path, as shown in the screenshot below.

In the Build Path, select the Libraries tab and click on Add External JARs.

Now browse to the folder where hadoop-2.6.0 was extracted.

Here, go to the hadoop-2.6.0/share/hadoop/common folder and add the hadoop-common-2.6.0.jar file.

Then open the lib folder inside it and add the following jar files:

commons-collections-3.2.1.jar

commons-configuration-1.6.jar

commons-lang-2.6.jar

commons-logging-1.1.3.jar

guava-11.0.2.jar

jackson-core-asl-1.9.13.jar

jackson-jaxrs-1.9.13.jar

jackson-mapper-asl-1.9.13.jar

log4j-1.2.17.jar

Open the hadoop-2.6.0/share/hadoop/mapreduce folder and add the following jar files:

hadoop-mapreduce-client-common-2.6.0.jar

hadoop-mapreduce-client-core-2.6.0.jar

hadoop-mapreduce-client-jobclient-2.6.0.jar

hadoop-mapreduce-client-shuffle-2.6.0.jar

Open the hadoop-2.6.0/share/hadoop/yarn folder and add the following jar files:

hadoop-yarn-api-2.6.0.jar

hadoop-yarn-client-2.6.0.jar

hadoop-yarn-common-2.6.0.jar

Open the hadoop-2.6.0/share/hadoop/hdfs/lib folder and add the commons-io-2.4.jar file.

Open the hadoop-2.6.0/share/hadoop/tools/lib folder and add the hadoop-auth-2.6.0.jar file.

You also need to download two extra jar files. Download them from the drive link below:

https://drive.google.com/open?id=0ByJLBTmJojjzU0VJeHJsOExBQmM

Add these two jar files to the build path as well. The final list of dependencies will be as shown in the screenshot below.
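
With this many jars added by hand, it is easy to miss one. One way to check, sketched below under our own assumptions, is to run a tiny class inside the same Eclipse project that asks the class loader whether a few representative classes can be found. The class name ClasspathCheckSketch and the choice of probe classes are ours; the first two probes are classes imported by WordCount.java, and the third is the class named in the NoClassDefFoundError that appears in the console log later in this post.

```java
// Hypothetical helper: reports whether selected classes are visible on the classpath.
class ClasspathCheckSketch {

    // Returns true if the class can be loaded; static initializers are not run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.apache.hadoop.conf.Configuration",      // imported by WordCount.java
            "org.apache.hadoop.mapreduce.Job",           // imported by WordCount.java
            "org.apache.commons.httpclient.HttpMethod"   // named in the error in the log
        };
        for (String name : probes) {
            System.out.println((isOnClasspath(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

A line starting with MISSING suggests that the jar providing that class has not been added to the build path yet. The probe list is not exhaustive; it only covers the classes named above.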

After adding all the dependencies, download the winutils files and copy them into the $HADOOP_HOME/bin directory, where HADOOP_HOME refers to the extracted hadoop-2.6.0 folder.
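
If the job later complains that it cannot locate winutils, it helps to confirm where the file is expected. The sketch below is an illustration only: the class name WinutilsCheckSketch is made up, and it assumes, as far as we know from Hadoop 2.x, that Hadoop takes its home directory from the hadoop.home.dir system property and otherwise from the HADOOP_HOME environment variable, and looks for bin/winutils.exe underneath it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the winutils.exe location.
class WinutilsCheckSketch {

    // Assumed lookup order: system property first, then environment variable.
    static String hadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    // Expected location of winutils.exe under a given Hadoop home.
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    // True only if a home is known and the file actually exists there.
    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null && Files.isRegularFile(winutilsPath(hadoopHome));
    }

    public static void main(String[] args) {
        String home = hadoopHome();
        if (hasWinutils(home)) {
            System.out.println("Found " + winutilsPath(home));
        } else {
            System.out.println("winutils.exe not found under Hadoop home: " + home);
        }
    }
}
```

When running from Eclipse, HADOOP_HOME can be set in the Environment tab of the run configuration, or the property can be set at the top of main with System.setProperty("hadoop.home.dir", ...) pointing at your own extracted folder.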

That is all the setup required for running your Hadoop application in Windows. Make sure that your input file is ready.

Here we have created our input file in the project directory itself, with the name inp, as shown in the screenshot below.

To provide the input and output file paths, right-click on the main class–>Run As–>Run Configurations, as shown in the screenshot below.

In the Main tab, select the project name and the class name of the program, as shown in the screenshot below.

Now move to the Arguments tab and provide the input file path and the output file path, as shown in the screenshot below.

Since our input file is inside the project directory itself, we have given just inp as the input file path, followed by whitespace and then output as the output file path. The two paths are passed to the program as args[0] and args[1]. The output directory will be created inside the project directory itself.
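
One thing to keep in mind when rerunning the program: Hadoop refuses to start a job whose output directory already exists. The sketch below is a hypothetical plain-Java helper (the class JobArgsSketch and its method names are ours, not part of Hadoop) that checks the two arguments before the job is submitted and can delete a leftover output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical pre-flight check for the two program arguments.
class JobArgsSketch {

    // Returns null when the arguments look usable, otherwise a short description of the problem.
    static String problemWith(String[] args) {
        if (args.length != 2) {
            return "expected 2 arguments (input path, output path) but got " + args.length;
        }
        if (!Files.exists(Paths.get(args[0]))) {
            return "input path does not exist: " + args[0];
        }
        if (Files.exists(Paths.get(args[1]))) {
            return "output path already exists: " + args[1];
        }
        return null;
    }

    // Deletes a directory tree, children before parents; does nothing if it is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) {
        String problem = problemWith(args);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

With the arguments used in this post, the Arguments tab would contain inp output, and a second run would report that the output path already exists until the output directory is removed.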

Now click on Run. You will see the job's log output in the Eclipse console.

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2016-09-16 00:26:17,574 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-09-16 00:26:18,228 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(153)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-09-16 00:26:18,233 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(261)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-09-16 00:26:18,285 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2016-09-16 00:26:18,382 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(494)) - number of splits:1
2016-09-16 00:26:18,493 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(583)) - Submitting tokens for job: job_local1920454258_0001
2016-09-16 00:26:18,786 INFO  [main] mapreduce.Job (Job.java:submit(1300)) - The url to track the job: http://localhost:8080/
2016-09-16 00:26:18,787 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - Running job: job_local1920454258_0001
2016-09-16 00:26:18,787 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2016-09-16 00:26:18,801 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2016-09-16 00:26:18,839 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2016-09-16 00:26:18,840 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1920454258_0001_m_000000_0
2016-09-16 00:26:18,892 INFO  [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-09-16 00:26:19,208 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(587)) -  Using ResourceCalculatorProcessTree : [email protected]
2016-09-16 00:26:19,229 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(753)) - Processing split: file:/C:/Users/Kirankrishna/workspace/Word_count/inp:0+84
2016-09-16 00:26:19,466 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1202)) - (EQUATOR) 0 kvi 26214396(104857584)
2016-09-16 00:26:19,468 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(995)) - mapreduce.task.io.sort.mb: 100
2016-09-16 00:26:19,468 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(996)) - soft limit at 83886080
2016-09-16 00:26:19,468 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(997)) - bufstart = 0; bufvoid = 104857600
2016-09-16 00:26:19,468 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(998)) - kvstart = 26214396; length = 6553600
2016-09-16 00:26:19,472 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(402)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-09-16 00:26:19,486 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) -
2016-09-16 00:26:19,487 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1457)) - Starting flush of map output
2016-09-16 00:26:19,487 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1475)) - Spilling map output
2016-09-16 00:26:19,487 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1476)) - bufstart = 0; bufend = 135; bufvoid = 104857600
2016-09-16 00:26:19,487 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1478)) - kvstart = 26214396(104857584); kvend = 26214348(104857392); length = 49/6553600
2016-09-16 00:26:19,536 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:sortAndSpill(1660)) - Finished spill 0
2016-09-16 00:26:19,544 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:done(1001)) - Task:attempt_local1920454258_0001_m_000000_0 is done. And is in the process of committing
2016-09-16 00:26:19,551 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2016-09-16 00:26:19,551 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local1920454258_0001_m_000000_0' done.
2016-09-16 00:26:19,551 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local1920454258_0001_m_000000_0
2016-09-16 00:26:19,552 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-09-16 00:26:19,553 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2016-09-16 00:26:19,554 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local1920454258_0001_r_000000_0
2016-09-16 00:26:19,558 INFO  [pool-3-thread-1] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-09-16 00:26:19,593 INFO  [pool-3-thread-1] mapred.Task (Task.java:initialize(587)) -  Using ResourceCalculatorProcessTree : [email protected]
2016-09-16 00:26:19,596 INFO  [pool-3-thread-1] mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: [email protected]
2016-09-16 00:26:19,605 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(196)) - MergerManager: memoryLimit=1321939712, maxSingleShuffleLimit=330484928, mergeThreshold=872480256, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2016-09-16 00:26:19,607 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local1920454258_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2016-09-16 00:26:19,636 INFO  [localfetcher#1] reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(141)) - localfetcher#1 about to shuffle output of map attempt_local1920454258_0001_m_000000_0 decomp: 120 len: 124 to MEMORY
2016-09-16 00:26:19,661 INFO  [localfetcher#1] reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 120 bytes from map-output for attempt_local1920454258_0001_m_000000_0
2016-09-16 00:26:19,699 INFO  [localfetcher#1] reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(314)) - closeInMemoryFile -> map-output of size: 120, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->120
2016-09-16 00:26:19,700 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2016-09-16 00:26:19,702 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-16 00:26:19,702 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(674)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2016-09-16 00:26:19,712 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(597)) - Merging 1 sorted segments
2016-09-16 00:26:19,713 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(696)) - Down to the last merge-pass, with 1 segments left of total size: 112 bytes
2016-09-16 00:26:19,714 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(751)) - Merged 1 segments, 120 bytes to disk to satisfy reduce memory limit
2016-09-16 00:26:19,716 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(781)) - Merging 1 files, 124 bytes from disk
2016-09-16 00:26:19,716 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(796)) - Merging 0 segments, 0 bytes from memory into reduce
2016-09-16 00:26:19,717 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(597)) - Merging 1 sorted segments
2016-09-16 00:26:19,719 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(696)) - Down to the last merge-pass, with 1 segments left of total size: 112 bytes
2016-09-16 00:26:19,720 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-16 00:26:19,728 INFO  [pool-3-thread-1] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-09-16 00:26:19,732 INFO  [pool-3-thread-1] mapred.Task (Task.java:done(1001)) - Task:attempt_local1920454258_0001_r_000000_0 is done. And is in the process of committing
2016-09-16 00:26:19,734 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-16 00:26:19,734 INFO  [pool-3-thread-1] mapred.Task (Task.java:commit(1162)) - Task attempt_local1920454258_0001_r_000000_0 is allowed to commit now
2016-09-16 00:26:19,746 INFO  [pool-3-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(439)) - Saved output of task 'attempt_local1920454258_0001_r_000000_0' to file:/C:/Users/Kirankrishna/workspace/Word_count/output/_temporary/0/task_local1920454258_0001_r_000000
2016-09-16 00:26:19,750 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2016-09-16 00:26:19,750 INFO  [pool-3-thread-1] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local1920454258_0001_r_000000_0' done.
2016-09-16 00:26:19,750 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1920454258_0001_r_000000_0
2016-09-16 00:26:19,754 INFO  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2016-09-16 00:26:19,789 WARN  [Thread-2] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1920454258_0001
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 1 more
2016-09-16 00:26:19,790 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1366)) - Job job_local1920454258_0001 running in uber mode : false
2016-09-16 00:26:19,804 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1373)) -  map 100% reduce 100%
2016-09-16 00:26:19,805 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1386)) - Job job_local1920454258_0001 failed with state FAILED due to: NA
2016-09-16 00:26:19,819 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1391)) - Counters: 33
	File System Counters
		FILE: Number of bytes read=790
		FILE: Number of bytes written=386816
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=3
		Map output records=13
		Map output bytes=135
		Map output materialized bytes=124
		Input split bytes=117
		Combine input records=13
		Combine output records=10
		Reduce input groups=10
		Reduce shuffle bytes=124
		Reduce input records=10
		Reduce output records=10
		Spilled Records=20
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=468713472
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=84
	File Output Format Counters
		Bytes Written=90

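You will get messages like the above on the console after the completion of the job. You can check the output in the project directory, where the result is in the part-r-00000 file, as shown in the screenshot below.

Note that the log above also contains a NoClassDefFoundError for org/apache/commons/httpclient/HttpMethod and reports the job state as FAILED, even though the map and reduce tasks completed. This points to a class from the Commons HttpClient library that was missing from the build path. If you see the same error, adding the commons-httpclient jar (most likely found in the hadoop-2.6.0/share/hadoop/common/lib folder) should resolve it.

In the above screenshot, you can see the output of our word count program. We have successfully run a Hadoop application in Windows.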
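
If you want to read the result back in code rather than open the file, each line of part-r-00000 holds a word and its count separated by a tab, which is the default for Hadoop's text output. The following is a minimal sketch, assuming that layout; the class name PartFileSketch and the sample lines are made up for illustration and are not taken from the actual run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for "word<TAB>count" lines such as those in part-r-00000.
class PartFileSketch {

    // Parses the lines in order; blank lines are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab separator in line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample; with a real run, read the lines with
        // java.nio.file.Files.readAllLines(Paths.get("output", "part-r-00000")).
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2", "windows\t1");
        parse(sample).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```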

We hope this blog helped you run a Hadoop application in Windows. Keep visiting our site www.acadgild.com for more updates on big data and other technologies.


2 Comments

  1. Hi, I’m ashok. I did as per the directions but i got some warnings on log4j not initialized properly and the program terminated. Can you help me on this.
