In this video blog post, we discuss how MapReduce works in Hadoop, using a real-life example.
MapReduce in Simple Words:
Input splits -> Map phase -> Reduce phase
Generally, MapReduce consists of two phases: the Map phase and the Reduce phase. In the Map phase, the input is split into smaller pieces (input splits), and these are assigned to different CPUs/processors. For every input split there is one mapper working on it, and the mappers run in parallel. After the Map phase comes a step called shuffling and sorting, in which the keys and values emitted by all the mappers are shuffled and arranged in sorted order by key. Lastly, the grouped output of the mappers is sent to the Reduce phase.
In the Map phase, key-value pairs are prepared; in the Reduce phase, all the values for each unique key are reduced to get the final result.
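The flow described above can be sketched in a few lines of Python. This is a single-process illustration and not Hadoop code: the names run_map_reduce, count_words and add_up are made up for this sketch, and the sample input is invented.

```python
from collections import defaultdict


def run_map_reduce(splits, map_fn, reduce_fn):
    """Simulate Map -> shuffle/sort -> Reduce in one process (illustrative only)."""
    # Map phase: one mapper call per input split, each yielding (key, value) pairs.
    mapped = [pair for split in splits for pair in map_fn(split)]

    # Shuffle: gather all the values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Sort by key, then Reduce: collapse each key's values into one result.
    return {key: reduce_fn(key, grouped[key]) for key in sorted(grouped)}


def count_words(split):
    # Hypothetical mapper: emit (word, 1) for every word in the split.
    for word in split.split():
        yield word, 1


def add_up(key, values):
    # Hypothetical reducer: add up all the values for one key.
    return sum(values)


# Invented sample input: each string stands for one input split.
result = run_map_reduce(["b a b", "a c"], count_words, add_up)
print(result)
```

In a real Hadoop job the mappers and reducers run on different machines and the framework performs the shuffle and sort; here everything happens in one loop only to make the order of the steps visible.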
Comparison of a Real-Life Example with MapReduce:
In India, after an election, all the EVMs (Electronic Voting Machines) are brought to one place for counting, and polling officers then count the votes stored in each EVM.
In other words, the actual work is done by the polling officer, but that work is performed on the data held in the EVM.
Let’s now link the components of the election with the actual components of MapReduce.
Input Splits: Here, an input split is the EVM (or set of EVMs) belonging to one polling booth, and the votes it holds are counted by one polling officer, who plays the role of the mapper.
Map Phase: In the Map phase, each polling officer tallies the votes for each candidate in his or her own polling booth. This is done simultaneously for all polling booths. Each candidate becomes a key, and the number of votes for that candidate in that booth becomes the value.
Reduce Phase: In the Reduce phase, the per-booth counts from every booth under a parliamentary seat are brought together and the results are generated for each candidate.
That is, the individual results from each polling booth are collected and grouped by key (candidate), and finally the total number of votes for each candidate is calculated.
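The mapping between the election and MapReduce can be sketched in Python as follows. The booth names, the candidates X, Y and Z, and the ballots are all invented, and the function names polling_officer and declare_result are hypothetical. A thread pool is used only to suggest that the booths are counted at the same time; it is not how Hadoop schedules mappers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Invented data: each list holds the ballots recorded at one polling booth
# (one input split).
booths = {
    "booth_1": ["X", "Y", "X"],
    "booth_2": ["Y", "Y", "Z"],
    "booth_3": ["X", "Z", "Y"],
}


def polling_officer(ballots):
    """Map: count one booth's ballots and return (candidate, votes) pairs."""
    return list(Counter(ballots).items())


def declare_result(per_booth_counts):
    """Reduce: add up each candidate's votes across all booths."""
    totals = Counter()
    for booth_counts in per_booth_counts:
        for candidate, votes in booth_counts:
            totals[candidate] += votes
    return dict(totals)


# Each booth is counted independently, so the map calls can run concurrently.
with ThreadPoolExecutor() as pool:
    per_booth_counts = list(pool.map(polling_officer, booths.values()))

seat_result = declare_result(per_booth_counts)
print(seat_result)
```

The point of the design is that polling_officer never needs to see another booth's ballots, which is what allows the Map phase to be spread across many workers.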
The process is shown pictorially in the image below.
Here, each booth has several EVMs in which people cast their votes. In this example, there are three candidates: R, G, and B.
In the Map phase, key-value pairs are prepared for each booth. The key is the candidate and the value is the number of votes for that candidate in that particular booth.
After the Map phase, shuffling and sorting are done: the pairs from every mapper are shuffled so that all the values for the same candidate (key) are accumulated in one place.
In the Reduce phase, the keys arrive in sorted order, the values for each key are added up, and finally the total votes for each candidate are calculated.
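To make the intermediate step visible, here is a small Python trace of shuffling and sorting for the candidates R, G and B. The vote counts are made up for illustration and are not taken from the image; the variable names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Invented mapper output: one list of (candidate, votes) pairs per booth.
mapper_outputs = [
    [("R", 10), ("G", 5), ("B", 7)],  # booth 1
    [("R", 4), ("G", 9), ("B", 2)],   # booth 2
    [("R", 6), ("G", 1), ("B", 8)],   # booth 3
]

# Shuffle and sort: pool the pairs from every mapper and order them by key.
pairs = sorted(
    (pair for output in mapper_outputs for pair in output),
    key=itemgetter(0),
)

# Group the sorted pairs so that each key carries the list of its values.
shuffled = {
    candidate: [votes for _, votes in group]
    for candidate, group in groupby(pairs, key=itemgetter(0))
}
print(shuffled)

# Reduce: add up the values for each key.
totals = {candidate: sum(votes) for candidate, votes in shuffled.items()}
print(totals)
```

With these made-up numbers, the shuffled data lists the keys in the order B, G, R, each with three per-booth counts, and the reducer turns each list into a single total.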
We hope this post helped you understand how MapReduce works and how it is implemented. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.