Monday, February 11, 2013

Map Reduce Simplified

Yes, it is about parallel and distributed computing. There are tons of web pages, books, articles, and diagrams that talk about Map Reduce with nice buzzwords; here is the most simplified explanation.

Let's take a real-life example.

1. The company CEO called all the Program Managers: "I need the total effort spent this month by noon." The Program Managers replied, "No problem, sir." Why are the Program Managers not worried? Because they are going to distribute the task :-)

2. Each Program Manager called their Project Managers, asking for the effort spent so far.

3. Each Project Manager pulled up the effort sheet and provided it to their Program Manager.

4. The Program Managers compiled the received sheets into one file and sent it to the CEO.

5. The company CEO collated all the sheets and calculated the total effort spent.

Each person broke the task into smaller tasks (mapped the input task to smaller tasks). A Program Manager was required to provide the effort spent, so he mapped his task to smaller tasks for his Project Managers. This is MAP.

On receiving the data from their Project Managers, the Program Managers compiled it back into a single output. This is REDUCE.
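The example above can be sketched in a few lines of Python. This is only an illustration: the names (effort_sheets, project_effort, program_effort, company_effort) and the hours are made up, and it assumes each level simply adds up what it receives instead of passing whole sheets along.

```python
from functools import reduce

# Hypothetical data: program -> project -> hours logged this month.
effort_sheets = {
    "program_a": {"project_1": [40, 35, 20], "project_2": [10, 15]},
    "program_b": {"project_3": [50, 25]},
}

def project_effort(sheet):
    """A Project Manager totals one effort sheet."""
    return sum(sheet)

def program_effort(projects):
    """A Program Manager hands the request to each project, then combines the replies."""
    per_project = map(project_effort, projects.values())   # MAP
    return reduce(lambda a, b: a + b, per_project, 0)       # REDUCE

def company_effort(programs):
    """The CEO does the same thing one level up."""
    per_program = map(program_effort, programs.values())    # MAP
    return reduce(lambda a, b: a + b, per_program, 0)       # REDUCE

total = company_effort(effort_sheets)
print(total)
```

Notice that the CEO never looks at an individual effort sheet; every level only sees the combined answers from the level below.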

Now let's zoom out and summarize how Map Reduce applies to distributed and parallel computing. Each node splits its task into smaller tasks (maps its given task). Each node then receives the results and combines them (REDUCE) to generate the required output.
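Here is a minimal sketch of the same idea running in parallel. It assumes worker threads in a single process stand in for the nodes of a real cluster, and the helper names (split, map_reduce_sum) are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(items, parts):
    """Divide items into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_reduce_sum(numbers, workers=4):
    """Sum numbers by handing chunks to workers and combining their partial sums."""
    chunks = split(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # MAP: each worker sums its own chunk independently.
        partial_sums = list(pool.map(sum, chunks))
    # REDUCE: combine the partial results into one output.
    return reduce(lambda a, b: a + b, partial_sums, 0)

hours = list(range(1, 101))          # made-up effort entries
parallel_total = map_reduce_sum(hours)
print(parallel_total)
```

The chunks do not depend on each other, which is what lets the map step run on as many workers as are available; only the final combine step needs all the partial results in one place.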


