Briefing Cloudera Knowledge

Which of the following best describes when the reduce method is first called in a MapReduce job?

A.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The programmer can configure in the job what percentage of the intermediate data
should arrive before the reduce method begins.

B.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called only after all intermediate data has been copied and
sorted.

C.
Reduce methods and map methods all start at the beginning of a job, in order to provide
optimal performance for map-only or reduce-only jobs.

D.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called as soon as the intermediate key-value pairs start to
arrive.

Explanation:
* In a MapReduce job, reducers do not start executing the reduce method until all map tasks
have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as
they are available, but the programmer-defined reduce method is called only after all the mappers
have finished and their output has been copied and sorted. The correct answer is therefore B.

* Because reducers copy intermediate key-value pairs as soon as they are available, and the
progress calculation also takes into account this data transfer (which is performed by the reduce
task), reduce progress starts to increase as soon as any mapper's intermediate key-value pairs are
available to be transferred to the reducer. Even though reducer progress is updated during this
copy, the programmer-defined reduce method is still called only after all the mappers have finished.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "When is the
reducers are started in a MapReduce job?"
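The behavior described in the explanation can be illustrated with a small Java sketch. This is a toy, single-process model of one reduce task, not Hadoop's actual implementation. It assumes that reduce-task progress is split into three equal phases (copy, sort, reduce), and the names `ReduceBarrierSketch`, `Pair`, `run`, and `reduce` are hypothetical. The point it shows: copy progress advances as each mapper's output arrives, but the user-defined reduce method is not invoked until every map output has been copied and sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-process model of ONE reduce task; not Hadoop's real implementation.
// Assumption: reduce-task progress is split into three equal phases (copy, sort, reduce).
public class ReduceBarrierSketch {

    // One intermediate (key, value) pair emitted by a mapper (hypothetical type).
    public record Pair(String key, int value) {}

    // mapOutputs.get(i) is the output of the i-th mapper to finish (completion order).
    public static List<String> run(List<List<Pair>> mapOutputs) {
        List<String> events = new ArrayList<>();
        List<Pair> copied = new ArrayList<>();
        int total = mapOutputs.size();

        // Copy (shuffle) phase: each finished mapper's output is fetched immediately,
        // and progress advances even though reduce() has not been called yet.
        for (int i = 0; i < total; i++) {
            copied.addAll(mapOutputs.get(i));
            double progress = (i + 1) / (double) total / 3.0 * 100;
            events.add(String.format(Locale.ROOT, "copied map %d, progress %.0f%%", i, progress));
        }

        // Sort (merge) phase: group all copied values by key; TreeMap keeps keys ordered.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : copied) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        events.add(String.format(Locale.ROOT, "sorted, progress %.0f%%", 2.0 / 3.0 * 100));

        // Reduce phase: only now, after every map output is copied and sorted,
        // is the user-defined reduce method invoked, once per key.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            events.add("reduce(" + e.getKey() + ") = " + reduce(e.getValue()));
        }
        return events;
    }

    // Stand-in for a programmer-defined reduce method: sums the values for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<Pair>> outputs = List.of(
                List.of(new Pair("a", 1), new Pair("b", 1)),
                List.of(new Pair("a", 1)),
                List.of(new Pair("b", 1), new Pair("c", 1)));
        run(outputs).forEach(System.out::println);
    }
}
```

Running `main` prints the three copy events at 11%, 22%, and 33% progress, then the sort event at 67%, and only after that `reduce(a) = 2`, `reduce(b) = 2`, and `reduce(c) = 1`. Progress moves during the copy, yet no reduce call happens before the last map output has arrived, which is the behavior described in option B.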