PrepAway - Latest Free Exam Questions & Answers

Which best describes when the reduce method is first called in a MapReduce job?

Determine which best describes when the reduce method is first called in a MapReduce job?


A.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The programmer can configure in the job what percentage of the intermediate data
should arrive before the reduce method begins.

B.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called only after all intermediate data has been copied and
sorted.

C.
Reduce methods and map methods all start at the beginning of a job, in order to provide
optimal performance for map-only or reduce-only jobs.

D.
Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called as soon as the intermediate key-value pairs start to
arrive.

Explanation:
* In a MapReduce job, reducers do not start executing the reduce method until all
map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers
as soon as they are available. The programmer-defined reduce method is called only after all the
mappers have finished.

* Reducers start copying intermediate key-value pairs from the mappers as soon as they are
available. The progress calculation also takes into account the data transfer done by the
reduce process, so reduce progress starts showing up as soon as any intermediate key-value
pair from a mapper is available to be transferred to the reducer. Although the reducer
progress is updated, the programmer-defined reduce method is still called only after all the
mappers have finished.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, When is the
reducers are started in a MapReduce job?
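
The ordering described in the explanation can be illustrated with a small toy model. This is only a sketch in plain Java, not Hadoop code: the class `ReducerPhaseSketch` and its `run` method are hypothetical names, and the "reduce" step is assumed to be a simple sum per key. It shows why the reduce method has to wait: values for one key may come from any mapper, so the full list for a key is known only after every map output has been copied and merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a single reduce task. Hypothetical names; not the Hadoop API. */
final class ReducerPhaseSketch {

    /** Returns the ordered list of events so the phase boundaries can be inspected. */
    static List<String> run(List<Map<String, Integer>> mapOutputs) {
        List<String> events = new ArrayList<>();
        // TreeMap keeps keys sorted, standing in for the sort/merge step.
        TreeMap<String, List<Integer>> merged = new TreeMap<>();

        // Copy phase: each map output is fetched once its mapper has finished.
        for (int i = 0; i < mapOutputs.size(); i++) {
            events.add("copy map " + i);
            for (Map.Entry<String, Integer> e : mapOutputs.get(i).entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        events.add("sort/merge done");

        // Reduce phase: the user-defined logic (assumed here to be a sum) runs
        // only now, once per key, with all values for that key present.
        for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            events.add("reduce " + e.getKey() + "=" + sum);
        }
        return events;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> outputs = List.of(
                Map.of("a", 1, "b", 2),   // output of map 0
                Map.of("a", 3));          // output of map 1
        run(outputs).forEach(System.out::println);
    }
}
```

In this model the key `a` receives values from both map outputs, so calling the reduce step after copying only map 0 would produce a partial sum.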

10 Comments on “which best describes when the reduce method is first called in a MapReduce job?”

  1. Dana says:

    Should be A. The mapred.reduce.slowstart.completed.maps property in mapred-site.xml can be set anywhere from 0.0 to 1.0 to decide what percentage of the intermediate data should arrive before the reduce method begins.




    1. Nishanth says:

      The parameter you mentioned controls the shuffle phase, that is, when the reducers should start copying intermediate files.

      The question explicitly mentions the 'reduce method' (not the reducer per se), which only starts once all the intermediate output has been copied as part of the shuffle, then sorted (and merged) as part of the sort phase.

      Answer is B.




  2. Sandeep says:

    The mapred.reduce.slowstart.completed.maps property specifies the fraction of the maps in the job that should be complete before reduces are scheduled for the job. Once scheduled, a reduce task starts copying map outputs locally. The reduce method gets called only once all mappers have completed. So the answer is B.
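
The distinction drawn in this thread, between when reduce tasks are scheduled and when the reduce method can run, might be sketched as follows. This is a simplified model in plain Java, not Hadoop's scheduler: the class `SlowstartSketch` and its methods are hypothetical, and rounding the threshold up with `Math.ceil` is an assumption made for illustration.

```java
/** Toy model of the slow-start threshold. Hypothetical names; not Hadoop's scheduler. */
final class SlowstartSketch {

    /**
     * Number of maps that must finish before reduce tasks may be scheduled.
     * Assumption: the fraction is applied to the total map count and rounded up.
     */
    static int mapsNeededBeforeReducersStart(double slowstartFraction, int totalMaps) {
        if (slowstartFraction < 0.0 || slowstartFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0.0 and 1.0");
        }
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    /** Reduce tasks may be scheduled, and begin copying, once the threshold is met. */
    static boolean reducersMayBeScheduled(double slowstartFraction, int completedMaps, int totalMaps) {
        return completedMaps >= mapsNeededBeforeReducersStart(slowstartFraction, totalMaps);
    }

    /** The reduce method needs every map output, regardless of the fraction. */
    static boolean reduceMethodMayRun(int completedMaps, int totalMaps) {
        return completedMaps == totalMaps;
    }

    public static void main(String[] args) {
        int totalMaps = 10;
        double fraction = 0.5; // stands in for the slowstart setting
        for (int done = 0; done <= totalMaps; done += 5) {
            System.out.println(done + " maps done: scheduled="
                    + reducersMayBeScheduled(fraction, done, totalMaps)
                    + ", reduce method may run=" + reduceMethodMayRun(done, totalMaps));
        }
    }
}
```

Under these assumptions, changing the fraction moves the point at which copying can begin, while the condition for the reduce method stays the same.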



