What determines the number of Reduces that run a given MapReduce job on a cluster running MapReduce v1 (MRv1)?

A. It is set by the Hadoop framework and is based on the number of InputSplits of the job.

B. It is set by the developer.

C. It is set by the JobTracker based on the amount of intermediate data.

D. It is set and fixed by the cluster administrator in mapred-site.xml. The number set always runs for any submitted job.

Correct Answer: B

Explanation:
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). At 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75, the faster nodes will finish their first round of reduces and launch a second round, doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is, it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important, because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can be set by the developer in the same way as the number of map tasks, via JobConf's conf.setNumReduceTasks(int num), as sketched below the reference.
Reference: org.apache.hadoop.mapred.JobConf
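
For context, here is a minimal MRv1 driver sketch showing the developer setting the reducer count (option B). The class name, paths, and the count of 19 are illustrative assumptions, not part of the question; 19 follows the 0.95 heuristic for a hypothetical cluster of 10 nodes with 2 reduce slots each (0.95 * 10 * 2 is roughly 19).

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceCountExample.class);
        conf.setJobName("reduce-count-example");

        // Set by the developer, not by the framework or the JobTracker.
        // Hypothetical cluster: 10 nodes * 2 reduce slots * 0.95 ~= 19.
        conf.setNumReduceTasks(19);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Old-API defaults (identity mapper/reducer) keep the sketch minimal.
        JobClient.runJob(conf);
    }
}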

