What determines the number of Reduces that run a given MapReduce job on a cluster running MapReduce v1 (MRv1)?

A. It is set by the Hadoop framework and is based on the number of InputSplits of the job.

B. It is set by the developer.

C. It is set by the JobTracker based on the amount of intermediate data.

D. It is set and fixed by the cluster administrator in mapred-site.xml. The number set always runs for any submitted job.

Correct Answer: B

Explanation:
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). At 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75, the faster nodes will finish their first round of reduces and launch a second round, doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is, it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important, because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can be set by the developer in the same way as the number of map tasks, via JobConf's conf.setNumReduceTasks(int num), as sketched below the reference.
Reference: org.apache.hadoop.mapred.JobConf
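
For context, here is a minimal MRv1 driver sketch showing the developer setting the reducer count (option B). The class name, paths, and the count of 19 are illustrative assumptions, not part of the question; 19 follows the 0.95 heuristic for a hypothetical cluster of 10 nodes with 2 reduce slots each (0.95 * 10 * 2 is roughly 19).

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceCountExample.class);
        conf.setJobName("reduce-count-example");

        // Set by the developer, not by the framework or the JobTracker.
        // Hypothetical cluster: 10 nodes * 2 reduce slots * 0.95 ~= 19.
        conf.setNumReduceTasks(19);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Old-API defaults (identity mapper/reducer) keep the sketch minimal.
        JobClient.runJob(conf);
    }
}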

