PrepAway - Latest Free Exam Questions & Answers

Determine the difference between setting the number of reducers to one and setting the number of reducers to zero.

You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses
TextInputFormat and the IdentityReducer: the mapper applies a regular expression over input
values and emits key-value pairs with the key consisting of the matching text, and the value
containing the filename and byte offset. Determine the difference between setting the number of
reducers to one and setting the number of reducers to zero.

A.
There is no difference in output between the two settings.

B.
With zero reducers, no reducer runs and the job throws an exception. With one reducer,
instances of matching patterns are stored in a single file on HDFS.

C.
With zero reducers, all instances of matching patterns are gathered together in one file on
HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.

D.
With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With
one reducer, all instances of matching patterns are gathered together in one file on HDFS.

Explanation:
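* It is legal to set the number of reduce-tasks to zero if no reduction is desired.
In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by
setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the
FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set
mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks.
Rather, the outputs of the mapper tasks will be the final output of the job.
* Each map task writes its own output file, and a job over 100 input files runs many map tasks, so
with zero reducers the matches end up spread across multiple files. With one reducer, every map
output is shuffled to that single reduce task, which writes one output file. This is the behavior
described in option D.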
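The difference in output layout can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: the helper names run_mapper and run_job are hypothetical, the input "files" are in-memory strings, and the part-m-NNNNN / part-r-NNNNN file names follow the naming convention of the newer MapReduce API.

```python
import re


def run_mapper(filename, text, pattern):
    """Emit (matched_text, 'filename:byte_offset') pairs, one per regex match.

    The byte offset is that of the line containing the match, mirroring the
    key that TextInputFormat hands to the mapper.
    """
    pairs = []
    offset = 0  # byte offset of the current line within the file
    for line in text.splitlines(keepends=True):
        for match in re.finditer(pattern, line):
            pairs.append((match.group(0), f"{filename}:{offset}"))
        offset += len(line.encode("utf-8"))
    return pairs


def run_job(inputs, pattern, num_reducers):
    """Return {output_file_name: [(key, value), ...]} for 0 or 1 reducers.

    Assumption for this sketch: one map task per input file.
    """
    map_outputs = [run_mapper(name, text, pattern) for name, text in inputs.items()]
    if num_reducers == 0:
        # Map-only job: each map task writes its own part file, unsorted.
        return {f"part-m-{i:05d}": out for i, out in enumerate(map_outputs)}
    if num_reducers == 1:
        # One reducer: all map outputs are merged and sorted by key, then an
        # identity reducer passes every pair through into a single file.
        merged = sorted(
            (pair for out in map_outputs for pair in out),
            key=lambda kv: kv[0],
        )
        return {"part-r-00000": merged}
    raise ValueError("this sketch only models 0 or 1 reducers")
```

Calling run_job with num_reducers=0 on two inputs yields two part-m files, while num_reducers=1 yields a single part-r file containing the same pairs grouped by key.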
Note (Reduce phase):
In this phase the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method is
called for each <key, (list of values)> pair in the grouped inputs.
The output of the reduce task is typically written to the FileSystem via
OutputCollector.collect(WritableComparable, Writable).
Applications can use the Reporter to report progress, set application-level status messages and
update Counters, or just indicate that they are alive.
The output of the Reducer is not sorted.
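The reduce phase described in this note can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the Hadoop API: identity_reduce and run_reduce_phase are hypothetical names, and the collect callback stands in for OutputCollector.collect.

```python
from itertools import groupby
from operator import itemgetter


def identity_reduce(key, values, collect):
    """Pass every value through unchanged, as an identity reducer would."""
    for value in values:
        collect(key, value)


def run_reduce_phase(map_output_pairs, reduce_fn):
    """Group pairs by key and call reduce_fn once per <key, (list of values)>."""
    output = []

    def collect(key, value):
        # Stand-in for OutputCollector.collect: append to the reducer's output.
        output.append((key, value))

    # The framework sorts and groups map outputs by key before reduce runs.
    sorted_pairs = sorted(map_output_pairs, key=itemgetter(0))
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        reduce_fn(key, (value for _, value in group), collect)
    return output
```

With the identity reducer, the output contains exactly the input pairs, arranged so that all values for one key are adjacent.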

3 Comments on “Determine the difference between setting the number of reducers to one and setting the number of reducers to zero.”

  1. Ramesh Hiremath says:

    D.
    With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With
    one reducer, all instances of matching patterns are gathered together in one file on HDFS.



