Briefing Cloudera Knowledge

Which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?

You’ve written a MapReduce job that will process 500 million input records and generate 500
million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a
significant amount of intermediate data that it needs to transfer between mappers and reducers,
which is a potential bottleneck. A custom implementation of which of the following interfaces is
most likely to reduce the amount of intermediate data transferred across the network?

A. Writable

B. WritableComparable

C. InputFormat

D. OutputFormat

E. Combiner

F. Partitioner

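Explanation:
The correct answer is E, Combiner. Users can optionally specify a combiner, via
JobConf.setCombinerClass(Class), to perform local aggregation of the intermediate outputs on the
map side, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
In Hadoop the combiner is not a separate interface: the class passed to setCombinerClass is a
Reducer implementation. A Partitioner only decides which reducer receives each key, so it can
balance skewed data but does not shrink the volume sent over the network.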
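
The effect of local aggregation can be illustrated with a small plain-Java simulation. This is
only a sketch: it uses no Hadoop classes, and the class name CombinerSketch, its map and combine
methods, and the word-count input are all hypothetical names chosen for illustration. It assumes
the aggregation is a sum, which is safe to apply early because addition is associative and
commutative.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of what a combiner does; not Hadoop API code.
public class CombinerSketch {

    // Hypothetical map step: emit one (word, 1) pair per word in the input split.
    public static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combine step: sum the values per key on the mapper side, so that only
    // one record per distinct key has to be sent to the reducers.
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = map("a b a a c b a");
        Map<String, Integer> combined = combine(mapOutput);

        System.out.println("records without combiner: " + mapOutput.size()); // 7
        System.out.println("records with combiner:    " + combined.size());  // 3
        System.out.println(combined); // {a=4, b=2, c=1}
    }
}
```

In this toy input, seven intermediate records shrink to three before anything would cross the
network. The more often keys repeat within a single mapper's output, the larger the saving.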
Reference: Map/Reduce Tutorial
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html (Mapper, 9th paragraph)