Briefing Cloudera Knowledge

Where is intermediate data written to after being emitted from the Mapper’s map method?

You have just executed a MapReduce job. Where is intermediate data written to after being
emitted from the Mapper’s map method?

A.
Intermediate data is streamed across the network from the Mapper to the Reducer and is never
written to disk.

B.
Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are
written into HDFS.

C.
Into in-memory buffers that spill over to the local file system of the TaskTracker node running
the Mapper.

D.
Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker
node running the Reducer.

E.
Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are
written into HDFS.

Explanation:
The mapper output (intermediate data) is stored on the local file system (NOT
HDFS) of each individual mapper node, which matches option C. This is typically a temporary
directory whose location can be set in the configuration by the Hadoop administrator. The
intermediate data is cleaned up after the Hadoop job completes.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, Where is the
Mapper Output (intermediate key-value data) stored?
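
The behavior described in the explanation can be illustrated with a toy model. The sketch below is not Hadoop's actual code: the class SpillingBuffer, the max_records threshold, and the spillN.out file names are all hypothetical, chosen only to show the idea that emitted pairs collect in memory and are written as sorted runs to a local temporary directory (not HDFS) when the buffer fills, and that the directory is removed when the job is done.

```python
import os
import tempfile


class SpillingBuffer:
    """Toy model of a map-side output buffer (not Hadoop's implementation)."""

    def __init__(self, local_dir, max_records=3):
        self.local_dir = local_dir      # stands in for the node's local temp dir
        self.max_records = max_records  # hypothetical spill threshold
        self.buffer = []                # in-memory (key, value) pairs
        self.spill_files = []           # paths of spill files written so far

    def emit(self, key, value):
        """Collect one (key, value) pair; spill once the buffer is full."""
        self.buffer.append((key, value))
        if len(self.buffer) >= self.max_records:
            self.spill()

    def spill(self):
        """Sort the buffered pairs by key and write them to a local file."""
        if not self.buffer:
            return
        name = f"spill{len(self.spill_files)}.out"
        path = os.path.join(self.local_dir, name)
        with open(path, "w") as f:
            for key, value in sorted(self.buffer):
                f.write(f"{key}\t{value}\n")
        self.spill_files.append(path)
        self.buffer = []

    def close(self):
        """Flush whatever is still in memory at the end of the map task."""
        self.spill()


def run_demo():
    # The temporary directory is deleted on exit, loosely mirroring the
    # cleanup of intermediate data after the job completes.
    with tempfile.TemporaryDirectory() as local_dir:
        buf = SpillingBuffer(local_dir, max_records=3)
        for word in ["b", "a", "c", "a", "d"]:
            buf.emit(word, 1)
        buf.close()
        files = [os.path.basename(p) for p in buf.spill_files]
        with open(buf.spill_files[0]) as f:
            first_spill = f.read().splitlines()
    return files, first_spill
```

Calling run_demo() with the five words above yields two spill files: the first holds the three pairs that filled the buffer, sorted by key, and the second holds the remaining two pairs flushed by close(). Nothing in the model touches a distributed file system, which is the point option C makes.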