PrepAway - Latest Free Exam Questions & Answers

Which statement best describes the data path of intermediate key-value pairs (i.e., output of the mappers)?


A.
Intermediate key-value pairs are written to HDFS. Reducers read the intermediate data from
HDFS.

B.
Intermediate key-value pairs are written to HDFS. Reducers copy the intermediate data to the
local disks of the machines running the reduce tasks.

C.
Intermediate key-value pairs are written to the local disks of the machines running the map
tasks, and then copied to the machine running the reduce tasks.

D.
Intermediate key-value pairs are written to the local disks of the machines running the map
tasks, and are then copied to HDFS. Reducers read the intermediate data from HDFS.

Explanation:
The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each
individual mapper node. This is typically a temporary directory whose location can be set in
the configuration by the Hadoop administrator. The intermediate data is cleaned up after the
Hadoop job completes. This matches option C.
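The local-disk behaviour described above can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: `run_map_task`, `partition_for`, `NUM_REDUCERS` and the `part-N.jsonl` file names are hypothetical, and a temporary directory stands in for the administrator-configured local directory.

```python
import json
import os
import tempfile
import zlib

NUM_REDUCERS = 2  # hypothetical job setting, not a Hadoop property


def partition_for(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key with a stable hash (stand-in for a hash partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(lines, local_dir):
    """Word-count map task: write (word, 1) pairs to one local file per partition."""
    paths = {}
    files = {}
    try:
        for line in lines:
            for word in line.split():
                p = partition_for(word)
                if p not in files:
                    paths[p] = os.path.join(local_dir, f"part-{p}.jsonl")
                    files[p] = open(paths[p], "w", encoding="utf-8")
                files[p].write(json.dumps([word, 1]) + "\n")
    finally:
        for f in files.values():
            f.close()
    return paths


def demo():
    """Run one map task, read its local output back, then let the directory be removed."""
    with tempfile.TemporaryDirectory(prefix="map_local_") as local_dir:
        paths = run_map_task(["a b a", "c b"], local_dir)
        existed = all(os.path.exists(p) for p in paths.values())
        records = []
        for p in sorted(paths):
            with open(paths[p], encoding="utf-8") as f:
                records.extend(tuple(json.loads(row)) for row in f)
    # Leaving the "with" block deletes the directory, mimicking post-job cleanup.
    cleaned = not os.path.exists(local_dir)
    return existed, sorted(records), cleaned
```

Calling `demo()` returns whether the partition files existed during the "job", the intermediate pairs that were written, and whether the scratch directory was gone afterwards. Nothing in the sketch touches a shared file system, which is the point of the explanation.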
Note:
*Reducers start copying intermediate key-value pairs from the mappers as soon as they are
available. The progress calculation also takes into account the data transfer done by the
reduce process, so reduce progress starts showing up as soon as any intermediate key-value
pair from a mapper is available to be transferred to the reducer. Although the reducer
progress is updated, the programmer-defined reduce method is called only after all the
mappers have finished.
*The Reducer's input is the grouped output of the Mappers. In the shuffle phase the framework,
for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.
*Mapper maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks that transform input records into intermediate records. The
transformed intermediate records do not need to be of the same type as the input records. A given
input pair may map to zero or many output pairs.
*All intermediate values associated with a given output key are subsequently grouped by the
framework, and passed to the Reducer(s) to determine the final output.
Reference: Questions & Answers for Hadoop MapReduce developers, Where is the Mapper Output
(intermediate key-value data) stored?
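
The notes above (partitioned map output, per-reducer fetch, grouping by key, reduce called only after all maps finish) can be sketched as an in-memory simulation. This is an illustrative assumption-laden sketch, not the Hadoop API: all function names are hypothetical, plain Python lists replace the local files, and a loop replaces the HTTP fetch.

```python
from collections import defaultdict
import zlib


def partition_for(key, num_reducers):
    """Stable hash partitioner (stand-in for the framework's partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(lines, num_reducers):
    """Return this mapper's output split into one list of pairs per partition."""
    partitions = defaultdict(list)
    for line in lines:
        for word in line.split():
            partitions[partition_for(word, num_reducers)].append((word, 1))
    return partitions


def shuffle(map_outputs, reducer_id):
    """Collect reducer_id's partition from every mapper and group values by key."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output.get(reducer_id, []):
            grouped[key].append(value)
    return grouped


def reduce_task(grouped):
    """Word-count reduce: sum the grouped values for each key."""
    return {key: sum(values) for key, values in sorted(grouped.items())}


def run_job(splits, num_reducers=2):
    """Run every map task first, then shuffle and reduce each partition."""
    map_outputs = [map_task(split, num_reducers) for split in splits]
    result = {}
    for reducer_id in range(num_reducers):
        result.update(reduce_task(shuffle(map_outputs, reducer_id)))
    return result
```

For example, `run_job([["a b a"], ["c b", "a"]])` treats each inner list as one mapper's input split and returns the combined word counts. Each key lands in exactly one partition, so every reducer sees all values for its keys regardless of which mapper produced them.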

3 Comments on “Which statement best describes the data path of intermediate key-value pairs (i.e., output of the mappers)?”

  1. Ramesh Hiremath says:

    C.
    Intermediate key-value pairs are written to the local disks of the machines running the map
    tasks, and then copied to the machine running the reduce tasks.



