PrepAway - Latest Free Exam Questions & Answers

Where does the Mapper place the intermediate data of each Map task?

During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the
intermediate data of each Map task?

A.
The Mapper stores the intermediate data on the node running the job’s ApplicationMaster so
that it is available to YARN’s ShuffleService before the data is presented to the Reducer

B.
The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran in the
HDFS /usercache/${user}/appcache/application_${appid} directory for the user who ran the job

C.
YARN holds the intermediate data in the NodeManager’s memory (a container) until it is
transferred to the Reducers

D.
The Mapper stores the intermediate data on the underlying filesystem of the local disk in the
directories listed in yarn.nodemanager.local-dirs

E.
The Mapper transfers the intermediate data immediately to the Reducers as it is generated by the
Map task

7 Comments on “Where does the Mapper place the intermediate data of each Map task?”

  1. Vic says:

    Ref from Hadoop: The Definitive Guide:
    The client then calls read() on the stream (step 3). DFSInputStream, which has stored
    the datanode addresses for the first few blocks in the file, then connects to the first
    (closest) datanode for the first block in the file. Data is streamed from the datanode
    back to the client, which calls read() repeatedly on the stream (step 4).

  2. fjbanezares says:

    What does this quotation about HDFS have to do with the question?

    I think the answer is D, reasoning in MRv1 terms, where the data is stored in the local filesystem of the node where the mapper process resides. This is YARN, though, where the ResourceManager takes care of allocating resources.

    In any case, in the current version the MapReduce ApplicationMaster only talks to the ResourceManager to obtain RAM and CPU. So for now the mapper behaves as it did before, keeping the data in the local filesystem. Future versions of YARN may have the RM manage disk usage as well, but not yet.

    So D is correct.

  3. Dev says:

    D is the answer. local-dirs is where the intermediate data is stored, and in several places Tom White and Eric Sammer recommend keeping these directories large enough to hold the map output.
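
To make option D concrete, here is a minimal sketch of how the per-application working directories could be derived from a comma-separated yarn.nodemanager.local-dirs value, following the usercache/${user}/appcache/application_${appid} layout quoted in option B (on local disk, not in HDFS). The helper name app_cache_dirs and the example paths, user, and application id are hypothetical, chosen only for illustration; this is not Hadoop's own code.

```python
import posixpath


def app_cache_dirs(local_dirs, user, app_id):
    """Return the per-application directory under each configured local dir.

    local_dirs: comma-separated value, as in yarn.nodemanager.local-dirs
    user:       name of the user who submitted the job (hypothetical here)
    app_id:     application id such as "application_1_0001" (hypothetical)
    """
    # Split the comma-separated list and ignore surrounding whitespace.
    roots = [d.strip() for d in local_dirs.split(",") if d.strip()]
    # Each local dir gets its own usercache/<user>/appcache/<app_id> subtree.
    return [posixpath.join(root, "usercache", user, "appcache", app_id)
            for root in roots]


# Hypothetical example values, not taken from any real cluster.
dirs = app_cache_dirs("/data/1/yarn/local, /data/2/yarn/local",
                      "alice", "application_1_0001")
for d in dirs:
    print(d)
```

Because map output is spread across these local directories, each of them needs enough free space for the intermediate data, which is the sizing advice mentioned above.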

