PrepAway - Latest Free Exam Questions & Answers

Where does the Mapper place the intermediate data of each Map Task?

During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the
intermediate data of each Map Task?


A.
The Mapper stores the intermediate data on the node running the Job’s ApplicationMaster so
that it is available to YARN ShuffleService before the data is presented to the Reducer

B.
The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran, in the
HDFS /usercache/${user}/appcache/application_${appid} directory for the user who ran the job

C.
The Mapper transfers the intermediate data immediately to the reducers as it is generated by
the Map Task

D.
YARN holds the intermediate data in the NodeManager’s memory (a container) until it is
transferred to the Reducer

E.
The Mapper stores the intermediate data on the underlying filesystem of the local disk in the
directories specified by yarn.nodemanager.local-dirs

11 Comments on “Where does the Mapper place the intermediate data of each Map Task?”

    1. TL says:

      List of directories to store localized files in. An application’s localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers’ work directories, called container_${contid}, will be subdirectories of this.
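The directory layout quoted above can be sketched in a few lines of Python. This is only an illustration of the path pattern, not NodeManager code; the function names, the example user, and the example IDs are hypothetical.

```python
import posixpath


def app_local_dirs(local_dirs: str, user: str, app_id: str) -> list[str]:
    """Expand a comma-separated yarn.nodemanager.local-dirs value into the
    per-application directories described in the quoted documentation."""
    # Split the property value and drop surrounding whitespace / empty entries.
    roots = [d.strip() for d in local_dirs.split(",") if d.strip()]
    return [
        posixpath.join(root, "usercache", user, "appcache", f"application_{app_id}")
        for root in roots
    ]


def container_work_dir(app_dir: str, container_id: str) -> str:
    """Container work directories are subdirectories of the application directory."""
    return posixpath.join(app_dir, f"container_{container_id}")


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the paths.
    for app_dir in app_local_dirs("/data/1/yarn/local,/data/2/yarn/local",
                                  "alice", "1400000000000_0001"):
        print(container_work_dir(app_dir, "1400000000000_0001_01_000002"))
```

With two configured directories, the same application gets one directory per disk, which is why map output for a single job can end up spread across several local disks.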




  1. Yuriy says:

    E is correct; the parameter in question is yarn.nodemanager.local-dirs.
    yarn.nodemanager.local-dirs: This is a comma separated list of local-directories that one can configure to be used for copying files during localization. The idea behind allowing multiple directories is to use multiple disks for localization – it helps both fail-over (one/few disk(s) going bad doesn’t affect all containers) and load balancing (no single disk is bottlenecked with writes). Thus, individual directories should be configured if possible on different local disks.
    https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
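To see which directories a node is actually configured with, the comma-separated value can be read out of yarn-site.xml. The sketch below assumes the standard Hadoop configuration layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`); the function name and the sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "yarn.nodemanager.local-dirs"


def read_local_dirs(yarn_site_xml: str) -> list[str]:
    """Return the directories listed under yarn.nodemanager.local-dirs,
    or an empty list if the property is not set in this file."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY_NAME:
            value = prop.findtext("value") or ""
            # The value is a comma-separated list of local directories.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


if __name__ == "__main__":
    # Hypothetical configuration with one directory per physical disk.
    sample = """
    <configuration>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/disk1/yarn/local,/disk2/yarn/local,/disk3/yarn/local</value>
      </property>
    </configuration>
    """
    print(read_local_dirs(sample))
```

An empty result only means the property is absent from that file; the value from yarn-default.xml would then apply.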




  2. jx says:

    Hadoop: The Definitive Guide, 4th edition, ch. 7, "Shuffle and Sort":
    During a MapReduce job, intermediate data and working files are written to temporary local files. Because this data includes the potentially very large output of map tasks, you need to ensure that the yarn.nodemanager.local-dirs property, which controls the location of local temporary storage for YARN containers, is configured to use disk partitions that are large enough.
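One rough way to check the "large enough" condition on a node is to look at the free space under each configured directory. This is a minimal sketch using only the Python standard library; the function name, the example paths, and the threshold are assumptions, and how much space is enough depends on the size of the map output for the jobs being run.

```python
import shutil


def dirs_below_threshold(local_dirs: list[str], min_free_bytes: int) -> list[tuple[str, int]]:
    """Return (directory, free_bytes) for every directory whose partition
    has less free space than min_free_bytes."""
    low = []
    for d in local_dirs:
        # disk_usage reports totals for the filesystem that contains d.
        free = shutil.disk_usage(d).free
        if free < min_free_bytes:
            low.append((d, free))
    return low


if __name__ == "__main__":
    # Hypothetical directories and a hypothetical 50 GiB threshold.
    configured = ["/disk1/yarn/local", "/disk2/yarn/local"]
    for path, free in dirs_below_threshold(configured, 50 * 1024**3):
        print(f"{path}: only {free / 1024**3:.1f} GiB free")
```

The directories must exist on the machine where the script runs, otherwise `shutil.disk_usage` raises an error.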



