On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input. Each file is made up of 3 HDFS blocks. How many Mappers will run?

A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the developer
C. 30
D. 3
E. 10
F. We cannot say; the number of Mappers is determined by the ApplicationMaster
C is the correct answer – one Mapper per input split, and by default one split per block, so 30 blocks means 30 Mappers.
C is the correct answer.
Answer: C
Each block gets its own Mapper by default: 3 blocks per file × 10 files = 30 Mappers.
The number of Mappers is based on the number of input splits (which is decided by the InputFormat), so it depends on the developer. I think B is the right one.
The number of Mappers depends on the input split size, which by default equals the block size, so here it will be 30 Mappers. However, the developer has the option to override this parameter and set the input split size explicitly, which can change the number of Mappers for the job.
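For illustration, a minimal sketch of how a driver could override the split size using the MRv2 (org.apache.hadoop.mapreduce) API; the 256 MB value is an example of my own, not anything from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");
        // Raise the minimum split size above the block size so a single
        // split can span several blocks of the same file, cutting the
        // Mapper count. (256 MB is an illustrative value.)
        FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set Mapper/Reducer classes, output path, then job.waitForCompletion(true)
    }
}

With a setting like this, a 3-block file whose total size is under 256 MB would be read by a single Mapper, so the 30-Mapper answer only holds under the default split settings.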
The default split size is one block; 30 blocks hence 30 Mappers, unless the developer configures the split size otherwise. "C".
I think the context here is that "1 file takes 3 blocks" means the default replication factor of 3. Hence one Mapper per file, so it's 10.
It should be E. People choose C because they forget the file actually contains only 1 block; the other 2 are replicas.
No, it didn't say the replication factor is 3, and what if a file genuinely needs 3 blocks? The answer should be C.
What is the correct answer? C or E?
It's C. Each file is 3 blocks, meaning 10 × 3 = 30; the 3 blocks are the file's actual size, not replicas.
Mappers are instantiated based on the number of input splits, not the number of blocks.
Answer is E.
The number of input splits equals the number of Mappers.
Answer: E
E. Since there's no information about the split size, it could be larger than the block size.
The number of Mappers depends on the number of splits; however, since splits never cross file boundaries, a file smaller than the split size still gets one Mapper of its own. That is the reason a large number of small files is not recommended.
The properties that determine the split size, and their default values, are as follows:
mapred.min.split.size=1 (in bytes)
mapred.max.split.size=Long.MAX_VALUE
dfs.block.size=64 MB
The split size is calculated as:
inputSplitSize = max(minimumSize, min(maximumSize, blockSize))
# of Mappers = totalInputSize / inputSplitSize
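To make that concrete, here is a small self-contained Java sketch of the same calculation (the computeSplitSize logic mirrors FileInputFormat's formula); the 64 MB block size comes from the default above, the 10-files-of-3-blocks layout from the question, and it assumes each file fills its 3 blocks exactly:

public class MapperCountDemo {
    // Same formula as FileInputFormat.computeSplitSize()
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // dfs.block.size default (64 MB)
        long minSize   = 1L;                 // mapred.min.split.size default
        long maxSize   = Long.MAX_VALUE;     // mapred.max.split.size default

        long splitSize = computeSplitSize(blockSize, minSize, maxSize); // = blockSize

        int files = 10, blocksPerFile = 3;
        long fileSize = blocksPerFile * blockSize; // assumes each file spans 3 full blocks
        // Splits are computed per file, so:
        long splitsPerFile = fileSize / splitSize; // = 3
        System.out.println("Mappers = " + files * splitsPerFile); // prints 30
    }
}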
I have the same idea. E
It's E, I have verified it.
C.
If you have not defined any input split size in the MapReduce program, then the default HDFS block boundary is used as the input split. There is no mention of replicas.