What is the behavior of the default partitioner?
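As a rough illustration of hash partitioning, the scheme used by Hadoop's `HashPartitioner` (the default), the sketch below mimics the idea in Python. The helper names `java_string_hash` and `default_partition` are hypothetical, and using Java's `String.hashCode()` formula for the key hash is an assumption made for illustration; real Hadoop key types such as `Text` compute their hash differently.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit value (illustrative stand-in)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep arithmetic within 32 bits
    # Reinterpret the unsigned 32-bit value as a signed int, as Java would
    return h - 0x100000000 if h >= 0x80000000 else h


def default_partition(key: str, num_reducers: int) -> int:
    """Hash partitioning: clear the sign bit, then take the remainder modulo the reducer count."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


# The same key always maps to the same reducer index.
partition_ab = default_partition("ab", 4)
```

The point of the sketch is that the partition depends only on the key's hash and the number of reducers, so all values for a given key end up at a single reducer.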
Which statement best describes the data path of intermediate key-value pairs (i.e., output of the
mappers)?
If you run the word count MapReduce program with m mappers and r reducers, how many output
files will you get at the end of the job? And how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files.
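A toy simulation can help when reasoning about this question. It is only a sketch under stated assumptions: each reducer writes exactly one output file, each input string stands in for one mapper's split, and `zlib.crc32` is a stand-in for the real partitioner. The function name `word_count_outputs` is hypothetical.

```python
import zlib
from collections import Counter


def word_count_outputs(documents, r):
    """Return a list of r Counters, one per simulated reducer output file."""
    outputs = [Counter() for _ in range(r)]
    for doc in documents:  # each document stands in for one mapper's input split
        for word in doc.split():
            # Stand-in partitioner: a deterministic hash of the word, modulo r
            part = zlib.crc32(word.encode("utf-8")) % r
            outputs[part][word] += 1
    return outputs


docs = ["the cat sat", "the dog sat", "a cat ran"]  # three mappers, k = 6 unique words
files = word_count_outputs(docs, r=2)

num_files = len(files)                    # one file per reducer
total_pairs = sum(len(f) for f in files)  # unique words across all files
```

In this model the number of files follows r rather than m, and the k unique words are spread across the files; how many land in any single file depends on the partitioner, so an even split is not guaranteed.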
You have a large dataset of key-value pairs, where the keys are strings, and the values are
integers. For each unique key, you want to identify the largest integer. In writing a MapReduce
program to accomplish this, can you take advantage of a combiner?
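The following minimal sketch may help in thinking about this question. It relies on the fact that taking a maximum is associative and commutative, so the same logic can be applied locally per mapper and again at the reducer. The function `max_per_key` and the sample data are hypothetical.

```python
def max_per_key(pairs):
    """Return {key: largest value seen}; usable as combiner logic and as reducer logic."""
    best = {}
    for key, value in pairs:
        if key not in best or value > best[key]:
            best[key] = value
    return best


# Two simulated mapper outputs (one list per mapper)
splits = [
    [("a", 3), ("b", 7), ("a", 9)],
    [("a", 5), ("b", 2), ("c", 1)],
]

# Without a combiner, every mapper output pair is shuffled to the reducers.
shuffled_plain = [pair for split in splits for pair in split]

# With a combiner, each mapper's output is collapsed to one pair per key first.
shuffled_combined = [
    pair for split in splits for pair in max_per_key(split).items()
]

result_plain = max_per_key(shuffled_plain)
result_combined = max_per_key(shuffled_combined)
```

Both paths produce the same per-key maxima, while the combined path shuffles fewer pairs in this example.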
What happens in a MapReduce job when you set the number of reducers to zero?
Combiners increase the efficiency of a MapReduce program because:
In a large MapReduce job with m mappers and r reducers, how many distinct copy operations will
there be in the sort/shuffle phase?
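One way to reason about this is to enumerate the mapper-to-reducer transfers directly. The sketch assumes that every mapper produces a partition for every reducer, so each reducer fetches one partition from each mapper; the function name `shuffle_copies` is hypothetical.

```python
def shuffle_copies(m, r):
    """List every (mapper, reducer) transfer, assuming each reducer pulls from each mapper."""
    return [(mapper, reducer) for mapper in range(m) for reducer in range(r)]


copies = shuffle_copies(m=4, r=3)
num_copies = len(copies)  # grows as m * r under the stated assumption
```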
What happens in a MapReduce job when you set the number of reducers to one?
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall
job running time?
Your cluster has 10 DataNodes, each with a single 1 TB hard drive. You use all of your disk
capacity for HDFS, reserving none for MapReduce, and you keep the default replication settings.
What is the storage capacity of your Hadoop cluster (assuming no compression)?
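The arithmetic behind this question can be sketched as follows, assuming the HDFS default replication factor of 3 and ignoring filesystem and metadata overhead. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(nodes, tb_per_node, replication=3):
    """Raw disk capacity divided by the number of copies stored for each block."""
    raw_tb = nodes * tb_per_node
    return raw_tb / replication


capacity = usable_capacity_tb(nodes=10, tb_per_node=1)  # 10 TB raw, 3 copies of each block
```

Under these assumptions, 10 TB of raw disk holds roughly 3.3 TB of unique data.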