For each input key-value pair, mappers can emit:
For each intermediate key, each reducer task can emit:
You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses
TextInputFormat and the IdentityReducer: the mapper applies a regular expression over input
values and emits key-value pairs with the key consisting of the matching text, and the value
containing the filename and byte offset. Determine the difference between setting the number of
reducers to one and setting the number of reducers to zero.
What happens in a MapReduce job when you set the number of reducers to zero?
You have a large dataset of key-value pairs, where the keys are strings, and the values are
integers. For each unique key, you want to identify the largest integer. In writing a MapReduce
program to accomplish this, can you take advantage of a combiner?
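As a rough illustration of the scenario above, the following sketch simulates the job in plain Python rather than with the Hadoop API. It assumes the combiner reuses the reducer logic, which is safe here only because taking a maximum is associative and commutative. The names `reduce_max`, `run_job`, and the sample `splits` are hypothetical and exist only for this example.

```python
from collections import defaultdict


def reduce_max(key, values):
    """Reducer logic (hypothetical): emit the largest integer seen for a key."""
    return key, max(values)


def run_job(splits, use_combiner):
    """Simulate map -> optional combine -> shuffle -> reduce.

    splits: one list of (key, int) pairs per simulated mapper.
    Returns the final {key: max} dict and the number of pairs shuffled.
    """
    shuffled = defaultdict(list)
    transferred = 0
    for split in splits:
        # Identity map step: group this mapper's output by key.
        local = defaultdict(list)
        for key, value in split:
            local[key].append(value)
        for key, values in local.items():
            if use_combiner:
                # The combiner applies the reducer logic to local output only.
                values = [reduce_max(key, values)[1]]
            shuffled[key].extend(values)
            transferred += len(values)
    # Reduce step: one call per intermediate key.
    result = dict(reduce_max(k, v) for k, v in shuffled.items())
    return result, transferred


# Made-up sample data: two mappers, two keys.
splits = [
    [("a", 3), ("b", 7), ("a", 9)],
    [("a", 4), ("b", 1), ("b", 2)],
]
plain, plain_pairs = run_job(splits, use_combiner=False)
combined, combined_pairs = run_job(splits, use_combiner=True)
```

In this toy run both variants produce the same maxima, while the combined run shuffles fewer pairs. An operation such as a mean would not behave this way if the reducer were reused unchanged as the combiner.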
If you run the word count MapReduce program with m mappers and r reducers, how many output
files will you get at the end of the job? And how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files.
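The setup in this question can be explored with a small simulation. The sketch below is plain Python, not Hadoop code, and rests on two stated assumptions: keys are assigned to reducers by a hash of the word modulo r, and every reducer writes one output file even when it receives no keys. The function `word_count_outputs`, the `part-r-NNNNN` file names, and the sample documents are illustrative only.

```python
import zlib
from collections import Counter


def word_count_outputs(documents, num_reducers):
    """Simulate word count and return {output file name: {word: count}}.

    Each document stands in for one mapper's input. The number of mappers
    does not affect how many output files are produced.
    """
    partitions = [Counter() for _ in range(num_reducers)]
    for doc in documents:
        for word in doc.split():
            # Deterministic stand-in for a hash partitioner (assumption).
            idx = zlib.crc32(word.encode("utf-8")) % num_reducers
            partitions[idx][word] += 1
    return {f"part-r-{i:05d}": dict(c) for i, c in enumerate(partitions)}


docs = ["the quick brown fox", "the lazy dog", "the fox"]  # made-up input
files = word_count_outputs(docs, num_reducers=3)

num_files = len(files)
total_pairs = sum(len(pairs) for pairs in files.values())
```

Under these assumptions the number of files equals r and the pairs across all files sum to k, but how many pairs land in any single file depends on the partitioner and the keys themselves.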
You’ve written a MapReduce job that will process 500 million input records and generate 500
million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a
significant amount of intermediate data that it needs to transfer between mappers and reducers,
which is a potential bottleneck. A custom implementation of which of the following interfaces is
most likely to reduce the amount of intermediate data transferred across the network?
Which statement best describes the data path of intermediate key-value pairs (i.e., output of the mappers)?
You need to create a GUI application to help your company’s salespeople add and edit customer
information. Would HDFS be appropriate for this customer information file?
When is the reduce method first called in a MapReduce job?