When is the earliest point at which the reduce method of a given Reducer can be called?
Which describes how a client reads a file from HDFS?
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text
keys, IntWritable values. Which interface should your class implement?
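In the Java API a combiner implements the same Reducer interface as a reducer, since it receives a key and an iterable of values and emits aggregated pairs. As a rough sketch, the combiner's behavior can be simulated in plain Python (the word-count data here is illustrative, not from the question):

```python
from collections import defaultdict

def combine(pairs):
    # Simulate a combiner: locally sum the IntWritable-style values for
    # each Text-style key, exactly the reduce(key, values) contract that
    # the Reducer interface defines.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# Output of one mapper before combining:
mapper_output = [("cat", 1), ("dog", 1), ("cat", 1), ("cat", 1)]
print(combine(mapper_output))  # [('cat', 3), ('dog', 1)]
```

Because the combiner's input and output types must match the map output types (Text, IntWritable here), the same class can often serve as both combiner and reducer.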
Identify the utility that allows you to create and run MapReduce jobs with any executable or script
as the mapper and/or the reducer.
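The utility being described is Hadoop Streaming, which runs any executable that reads records on stdin and writes tab-separated key-value pairs on stdout. A minimal word-count pair in the style Streaming expects might look like this (sample data and the jar path in the trailing comment are illustrative):

```python
from itertools import groupby

def mapper(lines):
    # A Streaming mapper reads raw input lines and emits "key<TAB>value".
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # By the time a Streaming reducer runs, the framework has sorted the
    # mapper output by key, so equal keys arrive consecutively.
    pairs = (line.split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Locally we can imitate the framework's sort between the two phases:
print(list(reducer(sorted(mapper(["the cat", "the dog"])))))

# On a cluster the same scripts would be launched via the streaming jar,
# roughly (path illustrative):
#   hadoop jar hadoop-streaming.jar -input in -output out \
#     -mapper mapper.py -reducer reducer.py
```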
How are keys and values presented and passed to the reducers during a standard sort and shuffle
phase of MapReduce?
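The essence of the standard shuffle can be sketched as a plain-Python simulation: output from all mappers is merged, sorted by key, and each key is then presented once to reduce together with an iterator over all of its values (sample data is illustrative):

```python
from itertools import groupby

def shuffle_and_sort(map_outputs):
    # Merge every mapper's output, sort by key, then group so that each
    # key appears once with all of its values. Note that in real Hadoop
    # the values within a key are NOT guaranteed to be in any order
    # unless you set up a secondary sort.
    merged = sorted(pair for output in map_outputs for pair in output)
    return [(key, [v for _, v in group])
            for key, group in groupby(merged, key=lambda kv: kv[0])]

two_mappers = [[("b", 2), ("a", 1)], [("a", 3), ("b", 4)]]
print(shuffle_and_sort(two_mappers))  # [('a', [1, 3]), ('b', [2, 4])]
```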
Assuming default settings, which best describes the order of data provided to a reducer’s reduce
method:
You wrote a map function that throws a runtime exception when it encounters a control character
in input data. The input supplied to your mapper contains twelve such characters in total, spread
across five file splits. The first four file splits each have two control characters and the last split has
four control characters.
Identify the number of failed task attempts you can expect when you run the job with
mapred.max.map.attempts set to 4:
You want to populate an associative array in order to perform a map-side join. You’ve decided to
put this information in a text file, place that file into the DistributedCache and read it in your
Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and
populating the associative array.
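The relevant hook is the method that runs once per task before any records are processed: setup() in the new (mapreduce) Java API, configure() in the old (mapred) API. The task lifecycle can be sketched in Python (class name and lookup data are illustrative):

```python
class SimulatedMapper:
    # Sketch of the Mapper lifecycle: setup() runs exactly once per task
    # before any call to map(), which is why DistributedCache files are
    # read there rather than inside map().

    def setup(self):
        # A real job would open the cached text file here; we hard-code
        # the associative array it would populate (illustrative data).
        self.lookup = {"US": "United States", "DE": "Germany"}

    def map(self, key, value):
        # Map-side join: enrich each record from the in-memory table.
        return (key, self.lookup.get(value, "unknown"))

    def run(self, records):
        self.setup()                          # called once, up front
        return [self.map(k, v) for k, v in records]

m = SimulatedMapper()
print(m.run([(1, "US"), (2, "DE")]))  # [(1, 'United States'), (2, 'Germany')]
```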
You’ve written a MapReduce job that will process 500 million input records and generate 500
million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a
significant amount of intermediate data that it needs to transfer between mappers and reducers,
which is a potential bottleneck. A custom implementation of which interface is most likely to reduce
the amount of intermediate data transferred across the network?
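The interface that targets this bottleneck is the Reducer interface used as a combiner: pre-aggregating on the map side shrinks the number of pairs that cross the network, which matters most when keys are skewed. A small count-the-pairs sketch (skewed sample data is illustrative):

```python
def map_phase(records):
    # One (word, 1) pair per token: the map output before local aggregation.
    return [(word, 1) for record in records for word in record.split()]

def combine_locally(pairs):
    # The combiner's effect: pre-sum per key on the map side so far fewer
    # pairs need to be shuffled to the reducers.
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

skewed_input = ["the the the the", "the cat"]   # non-uniform key distribution
raw = map_phase(skewed_input)
combined = combine_locally(raw)
print(len(raw), "->", len(combined))            # 6 -> 2
```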
Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume
that the two tables are formatted as comma-separated files in HDFS.
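Yes: the standard technique is a reduce-side (repartition) join, where each mapper tags every row with its source table and emits the join key, and each reducer receives all rows for one key and emits their combination. A plain-Python sketch of the idea (table names and rows are illustrative):

```python
from itertools import groupby

def reduce_side_join(left, right):
    # Map phase: tag each row with its table of origin, keyed by join key.
    tagged = [(key, ("L", row)) for key, row in left] + \
             [(key, ("R", row)) for key, row in right]
    # Shuffle phase: sort so every row sharing a key meets at one reducer.
    tagged.sort(key=lambda kv: kv[0])
    # Reduce phase: per key, emit the cross product of left and right rows.
    joined = []
    for key, group in groupby(tagged, key=lambda kv: kv[0]):
        rows = list(group)
        lefts = [row for _, (tag, row) in rows if tag == "L"]
        rights = [row for _, (tag, row) in rows if tag == "R"]
        joined.extend((key, lrow, rrow) for lrow in lefts for rrow in rights)
    return joined

orders = [(1, "order-a"), (2, "order-b")]
customers = [(1, "alice"), (2, "bob")]
print(reduce_side_join(orders, customers))
```

Because neither table needs to fit in memory (only the rows for a single key do), this works for two large comma-separated files in HDFS, unlike a map-side join, which requires one input to be small enough to cache.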