The Hadoop framework provides a mechanism for coping with machine issues such as faulty
configuration or impending hardware failure. MapReduce detects that one or more
machines are performing poorly and launches additional, redundant copies of a map or reduce
task. All the copies run simultaneously, and the output of whichever task finishes first is
used. What is this mechanism called?
What is the disadvantage of using multiple reducers with the default HashPartitioner and
distributing your workload across your cluster?
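For context on this question, the logic of Hadoop's default HashPartitioner can be sketched in plain Java. This is an illustrative mockup, not the actual org.apache.hadoop.mapreduce.lib.partition.HashPartitioner class; the class and method names below are placeholders, though the arithmetic mirrors the real implementation.

```java
public class HashPartitionSketch {
    // Mirrors the default HashPartitioner: mask off the sign bit of the
    // key's hash code, then take it modulo the number of reduce tasks.
    // Every occurrence of the same key therefore goes to the same reducer,
    // but skewed key distributions can leave some reducers overloaded.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition number in [0, 4).
        System.out.println(getPartition("apple", 4));
    }
}
```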
You have user profile records in your OLTP database that you want to join with web logs you
have already ingested into the Hadoop file system. How will you obtain these user records?
You have the following key-value pairs as output from your Map task:
How many keys will be passed to the Reducer’s reduce method?
For each input key-value pair, mappers can emit:
You need to perform statistical analysis in your MapReduce job and would like to call methods in
the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file.
Which is the best way to make this library available to your MapReduce job at runtime?
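As background for this question, a common way to ship a third-party JAR to the task nodes is Hadoop's distributed cache via the generic -libjars option. The sketch below is illustrative only: the job JAR name, driver class, and paths are placeholders, and -libjars is honored only when the driver parses arguments through ToolRunner/GenericOptionsParser.

```shell
# Ship the Commons Math JAR to every task node via the distributed cache.
# analytics-job.jar, com.example.StatsDriver, and all paths are placeholders.
hadoop jar analytics-job.jar com.example.StatsDriver \
  -libjars /local/path/commons-math.jar \
  /input/weblogs /output/stats
```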
Given a directory of files with the following structure: line number, tab character, string:
You want to send each line as one record to your Mapper. Which InputFormat should you use to
complete the line: conf.setInputFormat(____.class); ?
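For intuition about the file structure in this question, the split performed by a tab-delimited key-value input format can be sketched in plain Java. This is a standalone illustration, not Hadoop's actual record reader; the class and method names are placeholders, but the rule shown (everything before the first tab is the key, the remainder is the value) matches how KeyValueTextInputFormat treats each line.

```java
public class KeyValueSplitSketch {
    // Split a line at the first tab character into a {key, value} pair.
    // A line with no tab becomes a key with an empty value.
    public static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("42\tthe quick brown fox");
        // Prints: 42 -> the quick brown fox
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```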
All keys used for intermediate output from mappers must:
What data does a Reducer’s reduce method process?
For each intermediate key, each reducer task can emit: