Which best describes how TextInputFormat processes input files and line breaks?
For each input key-value pair, mappers can emit:
You have the following key-value pairs as output from your Map task:
(the, 1)
(fox, 1)
(faster, 1)
(than, 1)
(the, 1)
(dog, 1)
How many keys will be passed to the Reducer’s reduce method?
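As a point of reference only (not part of the question), the following is a minimal word-count-style reducer using the old org.apache.hadoop.mapred API; it illustrates that reduce is called once per distinct intermediate key, with all values for that key supplied through a single iterator. Class and variable names are illustrative.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer: invoked once per distinct key, with every value the
// mappers emitted for that key available through the values iterator.
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();   // e.g. (the, [1, 1]) sums to 2
        }
        output.collect(key, new IntWritable(sum));
    }
}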
You have user profile records in your OLTP database that you want to join with web logs you
have already ingested into the Hadoop file system. How will you obtain these user records?
What is the disadvantage of using multiple reducers with the default HashPartitioner and
distributing your workload across your cluster?
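For reference, the default partitioning logic is essentially the following (a sketch modelled on the old-API HashPartitioner): each key is routed to a reduce task by its hash code, so keys are spread across the reducers but the combined output of all reducers is not globally sorted.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Sketch of the default hash-based partitioning: the reducer for a key is chosen
// from its hash code, so no single reducer sees the full key space and the
// outputs of the individual reducers are not in overall sorted order.
public class HashStylePartitioner<K, V> implements Partitioner<K, V> {

    public void configure(JobConf job) {
        // no configuration needed
    }

    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}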
Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to
complete the line: conf.setInputFormat(____.class); ?
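As an illustration of where the blank sits in a driver (not a claim about which class the question expects), a sketch using the old JobConf API might look like the following; the KeyValueTextInputFormat shown here splits each line at the first tab into a key and a value, and the job name and paths are hypothetical.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

// Illustrative driver; the setInputFormat call is the line the question asks about.
public class LineRecordsDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LineRecordsDriver.class);
        conf.setJobName("line-records");

        conf.setInputFormat(KeyValueTextInputFormat.class);  // the ____ in the question
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}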
You need to perform statistical analysis in your MapReduce job and would like to call methods in
the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file.
Which is the best way to make this library available to your MapReduce job at runtime?
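One mechanism for shipping a third-party JAR to the task nodes (shown only as an illustration, not as the answer the question is looking for) is the distributed cache; in this sketch the JAR is assumed to have been copied to a hypothetical HDFS path beforehand.

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class StatsJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(StatsJobDriver.class);

        // Adds the library (already uploaded to this hypothetical HDFS location)
        // to the classpath of every map and reduce task at runtime.
        DistributedCache.addFileToClassPath(
                new Path("/libs/commons-math.jar"), conf);

        // ... remaining job configuration ...
    }
}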
The Hadoop framework provides a mechanism for coping with machine issues such as faulty
configuration or impending hardware failure. MapReduce detects that one or more machines are
performing poorly and starts additional copies of a map or reduce task. All the copies run
simultaneously and the output of whichever task finishes first is used. This is called:
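For context, this behaviour is controlled per job through configuration; a minimal sketch using the old JobConf API follows (the class name is hypothetical).

import org.apache.hadoop.mapred.JobConf;

public class SpeculativeConfigExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SpeculativeConfigExample.class);

        // Allow the framework to launch backup copies of slow map and reduce
        // tasks; the first copy to finish wins and the others are killed.
        conf.setMapSpeculativeExecution(true);
        conf.setReduceSpeculativeExecution(true);
    }
}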
For each intermediate key, each reducer task can emit:
What data does a Reducer reduce method process?