PrepAway - Latest Free Exam Questions & Answers

Which of the following best describes the workings of TextInputFormat?


A.
Input file splits may cross line breaks. A line that crosses file splits is ignored.

B.
The input file is split exactly at the line breaks, so each RecordReader will read a series of
complete lines.

C.
Input file splits may cross line breaks. A line that crosses file splits is read by the
RecordReaders of both splits containing the broken line.

D.
Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader
of the split that contains the end of the broken line.

E.
Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader
of the split that contains the beginning of the broken line.

Explanation:
Because the Map operation is parallelized, the input file set is first split into several pieces
called FileSplits. If an individual file is so large that it will affect seek time, it is split into several
FileSplits. The splitting knows nothing about the input file’s internal logical structure; for
example, line-oriented text files are split on arbitrary byte boundaries. A new map task is then
created per FileSplit.
When an individual map task starts, it opens a new output writer per configured reduce task. It
then reads its FileSplit using the RecordReader it gets from the specified
InputFormat. The InputFormat parses the input and generates key-value pairs. It must also
handle records that straddle a FileSplit boundary. For example, TextInputFormat reads
the last line of the FileSplit past the split boundary and, when reading any FileSplit other than the first,
ignores the content up to the first newline. A line that crosses a split boundary is therefore read
exactly once, by the RecordReader of the split that contains the beginning of the line (option E).
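
The boundary rule described above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hadoop code: the function name read_split, the sample data, and the 8-byte split size are all made up for illustration. It assumes "\n" line endings and assumes that a reader keeps reading as long as a line starts at or before the end of its split, which is one way to make sure a line that begins exactly on a boundary is not lost.

```python
def read_split(data: bytes, start: int, length: int) -> list[tuple[int, bytes]]:
    """Return (byte offset, line) records for one split of `data`.

    Simplified model of the behaviour described above (hypothetical helper,
    not the Hadoop API):
      * a split that does not start at byte 0 discards everything up to and
        including the first newline, because the previous split's reader
        already consumed that line;
      * the reader emits every line that starts at or before the split end,
        reading past the boundary to finish the last one.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []  # no line starts inside this split
        pos = newline + 1

    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append((pos, data[pos:line_end]))
        pos = line_end + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    split_size = 8  # arbitrary byte boundary, chosen to cut lines in half
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        print(start, start + length, read_split(data, start, length))
```

With this sample data, "bravo" occupies bytes 6 to 11 and so crosses the boundary at byte 8. The first split's reader returns it in full, and the second split's reader skips ahead to "charlie", so every line is produced exactly once.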
Reference: How Map and Reduce operations are actually carried out
http://wiki.apache.org/hadoop/HadoopMapReduce (Map, second paragraph)
