Which best describes how TextInputFormat processes input files and line breaks?

seenagape, May 27, 2015

PrepAway - Latest Free Exam Questions & Answers

A.
Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader
of the split that contains the beginning of the broken line.

B.
Input file splits may cross line breaks. A line that crosses file splits is read by the
RecordReaders of both splits containing the broken line.

C.
The input file is split exactly at the line breaks, so each RecordReader will read a series of
complete lines.

D.
Input file splits may cross line breaks. A line that crosses file splits is ignored.

E.
Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader
of the split that contains the end of the broken line.

Explanation:
Because the map phase runs in parallel, the input file set is first divided into pieces called
FileSplits. If an individual file is so large that it would affect seek time, it is divided into
several splits. The splitting knows nothing about the input file’s internal logical structure; for
example, line-oriented text files are split on arbitrary byte boundaries. A new map task is then
created for each FileSplit.
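
To make "arbitrary byte boundaries" concrete, here is a minimal Java sketch. It is not Hadoop's actual split logic: the class ByteSplitDemo and its computeSplits method are hypothetical names invented for illustration, and the 8-byte split size is chosen only so that the boundaries fall mid-line in a tiny sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop code.
final class ByteSplitDemo {

    /**
     * Cuts a file of fileLength bytes into fixed-size byte ranges.
     * Each element is {start, length}. Line breaks are never consulted,
     * which is the point made in the explanation above.
     */
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.US_ASCII);
        for (long[] s : computeSplits(data.length, 8)) {
            String chunk = new String(data, (int) s[0], (int) s[1], StandardCharsets.US_ASCII);
            // Show newlines as a visible escape so the cut points are easy to see.
            System.out.println("start=" + s[0] + " length=" + s[1]
                    + " bytes=" + chunk.replace("\n", "\\n"));
        }
    }
}
```

With the 20-byte sample and 8-byte splits, the ranges are [0, 8), [8, 16) and [16, 20), so both "bravo" and "charlie" are cut in two by a split boundary.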
When an individual map task starts, it opens a new output writer for each configured reduce task. It
then reads its FileSplit using the RecordReader it gets from the specified InputFormat. The
InputFormat, through its RecordReader, parses the input and generates key-value pairs. The
InputFormat must also handle records that are split across a FileSplit boundary. For example,
TextInputFormat reads the last line of a FileSplit past the split boundary and, when reading any
FileSplit other than the first, ignores the content up to the first newline. A line that straddles
a boundary is therefore read exactly once, by the RecordReader of the split in which the line begins.
Reference: How Map and Reduce operations are actually carried out
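
The boundary rule described in the explanation can be simulated on an in-memory byte array. The following is a sketch under stated assumptions, not the Hadoop implementation: LineSplitDemo and readSplit are hypothetical names, the input is plain ASCII with "\n" line endings, and there is no compression. Each simulated reader discards a partial first line when its split does not start at byte 0, and reads its last line through to the newline even if that lies past the end of the split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop code.
final class LineSplitDemo {

    /** Returns the lines that the reader of the split [start, start + length) would emit. */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Back up one byte, then discard everything through the next newline.
            // If the previous split ended exactly on a newline, nothing is lost.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // A line is emitted if it BEGINS before the end of the split;
        // it is always read to its newline, even past the split boundary.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.US_ASCII);
        int splitSize = 8; // deliberately small so that lines straddle boundaries
        for (int start = 0; start < data.length; start += splitSize) {
            int length = Math.min(splitSize, data.length - start);
            System.out.println("split at " + start + ": " + readSplit(data, start, length));
        }
    }
}
```

In this sample the first split emits "alpha" and "bravo" (reading "bravo" past byte 8), the second emits only "charlie", and the third emits nothing because its only content is the tail of a line that began in the previous split. Every line is emitted once.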

11 Comments on “Which best describes how TextInputFormat processes input files and line breaks?”

Srinivas says:

April 2, 2014 at 6:35 am

This should be option A, right?

Wank says:

April 30, 2014 at 12:53 am

Shouldn’t it be A?

Abhishek says:

September 5, 2014 at 2:30 pm

It should be A

June says:

November 24, 2014 at 11:22 am

should be A

ravindrapaliwal says:

December 16, 2014 at 9:16 am

A should be the correct answer.

yogeswaran says:

December 21, 2014 at 8:22 am

It is option A.

If the split's starting position is not zero, the reader seeks to one byte before the start and skips the first line (everything up to and including the first \n). If the previous split ended exactly on a line break (\n), the current split therefore starts processing from its own beginning. If the previous split did not end on a line break, the previous split's reader reads on until the \n, and the current split starts reading after that \n.

        if (start != 0) {
            skipFirstLine = true;
            --start;
            fileIn.seek(start);
        }
        in = new LineReader(fileIn, job);
    } // closes an enclosing block that is not shown in this excerpt
    if (skipFirstLine) { // skip first line and re-establish "start"
        start += in.readLine(new Text(), 0,
            (int) Math.min((long) Integer.MAX_VALUE, end - start));
    }
    this.pos = start;

Please correct me if I’m wrong. Else please update the answer. Thanks!

Vishal says:

February 7, 2015 at 10:58 pm

The answer is C. TextInputFormat considers each line as a record/value. For more information, read Hadoop: The Definitive Guide, page 246.

chetan says:

April 23, 2015 at 6:46 am

I believe Answer is E

Debi says:

June 19, 2015 at 2:42 am

C is the answer
Hadoop: The Definitive Guide, 4th edition, page 276

Ravindra Kumar says:

July 5, 2015 at 7:45 pm

A is the correct answer

anonymous says:

June 19, 2017 at 3:53 am

The correct one is ‘E’: “Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.” Validated in the exam.
