
In a MapReduce job, you want each of your input files to be processed by a single map task. How do
you configure the job so that a single map task processes each input file, regardless of how many
blocks the input file occupies?

A.
Increase the parameter that controls minimum split size in the job configuration.

B.
Write a custom MapRunner that iterates over all key-value pairs in the entire file.

C.
Set the number of mappers equal to the number of input files you want to process.

D.
Write a custom FileInputFormat and override the method isSplitable to always return false.

Correct Answer: D

Explanation:
FileInputFormat is the base class for all file-based InputFormats. It provides a
generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can override
the isSplitable(JobContext, Path) method to ensure input files are not split up and are each
processed as a whole by a single Mapper.
Reference: org.apache.hadoop.mapreduce.lib.input, Class FileInputFormat<K,V>
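
For illustration, here is a minimal sketch of option D. The class name WholeFileTextInputFormat is invented for this example; it extends the standard TextInputFormat (rather than FileInputFormat directly) so the usual line-based record reading is kept while splitting is disabled:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Returning false from isSplitable means every input file becomes
 * exactly one InputSplit, so a single map task consumes the whole
 * file no matter how many HDFS blocks it spans.
 */
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split this file
    }
}

In the job driver you would register it with job.setInputFormatClass(WholeFileTextInputFormat.class). Be aware of the trade-off: a non-splittable file that spans many blocks loses data locality for the blocks stored on other nodes.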
