PrepAway - Latest Free Exam Questions & Answers

how many blocks the input file occupies?

In a MapReduce job, you want each of your input files processed by a single map task. How do you
configure a MapReduce job so that a single map task processes each input file regardless of how
many blocks the input file occupies?


Increase the parameter that controls minimum split size in the job configuration.

Write a custom MapRunner that iterates over all key-value pairs in the entire file.

Set the number of mappers equal to the number of input files you want to process.

Write a custom FileInputFormat and override the method isSplitable to always return false.

* // Do not allow splitting (Hadoop spells this method isSplitable, with one "t").
@Override
protected boolean isSplitable(JobContext context, Path filename) {
    return false;
}

* InputSplits: An InputSplit describes a unit of work that comprises a single map task in a
MapReduce program. A MapReduce program applied to a data set, collectively referred to as a
Job, is made up of several (possibly several hundred) tasks. Map tasks may involve reading a
whole file; they often involve reading only part of a file. By default, the FileInputFormat and its
descendants break a file up into 64 MB chunks (the same size as the default HDFS block in older
Hadoop releases; newer releases default to 128 MB). You can control this value by setting the
mapred.min.split.size parameter (named mapreduce.input.fileinputformat.split.minsize in newer
releases) in hadoop-site.xml, or by overriding the parameter in the JobConf object used to submit
a particular MapReduce job. By
processing a file in chunks, we allow several map tasks to operate on a single file in parallel. If the
file is very large, this can improve performance significantly through parallelism. Even more
importantly, since the various blocks that make up the file may be spread across several different
nodes in the cluster, it allows tasks to be scheduled on each of these different nodes; the
individual blocks are thus all processed locally, instead of needing to be transferred from one node
to another. Of course, while log files can be processed in this piece-wise fashion, some file
formats are not amenable to chunked processing. By writing a custom InputFormat, you can
control how the file is broken up (or is not broken up) into splits.
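
The effect of the split-size setting and of a non-splittable input format can be sketched with a small model. This is a simplified, standard-library-only illustration, not Hadoop's actual code: the class SplitModel and its methods are hypothetical names, the size rule max(minSize, min(maxSize, blockSize)) follows the one used by FileInputFormat, and details such as Hadoop's tolerance for a slightly oversized last split are left out. File lengths are assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of file-based split planning (hypothetical class, not Hadoop code).
class SplitModel {
    static final long MB = 1024L * 1024L;

    // Split size rule: the block size, clamped between the minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each split planned for one file of positive length.
    // A non-splittable file always yields exactly one split covering the whole file.
    static List<Long> planSplits(long fileLength, long splitSize, boolean splittable) {
        List<Long> splits = new ArrayList<>();
        if (!splittable) {
            splits.add(fileLength);
            return splits;
        }
        long remaining = fileLength;
        while (remaining > splitSize) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        splits.add(remaining); // last (possibly shorter) split
        return splits;
    }

    public static void main(String[] args) {
        long file = 200 * MB;
        long block = 64 * MB;

        // Default settings: split size equals the block size.
        long defaultSize = computeSplitSize(block, 1, Long.MAX_VALUE);
        System.out.println("default: " + planSplits(file, defaultSize, true).size() + " splits");

        // Raising the minimum split size to 128 MB reduces, but does not eliminate, splitting.
        long raisedSize = computeSplitSize(block, 128 * MB, Long.MAX_VALUE);
        System.out.println("min 128 MB: " + planSplits(file, raisedSize, true).size() + " splits");

        // Marking the file as non-splittable gives one split regardless of its length.
        System.out.println("not splittable: " + planSplits(file, defaultSize, false).size() + " split");
    }
}
```

In this model, raising the minimum split size yields a single split only when the minimum is at least as large as the file, so it depends on file size. Returning false from isSplitable does not, which is why it fits the "regardless of how many blocks" requirement in the question.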
