Which of the following best defines a SequenceFile?
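Hadoop's API documentation describes a SequenceFile as a flat file consisting of binary key/value pairs. The sketch below is only a toy model of that idea. It assumes plain `String` keys and values written with `DataOutputStream.writeUTF`. The real format also stores a header (including the key and value class names), sync markers, and optional compression, and none of that is modeled here. `ToySequenceFile` is a hypothetical name.

```java
import java.io.*;
import java.util.*;

// Toy model of a "flat file of binary key/value pairs".
// NOT the real SequenceFile format: no header, sync markers, or compression.
class ToySequenceFile {

    // Serialize records as: record count, then key, value, key, value, ...
    static byte[] write(List<Map.Entry<String, String>> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());   // tells the reader how many records follow
            for (Map.Entry<String, String> r : records) {
                out.writeUTF(r.getKey());   // length-prefixed key
                out.writeUTF(r.getValue()); // length-prefixed value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // Deserialize the records in the order they were written.
    static List<Map.Entry<String, String>> read(byte[] data) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                records.add(Map.entry(key, value));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> recs =
            List.of(Map.entry("Apple", "Red"), Map.entry("Banana", "Yellow"));
        System.out.println(read(write(recs))); // [Apple=Red, Banana=Yellow]
    }
}
```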
On a cluster running MapReduce v1 (MRv1), a TaskTracker sends a heartbeat to the JobTracker on
your cluster and reports that it has an open map task slot.
What determines how the JobTracker assigns each map task to a TaskTracker?
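When a TaskTracker reports a free map slot, the default MRv1 scheduling prefers data locality. It tries first to hand out a map task whose input split has a replica on that node, then one whose split is on the same rack, and only then any remaining task. The sketch below is a simplified plain-Java model of that preference order. `MapTask`, `pickTask`, and the host and rack names are hypothetical, and real schedulers do considerably more.

```java
import java.util.*;

// Simplified model of locality-aware map-task assignment.
class LocalityScheduler {

    // A pending map task and the hosts holding replicas of its input split.
    record MapTask(String id, Set<String> splitHosts) {}

    // Pick a task for a TaskTracker on `host` in rack `rack`:
    // node-local first, then rack-local, then any pending task.
    static Optional<MapTask> pickTask(List<MapTask> pending, String host, String rack,
                                      Map<String, String> rackOf) {
        for (MapTask t : pending) {
            if (t.splitHosts().contains(host)) return Optional.of(t); // node-local
        }
        for (MapTask t : pending) {
            for (String h : t.splitHosts()) {
                if (rack.equals(rackOf.get(h))) return Optional.of(t); // rack-local
            }
        }
        return pending.stream().findFirst(); // off-rack fallback (empty if nothing is pending)
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of("n1", "r1", "n2", "r1", "n3", "r2");
        List<MapTask> pending = List.of(
            new MapTask("m0", Set.of("n3")),
            new MapTask("m1", Set.of("n2")),
            new MapTask("m2", Set.of("n1")));
        // The TaskTracker on n1 gets m2, whose split is stored on n1 itself.
        System.out.println(pickTask(pending, "n1", "r1", rackOf).map(MapTask::id).orElse("none")); // m2
    }
}
```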
Which process describes the lifecycle of a Mapper?
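In the `org.apache.hadoop.mapreduce` API, `Mapper.run()` calls `setup()` once, then `map()` once for each input key/value pair, then `cleanup()` once. The classes below are a standalone imitation of that lifecycle, not Hadoop's `Mapper`. `LifecycleMapper`, `TracingMapper`, and the `List`-based input are assumptions made so that the code runs without Hadoop.

```java
import java.util.*;

// Standalone imitation of the Mapper lifecycle: setup -> map (per record) -> cleanup.
abstract class LifecycleMapper<K, V> {

    protected void setup() {}                    // called once, before any record
    protected abstract void map(K key, V value); // called once per input record
    protected void cleanup() {}                  // called once, after the last record

    // Mirrors the shape of Mapper.run(): one setup, a loop of map calls, one cleanup.
    public final void run(List<Map.Entry<K, V>> records) {
        setup();
        try {
            for (Map.Entry<K, V> r : records) {
                map(r.getKey(), r.getValue());
            }
        } finally {
            cleanup();
        }
    }
}

// Records every lifecycle call so the order can be inspected.
class TracingMapper extends LifecycleMapper<Long, String> {
    final List<String> calls = new ArrayList<>();

    @Override protected void setup() { calls.add("setup"); }
    @Override protected void map(Long offset, String line) { calls.add("map:" + line); }
    @Override protected void cleanup() { calls.add("cleanup"); }

    public static void main(String[] args) {
        TracingMapper m = new TracingMapper();
        m.run(List.of(Map.entry(0L, "a"), Map.entry(2L, "b")));
        System.out.println(m.calls); // [setup, map:a, map:b, cleanup]
    }
}
```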
In a MapReduce job, you want each of your input files processed by a single map task. How do
you configure a MapReduce job so that a single map task processes each input file regardless of
how many blocks the input file occupies?
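With `FileInputFormat`, a subclass's `isSplitable()` method decides whether a file may be divided. If it returns `false`, the whole file becomes one split, and therefore one map task. The plain-Java model below only does the arithmetic. As a simplification, it assumes that a splittable file gets one split per HDFS block, whereas Hadoop's real split sizing also considers minimum and maximum split-size settings. `SplitCount` and `splitsFor` are hypothetical names.

```java
// Back-of-the-envelope model: number of map tasks for one input file.
class SplitCount {

    // Simplification: a splittable file gets one split per block (ceiling division);
    // a non-splittable file is always a single split, however many blocks it spans.
    static long splitsFor(long fileBytes, long blockBytes, boolean splittable) {
        if (fileBytes <= 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) return 1;                        // whole file -> one split -> one map task
        return (fileBytes + blockBytes - 1) / blockBytes; // ceil(fileBytes / blockBytes)
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with 128 MB blocks spans 3 blocks.
        System.out.println(splitsFor(300 * mb, 128 * mb, true));  // 3
        System.out.println(splitsFor(300 * mb, 128 * mb, false)); // 1
    }
}
```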
A client application creates an HDFS file named foo.txt with a replication factor of 3. Which of the
following best describes the file access rules in HDFS if the file has a single block that is stored on
DataNodes A, B, and C?
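Any DataNode that holds a replica can serve a read. The client receives the replica locations ordered by network distance and reads from the closest one. The sketch below models that choice using the distance convention commonly described for Hadoop's network topology: 0 for the same node, 2 for the same rack, and 4 for a different rack in the same data center. `ReplicaChooser`, `Node`, and `closestReplica` are hypothetical names, and the model ignores data-center distance and node health.

```java
import java.util.*;

// Model of replica selection by network distance (same node < same rack < off rack).
class ReplicaChooser {

    record Node(String host, String rack) {}

    // Distance convention: 0 = same node, 2 = same rack, 4 = different rack.
    static int distance(Node a, Node b) {
        if (a.host().equals(b.host())) return 0;
        if (a.rack().equals(b.rack())) return 2;
        return 4;
    }

    // Return the replica closest to the reading client.
    static Node closestReplica(Node client, List<Node> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt(r -> distance(client, r)))
                .orElseThrow(); // a block with no available replica cannot be read
    }

    public static void main(String[] args) {
        Node a = new Node("A", "rack1"), b = new Node("B", "rack1"), c = new Node("C", "rack2");
        Node client = new Node("client", "rack2");
        System.out.println(closestReplica(client, List.of(a, b, c)).host()); // C
    }
}
```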
To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What
is the best way to accomplish this?
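A common pattern is to distribute the file to each node, for example through the DistributedCache, and then load it once per task. The load goes in `setup()`, or `configure()` in the old API, rather than once per record in `map()`. The plain-Java sketch below only counts how many times the expensive load runs under each choice. `SideDataMapper`, `loadSideData`, and the tiny in-memory map standing in for the 512 MB file are all hypothetical.

```java
import java.util.*;

// Counts how often an expensive side-data load runs when done in setup() vs in map().
class SideDataMapper {
    int loads = 0;               // number of times the side file was "loaded"
    Map<String, String> lookup;  // stand-in for the 512 MB in-memory data

    // Hypothetical loader; a real job would read the locally cached file here.
    Map<String, String> loadSideData() {
        loads++;
        return Map.of("k1", "v1", "k2", "v2");
    }

    void setup() { lookup = loadSideData(); } // once per task

    // Per record: reuse the data loaded in setup().
    String map(String key) { return lookup.getOrDefault(key, "?"); }

    // Anti-pattern for comparison: reload the side data on every record.
    String mapReloadingEachTime(String key) { return loadSideData().getOrDefault(key, "?"); }

    public static void main(String[] args) {
        List<String> records = List.of("k1", "k2", "k3");

        SideDataMapper good = new SideDataMapper();
        good.setup();
        records.forEach(good::map);
        System.out.println(good.loads); // 1

        SideDataMapper bad = new SideDataMapper();
        records.forEach(bad::mapReloadingEachTime);
        System.out.println(bad.loads);  // 3
    }
}
```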
You have written a Mapper that makes the following five calls to the OutputCollector.collect
method:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer’s reduce method be invoked?
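The framework calls `reduce()` once for each distinct intermediate key and passes it all the values for that key together. The model below groups the five emitted pairs by key with a `TreeMap`, which is sorted like the shuffle's key ordering, and then counts the groups. It assumes a single reducer. With several reducers, the total number of calls across all of them is the same. `ReduceCallCount` is a hypothetical name.

```java
import java.util.*;

// Model of the shuffle/sort: group intermediate (key, value) pairs by key.
class ReduceCallCount {

    // Value order within a group mirrors emission order here; real Hadoop does not guarantee it.
    static SortedMap<String, List<String>> groupByKey(List<String[]> pairs) {
        SortedMap<String, List<String>> groups = new TreeMap<>();
        for (String[] p : pairs) {
            groups.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The five pairs emitted by the Mapper above, in emission order.
        List<String[]> emitted = List.of(
            new String[] {"Apple", "Red"},
            new String[] {"Banana", "Yellow"},
            new String[] {"Apple", "Yellow"},
            new String[] {"Cherry", "Red"},
            new String[] {"Apple", "Green"});
        SortedMap<String, List<String>> groups = groupByKey(emitted);
        // One reduce() call per distinct key.
        groups.forEach((key, values) -> System.out.println("reduce(" + key + ", " + values + ")"));
        System.out.println(groups.size() + " reduce calls"); // 3 reduce calls
    }
}
```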
Which of the following best describes when the reduce method is first called in a MapReduce job?
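Any map task might still emit a value for a given key. For that reason, `reduce()` is not called until every map task has finished and the reducer has fetched and merged all of its map output. Reduce tasks may start copying map output earlier, but the reduce method itself waits. The sketch below uses a `CountDownLatch` to model that barrier, with pool threads standing in for map tasks. It illustrates the ordering only, not Hadoop's implementation, and `ReduceBarrier` is a hypothetical name.

```java
import java.util.*;
import java.util.concurrent.*;

// Models the ordering guarantee: reduce() starts only after every map task has finished.
class ReduceBarrier {

    static List<String> runJob(int mapTasks) {
        if (mapTasks <= 0) throw new IllegalArgumentException("need at least one map task");
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch mapsDone = new CountDownLatch(mapTasks);
        ExecutorService pool = Executors.newFixedThreadPool(mapTasks);
        try {
            for (int i = 0; i < mapTasks; i++) {
                final int id = i;
                pool.submit(() -> {
                    events.add("map-" + id + " finished"); // this map's output is now available
                    mapsDone.countDown();
                });
            }
            mapsDone.await();            // barrier: wait for all map tasks
            events.add("reduce called"); // only now can reduce() run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // Map lines may appear in any order; "reduce called" is always last.
        runJob(3).forEach(System.out::println);
    }
}
```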
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop
daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce
operation.
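In MRv1, each TaskTracker is configured with a fixed number of map and reduce slots, for example via `mapred.tasktracker.map.tasks.maximum`. It reports its state to the JobTracker in its heartbeat, and the JobTracker assigns tasks into that TaskTracker's free slots. The model below shows only that slot bookkeeping. `SlotModel`, `TaskTracker`, and `assignOnHeartbeat` are hypothetical names, and a real scheduler also weighs locality, job priority, and load.

```java
import java.util.*;

// Slot bookkeeping in a simplified MRv1 cluster: tasks go only into free slots.
class SlotModel {

    // A TaskTracker with a fixed number of map slots.
    static final class TaskTracker {
        final String host;
        final int mapSlots;
        final List<String> running = new ArrayList<>();

        TaskTracker(String host, int mapSlots) {
            this.host = host;
            this.mapSlots = mapSlots;
        }

        int freeMapSlots() { return mapSlots - running.size(); }
    }

    // Handle one heartbeat: fill the tracker's free slots from the pending queue.
    static List<String> assignOnHeartbeat(TaskTracker tt, Deque<String> pending) {
        List<String> assigned = new ArrayList<>();
        while (tt.freeMapSlots() > 0 && !pending.isEmpty()) {
            String task = pending.poll();
            tt.running.add(task);
            assigned.add(task);
        }
        return assigned;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(List.of("m0", "m1", "m2"));
        TaskTracker tt = new TaskTracker("node1", 2);
        System.out.println(assignOnHeartbeat(tt, pending)); // [m0, m1]
        System.out.println(assignOnHeartbeat(tt, pending)); // [] (no free slots left)
        System.out.println(pending);                        // [m2]
    }
}
```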
You need to create a job that does frequency analysis on input data. You will do this by writing a
Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into
individual characters. For each of these characters, you will emit the character as a key and
an IntWritable as the value. Because this will produce proportionally more intermediate data than input
data, which two resources should you expect to be bottlenecks?
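Emitting one (character, 1) pair per input character multiplies the number of records. Mappers write that intermediate data to local disk, and it is then copied over the network to the reducers. The sketch below counts the emitted pairs for two sample lines. It also shows how a combiner-style local sum shrinks them before the shuffle. `CharFrequency` and its methods are hypothetical, and the record count is only a rough proxy for bytes written and transferred.

```java
import java.util.*;

// Counts intermediate records for a character-frequency job, with and without
// local (combiner-style) aggregation before the shuffle.
class CharFrequency {

    // Map phase: one (character, 1) pair per character of each line.
    static List<Map.Entry<Character, Integer>> mapLines(List<String> lines) {
        List<Map.Entry<Character, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (char c : line.toCharArray()) {
                out.add(Map.entry(c, 1));
            }
        }
        return out;
    }

    // Combiner-style step: sum counts per character within one map task's output.
    static Map<Character, Integer> combine(List<Map.Entry<Character, Integer>> pairs) {
        Map<Character, Integer> sums = new TreeMap<>();
        for (Map.Entry<Character, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<Character, Integer>> pairs = mapLines(List.of("hello", "hadoop"));
        System.out.println(pairs.size() + " pairs emitted");            // 11 pairs emitted
        System.out.println(combine(pairs).size() + " after combining"); // 7 after combining
    }
}
```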