You need to create a job that does frequency analysis on input data. You will do this by writing a
Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into
individual characters. For each one of these characters, you will emit the character as a key and
an IntWritable as the value. As this will produce proportionally more intermediate data than input
data, which two resources should you expect to be bottlenecks?
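The mapper described above can be sketched as follows (a Python stand-in for the Java Mapper, for brevity; the function name is illustrative, not the Hadoop API):

```python
# Python stand-in for the character-frequency Mapper described above
# (illustrative sketch, not the Hadoop Java API). For every line of input
# it emits one (character, 1) pair per character, so the intermediate data
# is proportionally larger than the input -- which is why disk I/O and
# network bandwidth are stressed during the shuffle.
def char_mapper(line):
    for ch in line:
        yield (ch, 1)

pairs = list(char_mapper("hadoop"))  # one pair per input character
```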
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop
daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce
operation.
You want to count the number of occurrences for each unique word in the supplied input data.
You’ve decided to implement this by having your mapper tokenize each word and emit a literal
value 1, and then have your reducer increment a counter for each literal 1 it receives. After
successfully implementing this, it occurs to you that you could optimize this by specifying a
combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why
or why not?
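A short sketch of why the answer hinges on what the reducer actually does with its values (a Python stand-in, illustrative only, not the Hadoop API):

```python
# Python stand-in contrasting the two reduce strategies (illustrative, not
# Hadoop API). A combiner's output is fed back into the reducer, so the
# reduce logic must tolerate partially aggregated values. A reducer that
# *sums* the values does; one that *counts* how many values arrive does not.
def count_reducer(values):
    return sum(1 for _ in values)  # counts the literal 1s it receives

def sum_reducer(values):
    return sum(values)             # sums the values themselves

raw = [1, 1, 1, 1]  # four occurrences of one word, no combiner yet
combined = [sum_reducer(raw[:2]), sum_reducer(raw[2:])]  # combiner ran per split
```

With no combiner the two reducers agree; once a combiner pre-aggregates, only the summing reducer still returns the true count of 4, while the counting reducer returns 2.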
Which project gives you a distributed, scalable data store that allows you random, real-time
read/write access to hundreds of terabytes of data?
You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB.
Just after this command has finished writing 200 MB of this file, what would another user see
when trying to access this file?
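The block arithmetic behind the scenario, as a quick Python check (the visibility note in the comment reflects classic HDFS semantics, where a concurrent reader sees only fully written blocks):

```python
# Block arithmetic for the scenario: a 300 MB file, 64 MB HDFS blocks,
# 200 MB written so far. In classic HDFS semantics a concurrent reader
# sees only the blocks that have been completely written.
file_mb, block_mb, written_mb = 300, 64, 200

complete_blocks = written_mb // block_mb  # blocks fully written so far
total_blocks = -(-file_mb // block_mb)    # ceil(300 / 64) once the put finishes
```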
Identify the tool best suited to import a portion of a relational database every day as files into
HDFS, and generate Java classes to interact with that imported data?
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt
and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command
when it’s given a path object representing this directory?
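FileInputFormat's default path filter skips files whose names begin with an underscore or a dot; a Python stand-in of that rule (illustrative, not the Hadoop source):

```python
# Python stand-in for FileInputFormat's default hidden-file filter
# (illustrative): paths whose names begin with '_' or '.' are skipped.
def is_visible(name):
    return not name.startswith(("_", "."))

files = ["_first.txt", "second.txt", ".third.txt", "#data.txt"]
processed = [f for f in files if is_visible(f)]  # files the job would read
```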
You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses
TextInputFormat: the mapper applies a regular expression over input values and emits key-value
pairs with the key consisting of the matching text, and the value containing the filename and byte
offset. Determine the difference between setting the number of reducers to one and setting the
number of reducers to zero.
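A sketch of the structural difference (a Python stand-in, illustrative only): with one reducer every map output passes through the shuffle and is sorted into a single output, while with zero reducers each mapper's output is written directly, unsorted and unmerged.

```python
# Python stand-in for the two configurations (illustrative only).
map_outputs = [[("b", 1), ("a", 1)], [("c", 1), ("a", 1)]]  # two mappers

# numReduceTasks = 1: all pairs are shuffled and sorted into one output.
one_reducer_output = sorted(p for out in map_outputs for p in out)

# numReduceTasks = 0: no shuffle or sort; one file per mapper, as emitted.
zero_reducer_outputs = map_outputs
```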
A combiner reduces:
In a MapReduce job with 500 map tasks, how many map task attempts will there be?