In a MapReduce job, you want each of your input files processed by a single map task. How do you
configure a MapReduce job so that a single map task processes each input file regardless of how
many blocks the input file occupies?
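One common way to get this behavior (a sketch, assuming the classic `org.apache.hadoop.mapred` API is on the classpath) is to subclass the input format and mark files as non-splittable, so each file becomes exactly one input split and therefore exactly one map task:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Returning false tells the framework never to split a file into multiple
// input splits, so every input file is handled by exactly one map task,
// regardless of how many HDFS blocks the file occupies.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
```

The job driver would then register this class as the job's input format. Note the trade-off: a non-splittable file spanning many blocks loses data locality for all but one of those blocks.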
Given a directory of files with the following structure: line number, tab character, string:
2. kadf jhuwqounahagtnbvaswslmnbfgy
You want to send each line as one record to your Mapper. Which InputFormat would you use to
complete the line: setInputFormat(________.class);
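If the intended answer is Hadoop's KeyValueTextInputFormat, its behavior is to split each line at the first tab character: everything before the tab becomes the key, everything after it the value. A minimal stdlib sketch of that split rule (the sample line is taken from the question above; this class is illustrative, not part of Hadoop):

```java
public class KeyValueSplit {
    // Split a record at the first tab, the way KeyValueTextInputFormat does:
    // text before the first tab is the key, text after it is the value.
    public static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // No separator: the whole line becomes the key, value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("2\tkadf jhuwqounahagtnbvaswslmnbfgy");
        System.out.println("key=" + kv[0] + ", value=" + kv[1]);
    }
}
```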
What is a SequenceFile?
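A SequenceFile is Hadoop's flat binary file format of serialized key-value pairs; it supports per-record or per-block compression and is splittable, which makes it well suited as MapReduce input and output. A sketch of writing one, assuming the Hadoop libraries are on the classpath (the output path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("demo.seq"); // hypothetical output path

        // Append binary (key, value) records; both types are Writables.
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
        try {
            writer.append(new IntWritable(1), new Text("first record"));
        } finally {
            writer.close();
        }
    }
}
```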
You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to
run ad-hoc analysis on data in your HDFS cluster. Which of the following is data warehousing
software built on top of Apache Hadoop that defines a simple SQL-like query language well suited
to this kind of user?
Which of the following best describes the workings of TextInputFormat?
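For reference, TextInputFormat treats each line of the input as a record: the key is the byte offset of the line within the file (a LongWritable) and the value is the contents of the line (a Text). A stdlib-only sketch of how those byte-offset keys line up (assuming single-byte characters and `\n` terminators):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetDemo {
    // Emit (byteOffset, line) pairs the way TextInputFormat presents records:
    // the key is the byte offset where the line starts, the value is the line
    // without its terminator.
    public static Map<Long, String> records(String contents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : contents.split("\n", -1)) {
            out.put(offset, line);
            offset += line.getBytes().length + 1; // +1 for the '\n'
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("first\nsecond"));
    }
}
```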
Workflows expressed in Oozie can contain:
You need to import a portion of a relational database every day as files to HDFS, and generate
Java classes to interact with your imported data. Which of the following tools should you use to
accomplish this?
Which of the following statements most accurately describes the relationship between MapReduce
What is the preferred way to pass a small number of configuration parameters to a mapper or
reducer?
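For small parameters, the usual approach is to set them on the job's Configuration in the driver and read them back in the task's setup method (a sketch assuming the Hadoop `mapreduce` API is on the classpath; the parameter name is hypothetical). DistributedCache, by contrast, is meant for larger side files:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamPassing {
    // Driver side: store the parameter in the job's Configuration.
    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();
        conf.set("myapp.threshold", "10"); // hypothetical parameter name
        return Job.getInstance(conf);
    }

    // Task side: read the parameter back from the task's Configuration.
    public static class MyMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        private int threshold;

        @Override
        protected void setup(Context context) {
            threshold = context.getConfiguration().getInt("myapp.threshold", 0);
        }
    }
}
```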
You need a distributed, scalable data store that gives you random, real-time read/write access to
hundreds of terabytes of data. Which of the following would you use?
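Apache HBase fits this description. A hedged sketch of a random write and read with the HBase client API (assuming the HBase client library is on the classpath; the table, row, and column names are hypothetical):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             // "users" is a hypothetical, pre-created table.
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Random, real-time write: one cell addressed by row key,
            // column family, and qualifier.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("Ada"));
            table.put(put);

            // Random, real-time read of the same row.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] name = result.getValue(Bytes.toBytes("info"),
                                          Bytes.toBytes("name"));
        }
    }
}
```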