In a MapReduce job, you want each of your input files processed by a single map task. How do you
configure a MapReduce job so that a single map task processes each input file regardless of how
many blocks the input file occupies?
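One standard approach is an input format that refuses to split files, so each file becomes exactly one InputSplit and therefore one map task. A minimal sketch against the classic org.apache.hadoop.mapred API (the class name is illustrative):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Sketch: never split a file, so every file is handed to exactly one
// map task no matter how many HDFS blocks it spans.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // one InputSplit (hence one mapper) per file
    }
}
```

The driver would then register it with setInputFormat(NonSplittableTextInputFormat.class).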
Given a directory of files with the following structure: line number, tab character, string:
Example:
1. abialkjfjkaoasdfjksdlkjhqweroij
2. kadf jhuwqounahagtnbvaswslmnbfgy
3. kjfteiomndscxeqalkzhtopedkfslkj
You want to send each line as one record to your Mapper. Which InputFormat would you use to
complete the line: setInputFormat (________.class);
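A tab-delimited layout like this (key before the first tab, value after it) is exactly what KeyValueTextInputFormat handles. A pure-Java illustration of that split, with no Hadoop dependency (class and method names are made up for the sketch):

```java
// Illustrative stand-in for the key/value split a tab-delimited input
// format performs: key = text before the first tab, value = the rest.
public class TabSplit {
    public static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" }; // no tab: whole line is the key
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }
}
```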
What is a SequenceFile?
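For context: a SequenceFile is Hadoop's flat, binary, splittable container of key/value pairs (Writable types), optionally record- or block-compressed. A hedged sketch of writing one with the Hadoop 2 API (the path and record contents are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: append typed key/value records to a binary SequenceFile.
public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("demo.seq"); // illustrative output path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            writer.append(new IntWritable(1), new Text("first record"));
            writer.append(new IntWritable(2), new Text("second record"));
        }
    }
}
```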
You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to
run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing
software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited
for this kind of user?
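The description matches Apache Hive, whose query language (HiveQL) reads like ordinary SQL. A hypothetical ad-hoc query (table and column names are illustrative):

```
-- Hypothetical HiveQL; runs as one or more MapReduce jobs under the hood.
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
```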
Which of the following best describes the workings of TextInputFormat?
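For reference, TextInputFormat emits one record per line: the key is the byte offset at which the line starts in the file, the value is the line's contents. A pure-Java illustration of that pairing, not the Hadoop implementation (names are made up for the sketch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: pair each line with the byte offset at which it
// starts, the way a line-oriented input format keys its records.
public class OffsetLines {
    public static Map<Long, String> records(String fileContents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            if (!line.isEmpty() || offset < fileContents.length()) {
                out.put(offset, line); // key = start offset, value = line text
            }
            offset += line.length() + 1; // +1 for the newline delimiter
        }
        return out;
    }
}
```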
Workflows expressed in Oozie can contain:
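For context: an Oozie workflow is a directed acyclic graph of action nodes (MapReduce, Pig, Hive, shell, and others) wired together by control nodes. A hypothetical minimal workflow definition with a single MapReduce action (names and properties are illustrative):

```xml
<!-- Hypothetical minimal Oozie workflow; real actions need a <configuration> block. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```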
You need to import a portion of a relational database every day as files to HDFS, and generate
Java classes to interact with your imported data. Which of the following tools should you use to
accomplish this?
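This is the job Apache Sqoop was built for: it imports relational tables into HDFS and, as part of the import, generates a Java class for each table. A hypothetical invocation (connection string, table, and paths are illustrative):

```shell
# Hypothetical Sqoop import; Sqoop also emits orders.java for the table.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username reporting \
  --table orders \
  --where "order_dt = '2013-01-01'" \
  --target-dir /data/orders/2013-01-01
```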
Which of the following statements most accurately describes the relationship between MapReduce
and Pig?
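The short answer is that Pig is a higher-level layer: the Pig runtime compiles Pig Latin scripts into one or more MapReduce jobs. A hypothetical script (file and field names are illustrative):

```
-- Hypothetical Pig Latin; each statement below compiles into MapReduce work.
logs   = LOAD '/data/logs' AS (level:chararray, msg:chararray);
errors = FILTER logs BY level == 'ERROR';
counts = GROUP errors BY level;
DUMP counts;
```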
What is the preferred way to pass a small number of configuration parameters to a mapper or
reducer?
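The usual answer is to set them on the job's Configuration object in the driver and read them back in the task via the context. A sketch against the org.apache.hadoop.mapreduce API (the property name and class are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: the driver calls job.getConfiguration().set("myapp.filter.keyword", "timeout");
// the mapper reads the value back once, in setup().
public class KeywordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private String keyword;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        keyword = conf.get("myapp.filter.keyword", "error"); // default is illustrative
    }
    // ... map() would filter lines against this.keyword ...
}
```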
You need a distributed, scalable data store that allows you random, real-time read/write access to
hundreds of terabytes of data. Which of the following would you use?
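That profile (random, real-time read/write by row key at very large scale on HDFS) describes Apache HBase. A hedged sketch of a write and a read with the HBase 1.x client API (table, column family, and row key are illustrative):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: random read/write of a single row by key.
public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("metrics"))) {
            Put put = new Put(Bytes.toBytes("row-42"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("7"));
            table.put(put); // random write

            Result r = table.get(new Get(Bytes.toBytes("row-42"))); // random read
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("d"), Bytes.toBytes("value"))));
        }
    }
}
```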