You want to perform analysis on a large collection of images. You want to store this data in HDFS
and process it with MapReduce, but you also want to give your data analysts and data scientists
the ability to process the data directly from HDFS with an interpreted high-level programming
language like Python. Which format should you use to store this data in HDFS?
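For context, one container format often suggested for a large collection of binary files such as images is the Hadoop SequenceFile. Below is a minimal sketch of packing images into a SequenceFile, assuming Text keys holding the original file names and BytesWritable values holding the raw image bytes; the class name and output path are illustrative.

import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Destination path (illustrative); resolves against the configured default filesystem.
        Path out = new Path("/data/images/images.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            // Each image becomes one key/value record: key = file name, value = raw bytes.
            for (String name : args) {
                byte[] bytes = Files.readAllBytes(Paths.get(name));
                writer.append(new Text(name), new BytesWritable(bytes));
            }
        }
    }
}

Storing many images as records inside a few large SequenceFiles avoids the small-files problem on the NameNode and gives MapReduce splittable input.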
You want to run Hadoop jobs on your development workstation for testing before you submit them
to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate
a production cluster while using a single machine?
Your cluster’s HDFS block size is 64MB. You have a directory containing 100 plain text files, each of
which is 100MB in size. The InputFormat for your job is TextInputFormat. How many Mappers
will run?
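As a worked example of the arithmetic involved: TextInputFormat (via FileInputFormat) creates at least one input split per file and, by default, one split per HDFS block, so each 100MB file spanning two 64MB blocks yields ceil(100/64) = 2 splits, and 100 files x 2 splits = 200 map tasks.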
Which of the following best describes the workings of TextInputFormat?
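For context, a minimal mapper sketch showing what TextInputFormat hands to the map function, assuming the new org.apache.hadoop.mapreduce API; the class name and output types are illustrative.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With TextInputFormat, the input key is the byte offset of the line within the
// file (LongWritable) and the input value is the line itself (Text) with the line
// terminator stripped; the record reader takes care of lines that cross split
// boundaries.
public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit the line keyed back by its offset, purely to illustrate the types.
        context.write(line, offset);
    }
}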
Which of the following statements most accurately describes the relationship between MapReduce
and Pig?
You need to import a portion of a relational database every day as files to HDFS, and generate
Java classes to interact with your imported data. Which of the following tools should you use to
accomplish this?
You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to
run ad-hoc analysis on data in your HDFS cluster. Which of the following is data warehousing
software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited
for this kind of user?
What is the preferred way to pass a small number of configuration parameters to a mapper or
reducer?
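For context, a minimal sketch of the usual pattern: set values on the job's Configuration in the driver and read them back in the task via its context. The property name "myapp.threshold" and the class names are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConfigParamExample {

    // The mapper reads the parameter back from the task's Configuration in setup().
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private int threshold;

        @Override
        protected void setup(Context context) {
            threshold = context.getConfiguration().getInt("myapp.threshold", 0);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Illustrative use of the parameter: only emit lines longer than the threshold.
            if (value.getLength() > threshold) {
                context.write(value, key);
            }
        }
    }

    // The driver sets the parameter on the job's Configuration before submission.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("myapp.threshold", 42);   // illustrative property name and value
        Job job = Job.getInstance(conf, "config param example");
        job.setJarByClass(ConfigParamExample.class);
        job.setMapperClass(MyMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}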
Given a Mapper, Reducer, and Driver class packaged into a jar, which is the correct way of
submitting the job to the cluster?
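For context, a minimal driver sketch using ToolRunner, assuming the standard "hadoop jar" style of submission from the command line; the jar, class, and path names are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Packaged into a jar alongside the Mapper and Reducer classes, this driver is
// typically launched with something along the lines of:
//   hadoop jar myjob.jar MyDriver <input path> <output path>
// (jar and class names here are illustrative).
public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "example job");
        job.setJarByClass(MyDriver.class);
        // job.setMapperClass(...); job.setReducerClass(...); output classes, etc.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-D, -files, etc.) before calling run().
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}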
What is the difference between a failed task attempt and a killed task attempt?