You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to
determine available HDFS space in your cluster?
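For reference, two common command-line ways to check this (assuming the MRv2-era `hdfs` CLI; older clusters use `hadoop dfsadmin` and `hadoop fs` instead):

```
$ hdfs dfsadmin -report    # configured capacity, DFS used, and DFS remaining, per DataNode and in total
$ hdfs dfs -df -h /        # filesystem size, used, and available space in human-readable units
```

The NameNode web UI on mynamenode reports the same capacity figures.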
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a
MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the
number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to
know how to specify the number of reduce tasks when a specific job runs. Which method should
you tell the developers to implement?
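For context, a driver that goes through ToolRunner/GenericOptionsParser accepts the MRv2 property on the command line; the jar and path names below are placeholders:

```
$ hadoop jar myapp.jar DriverClass -D mapreduce.job.reduces=8 /data/input /data/output
```

The same count can be set inside the driver with Job.setNumReduceTasks(8) before the job is submitted.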
Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts
property in the NameNode's configuration file. What is the result?
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently
see that MapReduce map tasks on your cluster are running slowly because of excessive JVM
garbage collection. How do you increase the JVM heap size to 3 GB to optimize performance?
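As a point of reference, a 3 GB map-task heap would be expressed in mapred-site.xml roughly as follows (the 4096 MB container size is an assumption; it only needs to exceed the heap):

```
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
```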
You have a cluster running with the FIFO scheduler enabled. You submit a large job A to the cluster,
which you expect to run for one hour. Then you submit job B, which you expect to run for only a
couple of minutes.
You submit both jobs with the same priority.
Which two best describe how the FIFO Scheduler arbitrates the cluster resources for the jobs and
their tasks?
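For reference, the FIFO scheduler referred to here is selected in yarn-site.xml (most distributions default to the Capacity or Fair scheduler instead):

```
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
</property>
```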
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is
a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named
DriverClass.
She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PriviledgedActionException as:training (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist: file:/data/input
What is the cause of the error?
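A hint for reading the message: the file: scheme shows the path was resolved against the local filesystem rather than HDFS, which happens when the client runs without a core-site.xml that names the default filesystem. A minimal fragment (the port 8020 is an assumption, common on Hadoop 2):

```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mynamenode:8020</value>
</property>
```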
Your company stores user profile records in an OLTP database. You want to join these records
with web server logs you have already ingested into the Hadoop file system. What is the best way
to obtain and ingest these user records?
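For context, the standard tool for this kind of OLTP-to-HDFS transfer is Apache Sqoop; the connect string, table name, and target directory below are placeholders:

```
$ sqoop import \
    --connect jdbc:mysql://dbhost/crm \
    --table user_profiles \
    --target-dir /data/user_profiles
```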
Which two are features of Hadoop’s rack topology?
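For context, rack awareness is usually supplied by a topology script named in the net.topology.script.file.name property; the IP ranges and rack names below are assumptions for illustration:

```shell
# Hypothetical topology script: Hadoop invokes it with DataNode addresses
# and expects one rack path per argument on stdout.
cat > /tmp/topology.sh <<'EOF'
#!/bin/sh
for node in "$@"; do
  case "$node" in
    10.1.1.*) echo /rack1 ;;
    10.1.2.*) echo /rack2 ;;
    *)        echo /default-rack ;;
  esac
done
EOF
chmod +x /tmp/topology.sh
/tmp/topology.sh 10.1.1.5 10.1.2.7 192.168.0.9   # prints one rack path per node
```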
The Hadoop framework provides a mechanism for coping with machine issues such as faulty
configuration or impending hardware failure. MapReduce detects that one or a number of
machines are performing poorly and starts more copies of a map or reduce task. All the tasks run
simultaneously, and the output of the task that finishes first is used. This is called:
What is the disadvantage of using multiple reducers with the default HashPartitioner and
distributing your workload across your cluster?
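To illustrate the mechanism behind the question: HashPartitioner sends each record to reducer hash(key) mod numReducers, so every occurrence of a heavily repeated key lands on the same reducer no matter how many reducers exist. A small sketch using cksum as a stand-in hash:

```shell
# Each distinct key maps deterministically to one of 3 "reducers";
# a skewed key distribution therefore produces a skewed reducer load.
for key in alice alice alice alice bob carol; do
  h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo "key=$key -> reducer $((h % 3))"
done
```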