Which command gathers these into a single file on your local file system?
You have just run a MapReduce job to filter user messages to only those of a selected
geographical region. The output for this job is in a directory named westUsers, located just
below your home directory in HDFS. Which command gathers these into a single file on
your local file system?
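For orientation, the scenario can be sketched with the `getmerge` subcommand of the Hadoop file system shell, which concatenates the files under an HDFS directory into one local file. This is a minimal sketch, not a definitive answer key: the local file name `westUsers.txt` is an assumption, and the command only runs if a `hadoop` client is installed.

```shell
# Hypothetical local target name; the question does not specify one.
LOCAL_FILE="westUsers.txt"

# A relative HDFS path (westUsers) resolves against the user's HDFS home directory.
GETMERGE_CMD="hadoop fs -getmerge westUsers ${LOCAL_FILE}"

if command -v hadoop >/dev/null 2>&1; then
  # Concatenate every file in the HDFS directory into the single local file.
  ${GETMERGE_CMD}
else
  echo "hadoop client not found; would run: ${GETMERGE_CMD}"
fi
```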
Which file contains a serialized form of all the directory and file inodes in the filesystem, giving the Name
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in
the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
What are two ways to determine available HDFS space in your cluster?
You are running a Hadoop cluster with a NameNode on host mynamenode. What are two
ways to determine available HDFS space in your cluster?
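As an illustration of how HDFS capacity can be inspected from the command line, the sketch below uses two standard commands: `hdfs dfsadmin -report` for a cluster-wide capacity summary and `hadoop fs -df -h /` for a file-system-level view. It assumes a configured client on the path; the NameNode web UI on host mynamenode is another place where the same figures are typically shown.

```shell
# Cluster-wide report: configured capacity, DFS used, DFS remaining, per-DataNode details.
REPORT_CMD="hdfs dfsadmin -report"

# File-system view of capacity, used, and available space in human-readable units.
DF_CMD="hadoop fs -df -h /"

if command -v hdfs >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  ${REPORT_CMD}
  ${DF_CMD}
else
  echo "Hadoop client not found; would run: ${REPORT_CMD}"
  echo "Hadoop client not found; would run: ${DF_CMD}"
fi
```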
Which method should you tell the developers to implement?
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture
to MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to
specifying map and reduce tasks (resource allocation) when they run jobs. A
developer wants to know how to specify reduce tasks when a specific job runs. Which
method should you tell the developers to implement?
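One way a per-job reducer count is commonly passed in MRv2 is the `mapreduce.job.reduces` property, given as a generic `-D` option on the command line. The sketch below is illustrative only: the JAR name, driver class, paths, and the value 2 are hypothetical, and `-D` options are only honored when the driver parses generic options (for example through `ToolRunner`).

```shell
# Hypothetical job artifacts and paths, used purely for illustration.
JOB_JAR="myjob.jar"
DRIVER="MyDriver"
NUM_REDUCES=2

# -D must come after the driver class and before the job's own arguments.
JOB_CMD="hadoop jar ${JOB_JAR} ${DRIVER} -D mapreduce.job.reduces=${NUM_REDUCES} input output"

if command -v hadoop >/dev/null 2>&1 && [ -f "${JOB_JAR}" ]; then
  ${JOB_CMD}
else
  echo "hadoop client or JAR not found; would run: ${JOB_CMD}"
fi
```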
What results?
Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts
property in the NameNode’s configuration file. What results?
How do you increase the JVM heap size property to 3GB to optimize performance?
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You
consistently see that MapReduce map tasks on your cluster are running slowly because of
excessive JVM garbage collection. How do you increase the JVM heap size property to 3GB
to optimize performance?
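In MRv2 the map-task JVM options are controlled by the `mapreduce.map.java.opts` property, so a 3GB heap corresponds to `-Xmx3072m`. The sketch below passes it per job with `-D`; the same property can also be set cluster-wide in mapred-site.xml. The JAR, driver, paths, and the 4096 MB container size are assumptions: the container size (`mapreduce.map.memory.mb`) normally has to exceed the heap, and the exact headroom depends on the cluster.

```shell
# 3 GB heap for map-task JVMs.
MAP_HEAP_OPTS="-Xmx3072m"
# Assumed container size; it should be larger than the heap to leave JVM overhead room.
MAP_CONTAINER_MB=4096

# Hypothetical JAR, driver, and paths; the driver must parse generic options (e.g. ToolRunner).
HEAP_CMD="hadoop jar myjob.jar MyDriver -D mapreduce.map.java.opts=${MAP_HEAP_OPTS} -D mapreduce.map.memory.mb=${MAP_CONTAINER_MB} input output"

if command -v hadoop >/dev/null 2>&1 && [ -f myjob.jar ]; then
  ${HEAP_CMD}
else
  echo "hadoop client or JAR not found; would run: ${HEAP_CMD}"
fi
```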
Which two best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks?
You have a cluster running with a FIFO scheduler enabled. You submit a large job A to the
cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which
you expect to run a couple of minutes only. You submit both jobs with the same priority.
Which two best describe how the FIFO Scheduler arbitrates the cluster resources for a job and
its tasks?
What is the cause of the error?
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails.
There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class
is named DriverClass. She runs the command: hadoop jar j.jar DriverClass
/data/input /data/output
The error message returned includes the line:
PriviligedActionException as:training (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not
exist: file:/data/input
What is the cause of the error?
What is the best way to obtain and ingest these user records?
Your company stores user profile records in an OLTP database. You want to join these
records with web server logs you have already ingested into the Hadoop file system. What
is the best way to obtain and ingest these user records?
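One commonly used tool for pulling relational tables into HDFS is Apache Sqoop. The sketch below shows the general shape of a `sqoop import` call, not a definitive recommendation. Every concrete value in it (JDBC URL, database host, user name, table name, target directory) is hypothetical, and it only runs if a `sqoop` client is installed.

```shell
# All connection details below are hypothetical placeholders.
JDBC_URL="jdbc:mysql://dbhost/crm"
DB_USER="etl_user"
TABLE="user_profiles"
TARGET_DIR="/data/user_profiles"

# -P prompts for the database password instead of placing it on the command line.
SQOOP_CMD="sqoop import --connect ${JDBC_URL} --username ${DB_USER} -P --table ${TABLE} --target-dir ${TARGET_DIR}"

if command -v sqoop >/dev/null 2>&1; then
  ${SQOOP_CMD}
else
  echo "sqoop client not found; would run: ${SQOOP_CMD}"
fi
```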
Which two are features of Hadoop’s rack topology?
Which two are features of Hadoop’s rack topology?