Why should you run the HDFS balancer periodically?
Choose three reasons why you should run the HDFS balancer periodically.
What occurs when you execute the command: hdfs haadmin -failover nn01 nn02?
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named
nn01 and nn02. What occurs when you execute the command:
hdfs haadmin -failover nn01 nn02?
What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from
which clients submit jobs. What do you need to do in order to run Impala on the cluster and
submit jobs from the command line of the gateway machine?
Which command gathers these into a single file on your local file system?
You have just run a MapReduce job to filter user messages to only those of a selected
geographical region. The output for this job is in a directory named westUsers, located just
below your home directory in HDFS. Which command gathers these into a single file on
your local file system?
Which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in
the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
What are two ways to determine available HDFS space in your cluster?
You are running a Hadoop cluster with a NameNode on host mynamenode. What are two
ways to determine available HDFS space in your cluster?
Which method should you tell the developers to implement?
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture
to MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to
specifying map and reduce tasks (resource allocation) when they run jobs. A
developer wants to know how to specify reduce tasks when a specific job runs. Which
method should you tell the developers to implement?
What results?
Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts
property in the NameNode’s configuration file. What results?
How do you increase the JVM heap size property to 3 GB to optimize performance?
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You
consistently see that MapReduce map tasks on your cluster are running slowly because of
excessive JVM garbage collection. How do you increase the JVM heap size property to 3 GB
to optimize performance?
Which two statements best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks?
You have a cluster running with a FIFO scheduler enabled. You submit a large job A to the
cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which
you expect to run a couple of minutes only. You submit both jobs with the same priority.
Which two statements best describe how the FIFO Scheduler arbitrates the cluster resources for a job and
its tasks?