Table schemas in Hive are:
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log
files stored?
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on
the cluster, and you submit Job A, so that only Job A is running on the cluster. A while later, you
submit Job B. Now Job A and Job B are running on the cluster at the same time. How will the Fair
Scheduler handle these two jobs?
Each node in your Hadoop cluster, running YARN, has 64GB memory and 24 cores. Your
yarn-site.xml has the following configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>32768</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?
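As a worked illustration of how these limits interact (a sketch, not the exam's answer key): the number of containers per node is bounded by the available memory divided by the smallest container the scheduler will allocate, so with 32768 MB available, a 2048 MB minimum yields at most 32768 / 2048 = 16 containers. The property below is a standard YARN setting; the value shown is an assumption chosen to make the arithmetic work out:

```
<!-- Sketch: 32768 MB per node / 2048 MB minimum allocation = at most 16 containers -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
```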
You want the node to swap Hadoop daemon data from RAM to disk only when absolutely
necessary. What should you do?
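One common approach, offered as a hedged sketch rather than a definitive answer: lower the Linux kernel's vm.swappiness setting, so the kernel swaps pages out only under real memory pressure. A persistent entry in /etc/sysctl.conf might look like:

```
# /etc/sysctl.conf -- swap only when absolutely necessary
# (the value 1 is an illustrative choice; 0 disables swapping entirely on older kernels)
vm.swappiness = 1
```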
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two
daemons need to be installed on your cluster's master nodes?
You observed that the number of spilled records from Map tasks far exceeds the number of map
output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How
would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
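For context (an assumption-laden sketch, not the answer key): io.sort.mb sets the size of the map-side sort buffer, and that buffer must fit inside the child JVM heap. A 1000 MB buffer inside a 1 GB heap leaves almost no working memory for the task itself, which can force extra spills on its own. A mapred-site.xml fragment with an illustrative, more conservative value might look like:

```
<!-- Illustrative value only; tune against your heap size and spill counters -->
<property>
<name>io.sort.mb</name>
<value>200</value>
</property>
```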
You are running a Hadoop cluster with a NameNode on host mynamenode, a secondary
NameNode on host mysecondarynamenode and several DataNodes.
Which best describes how you determine when the last checkpoint happened?
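Two ways to inspect checkpoint age, sketched under the assumption of a Hadoop 2 cluster (the directory path below is an illustrative default, not a known fact about this cluster): the NameNode web UI reports the last checkpoint, and the modification time of the newest fsimage file in the secondary NameNode's checkpoint directory records when it was written.

```shell
# Illustrative only; the checkpoint directory path is an assumption.
# Newest fsimage file's mtime = time of the last checkpoint.
ssh mysecondarynamenode 'ls -lt /var/lib/hadoop-hdfs/dfs/namesecondary/current/fsimage_*'
```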
What does CDH packaging do on install to facilitate Kerberos security setup?
You want to understand more about how users browse your public website. For example, you
want to know which pages they visit prior to placing an order. You have a server farm of 200 web
servers hosting your website. Which is the most efficient process to gather these web server
logs into your Hadoop cluster for analysis?
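One widely used ingestion pattern for this scenario is Apache Flume. The fragment below is a minimal agent sketch; the agent, source, channel, sink, and path names are illustrative assumptions, not a production configuration:

```
# Flume agent sketch (all names are illustrative assumptions)
agent1.sources = weblogs
agent1.channels = mem
agent1.sinks = hdfs-sink

# Tail the web server access log as an event source
agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogs.channels = mem

# Buffer events in memory between source and sink
agent1.channels.mem.type = memory

# Write events into HDFS for later analysis
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /user/flume/weblogs
agent1.sinks.hdfs-sink.channel = mem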