What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
Your cluster's mapred-site.xml includes the following parameters:
<name>mapreduce.map.memory.mb</name> <value>4096</value>
<name>mapreduce.reduce.memory.mb</name> <value>8192</value>
And your cluster's yarn-site.xml includes the following parameters:
<name>yarn.nodemanager.vmem-pmem-ratio</name> <value>2.1</value>
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
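A hedged worked calculation of the limit implied by the settings above (the arithmetic only, not an answer key):

```python
# Worked arithmetic, assuming mapreduce.map.memory.mb caps a map
# container's physical memory and yarn.nodemanager.vmem-pmem-ratio
# multiplies that cap to give the virtual-memory ceiling.
map_memory_mb = 4096     # mapreduce.map.memory.mb from mapred-site.xml
vmem_pmem_ratio = 2.1    # yarn.nodemanager.vmem-pmem-ratio from yarn-site.xml

vmem_limit_mb = map_memory_mb * vmem_pmem_ratio
print(vmem_limit_mb)     # about 8601.6 MB, i.e. roughly 8.4 GB per map container
```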
Table schemas in Hive are:
What is the maximum number of NameNode daemons you should run on your cluster in order to avoid a “split-brain” scenario?
Assuming you’re not running HDFS Federation, what is the maximum number of
NameNode daemons you should run on your cluster in order to avoid a “split-brain” scenario
with your NameNode when running HDFS High Availability (HA) using Quorum-based
storage?
Where are Hadoop task log files stored?
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task
log files stored?
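For reference, the NodeManager's local task-log location is controlled by yarn.nodemanager.log-dirs; the default below is the one documented in yarn-default.xml and may differ in your distribution:

```
<!-- yarn-site.xml (default shown; verify against your distribution's yarn-default.xml) -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>${yarn.log.dir}/userlogs</value>
</property>
```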
How will the Fair Scheduler handle these two jobs?
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs
running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A
while later, you submit Job B. Now Job A and Job B are running on the cluster at the same
time. How will the Fair Scheduler handle these two jobs?
What should you do?
Each node in your Hadoop cluster, running YARN, has 64GB memory and 24 cores. Your
yarn-site.xml has the following configuration:
<property> <name>yarn.nodemanager.resource.memory-mb</name> <value>32768</value> </property>
<property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>12</value> </property>
You want YARN to launch no more than 16 containers per node. What should you do?
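One hedged way to reason about the container cap: with 32768 MB available per node, the per-container minimum allocation (yarn.scheduler.minimum-allocation-mb is the knob assumed here) bounds how many containers fit by memory:

```python
# Hedged sketch: bounding containers per node by memory.
# 32768 MB / 2048 MB per container = 16 containers, so raising the
# scheduler's minimum allocation to 2048 MB is one candidate setting.
node_memory_mb = 32768      # yarn.nodemanager.resource.memory-mb
min_allocation_mb = 2048    # candidate yarn.scheduler.minimum-allocation-mb

max_containers_by_memory = node_memory_mb // min_allocation_mb
print(max_containers_by_memory)  # 16
```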
What should you do?
You want your nodes to swap Hadoop daemon data from RAM to disk only when absolutely
necessary. What should you do?
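For context, the knob usually discussed for this behavior is the kernel's vm.swappiness setting (a sketch, not an answer key; the exact recommended value varies by kernel and Hadoop version):

```
# /etc/sysctl.conf -- lower swappiness so the kernel swaps only under
# real memory pressure (0 on older kernels, 1 on newer ones)
vm.swappiness = 1
```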
Which two daemons need to be installed on your cluster’s master nodes?
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which
two daemons need to be installed on your cluster’s master nodes?
How would you tune your io.sort.mb value to achieve a maximum memory-to-disk I/O ratio?
You observed that the number of spilled records from Map tasks far exceeds the number of
map output records. Your child heap size is 1GB and your io.sort.mb value is set to
1000MB. How would you tune your io.sort.mb value to achieve a maximum memory-to-disk
I/O ratio?
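The arithmetic behind the symptom, as a hedged sketch: the sort buffer is allocated inside the map task's child heap, so a 1000 MB buffer in a 1 GB heap leaves almost nothing for the map code itself:

```python
# Hedged arithmetic: heap left over for the map task's own objects
# when the in-memory sort buffer nearly fills the child heap.
child_heap_mb = 1024   # 1 GB child heap (-Xmx1024m)
io_sort_mb = 1000      # current io.sort.mb setting

remaining_mb = child_heap_mb - io_sort_mb
print(remaining_mb)    # 24 -- only 24 MB left for the map task itself
```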
Which best describes how you determine when the last checkpoint happened?
You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary
NameNode on host mysecondarynamenode, and several DataNodes. Which best
describes how you determine when the last checkpoint happened?
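One hedged way to check (the port and paths below are assumptions typical of classic Hadoop deployments; verify against your configuration):

```
# The Secondary NameNode web UI reports the last checkpoint time:
#   http://mysecondarynamenode:50090/
# Alternatively, inspect the modification time of the fsimage in the
# checkpoint directory (fs.checkpoint.dir):
#   ls -l <fs.checkpoint.dir>/current/fsimage*
```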