Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
You want to understand more about how users browse you public website. For example, you want
to know which pages they visit prior to placing an order. You have a server farm of 200 web
servers hosting your website. Which is the most efficient process to gather these web server logs
into your Hadoop cluster for analysis?
What occurs when you execute the command: hdfs haadmin –failover nn01 nn02
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and
nn02. What occurs when you execute the command: hdfs haadmin –failover nn01 nn02
Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functio
Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can
you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still
have a function cluster?
Which YARN process runs as “controller O” of a submitted job and is responsible for resource requests?
Which YARN process runs as “controller O” of a submitted job and is responsible for resource
requests?
How will the Fair Scheduler handle these two jobs?
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running
on the cluster, and you submit a job A, so that only job A is running on the cluster. A while later,
you submit Job B. now job A and Job B are running on the cluster at the same time. How will the
Fair Scheduler handle these two jobs?
How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
You observe that the number of spilled records from Map tasks far exceeds the number of map
output records. Your child heap size is 1GB and your io.sort.mb value is set to 100 MB. How
would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
What should you do?
You want a node to only swap Hadoop daemon data from RAM to disk when absolutely
necessary. What should you do?
Which is the default scheduler in YARN?
Which is the default scheduler in YARN?
What should you do?
Each node in your Hadoop cluster, running YARN, has 64 GB memory and 24 cores. Your yarnsite.xml has the following configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>32768</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>23</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?
What results?
Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts
property in the NameNode’s configuration file. What results?