How will the Fair Scheduler handle these two jobs?
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs
running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A
while later, you submit Job B. Now Job A and Job B are running on the cluster at the same
time. How will the Fair Scheduler handle these two jobs?
How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
You observe that the number of spilled records from Map tasks far exceeds the number of
map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB.
How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
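The symptom described above can be quantified with a small calculation. The sketch below is only an illustration: the function name and the counter values are hypothetical, not taken from a real job. It divides the spilled record count by the map output record count. A ratio near 1 suggests each record was written to disk about once, while a ratio well above 1 suggests records were spilled and re-merged several times.

```python
def spill_ratio(spilled_records: int, map_output_records: int) -> float:
    """Return spilled records per map output record (hypothetical helper)."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


# Hypothetical counter values, chosen only to illustrate the symptom.
ratio = spill_ratio(spilled_records=3_000_000, map_output_records=1_000_000)
print(f"spill ratio: {ratio:.1f}")
```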
What should you do?
You want a node to only swap Hadoop daemon data from RAM to disk when absolutely
necessary. What should you do?
Which is the default scheduler in YARN?
What should you do?
Each node in your Hadoop cluster, running YARN, has 64 GB memory and 24 cores. Your
yarn-site.xml has the following configuration:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>23</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?
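The arithmetic behind a per-node container limit can be sketched as follows. This is a rough model, not YARN's actual scheduling logic: it assumes memory is the only limiting resource and that every container requests exactly the scheduler's minimum allocation (the yarn.scheduler.minimum-allocation-mb property). The function name is hypothetical.

```python
def max_containers_per_node(node_memory_mb: int, min_allocation_mb: int) -> int:
    """Upper bound on concurrent containers per node, considering memory only.

    Assumes each container is granted exactly min_allocation_mb.
    """
    if min_allocation_mb <= 0:
        raise ValueError("min_allocation_mb must be positive")
    return node_memory_mb // min_allocation_mb


# 32768 MB is the NodeManager memory from the configuration above.
for min_alloc in (1024, 2048, 4096):
    bound = max_containers_per_node(32768, min_alloc)
    print(f"minimum allocation {min_alloc} MB -> at most {bound} containers")
```

Under these assumptions, the smaller the minimum allocation, the more containers can fit on a node, so the limit is controlled by the memory settings rather than by the 23 configured vcores.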
What results?
Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts
property in the NameNode’s configuration file. What results?
How many Mappers will run?
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a
directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How
many Mappers will run?
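A back-of-the-envelope estimate for this kind of question is sketched below. It assumes the input format creates one split per HDFS block, that plain text files are splittable, and that the split size equals the block size; a job that changes the split size or combines small files would behave differently. The function name is hypothetical.

```python
from typing import Sequence


def estimate_map_tasks(blocks_per_file: Sequence[int]) -> int:
    """Estimate map tasks as one per HDFS block, summed over all input files."""
    if any(blocks < 0 for blocks in blocks_per_file):
        raise ValueError("block counts must be non-negative")
    return sum(blocks_per_file)


# 10 files, each made up of 3 HDFS blocks, as in the question.
print(estimate_map_tasks([3] * 10))
```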
Which ecosystem project should you use to perform these actions?
You are working on a project where you need to chain together MapReduce and Pig jobs. You
also need the ability to use forks, decision points, and path joins. Which ecosystem project should
you use to perform these actions?
Identify two features/issues that YARN is designed to address:
What processes must you do if you are running a Hadoop cluster with a single NameNode and six DataNodes…
You are running a Hadoop cluster with a single NameNode and six DataNodes, and you want
to change a configuration parameter so that it affects all six DataNodes. What steps must
you take?