You observe that the number of spilled records from Map tasks far exceeds the number of
map output records. Your child heap size is 1GB and your io.sort.mb value is set to 100 MB.
How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
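A rough sketch of why the spill count blows up, using illustrative numbers that are assumptions, not part of the question (the 300 MB map-output figure is invented; 0.80 is the common io.sort.spill.percent default):

```python
import math

# Assumed illustrative sizes, not given by the question:
map_output_mb = 300       # total sorted output of one map task
io_sort_mb = 100          # current io.sort.mb setting
spill_percent = 0.80      # common io.sort.spill.percent default

# The buffer spills to disk each time it fills to the threshold.
spill_threshold_mb = io_sort_mb * spill_percent
spills = math.ceil(map_output_mb / spill_threshold_mb)  # 4 spill files

# Raising io.sort.mb (the 1 GB child heap leaves headroom) shrinks the
# spill count toward the ideal of a single spill per map task.
bigger_sort_mb = 400
fewer_spills = math.ceil(map_output_mb / (bigger_sort_mb * spill_percent))  # 1 spill
```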
You want a node to only swap Hadoop daemon data from RAM to disk when absolutely
necessary. What should you do?
Which is the default scheduler in YARN?
Each node in your Hadoop cluster, running YARN, has 64 GB memory and 24 cores. Your
yarn-site.xml has the following configuration:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>23</value>
</property>

You want YARN to launch no more than 16 containers per node. What should you do?
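One way to reason about the question above, sketched as arithmetic. The approach shown, raising yarn.scheduler.minimum-allocation-mb so that memory rather than vcores becomes the limiting resource, is an assumption about the intended answer, not the only possible configuration:

```python
# Per-node resources from the yarn-site.xml fragment above.
node_memory_mb = 32768
node_vcores = 23
max_containers = 16

# If every container must be at least this large, memory caps the count:
min_allocation_mb = node_memory_mb // max_containers  # 2048 MB

# With that minimum allocation, at most 16 containers fit in memory,
# even though 23 vcores would otherwise allow more.
containers_by_memory = node_memory_mb // min_allocation_mb
```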
Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts
property in the NameNode’s configuration file. What results?
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a
directory of 10 plain text files as its input. Each file is made up of 3 HDFS blocks. How
many Mappers will run?
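The arithmetic behind the question above, assuming splittable plain-text input and default FileInputFormat behaviour (one input split per HDFS block, one map task per split):

```python
# Plain text is splittable, so each block of each file becomes one
# input split, and each split is assigned one map task.
files = 10
blocks_per_file = 3
mappers = files * blocks_per_file  # 30 map tasks
```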
You are working on a project where you need to chain together MapReduce and Pig jobs. You
also need the ability to use forks, decisions, and path joins. Which ecosystem project should
you use to perform these actions?
What steps must you take if you are running a Hadoop cluster with a single NameNode
and six DataNodes, and you want to change a configuration parameter so that it affects all
six DataNodes?
Identify two features/issues that YARN is designed to address:
A slave node in your cluster has four 2TB hard drives installed (4 x 2TB). The DataNode is
configured to store HDFS blocks on the disks. You set the value of the
dfs.datanode.du.reserved parameter to 100GB. How does this alter HDFS block storage?
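A quick sketch of the arithmetic for the question above. It relies on dfs.datanode.du.reserved being applied per volume (per configured disk), which is how the parameter is defined:

```python
# Node layout from the question: four 2 TB disks, 100 GB reserved each.
disks = 4
disk_gb = 2000
reserved_gb_per_disk = 100

# The reservation applies to each volume, not once per node.
reserved_gb_total = disks * reserved_gb_per_disk          # 400 GB withheld
usable_gb = disks * disk_gb - reserved_gb_total           # 7600 GB for HDFS blocks
```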