You want a node to swap Hadoop daemon data from RAM to disk only when absolutely
necessary. What should you do?
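For context, the kernel's swapping eagerness is controlled by the `vm.swappiness` sysctl. A sketch of the relevant `/etc/sysctl.conf` entry (the value 1 is an illustrative choice; on recent kernels a value of 0 may disable swapping entirely rather than merely discouraging it):

```shell
# /etc/sysctl.conf — swap only under real memory pressure.
# 1 keeps swapping to a minimum without disabling it outright.
vm.swappiness=1
```

Apply the change without a reboot via `sysctl -p`, or immediately with `sysctl vm.swappiness=1`.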
Which is the default scheduler in YARN?
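As background: in stock Apache Hadoop 2.x the default is the CapacityScheduler (some distributions, such as CDH, instead ship with the Fair Scheduler enabled). The choice is governed by a single yarn-site.xml property; a sketch of making the default explicit:

```xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```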
During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the
intermediate data of each Map Task?
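Useful background for this question: under MRv2, a map task spills its intermediate output to the local filesystem of the node running the task, not to HDFS. The directories used come from the NodeManager's local-dirs setting; a sketch with hypothetical paths:

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- Example paths only; spread across physical disks in practice. -->
  <value>/data/1/yarn/local,/data/2/yarn/local</value>
</property>
```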
Each node in your Hadoop cluster, running YARN, has 64 GB of memory and 24 cores. Your yarn-site.xml has the following configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>32768</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>23</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?
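The arithmetic behind this question: YARN hands out container memory in multiples of the scheduler's minimum allocation, so raising `yarn.scheduler.minimum-allocation-mb` caps how many containers fit in the NodeManager's 32768 MB. A quick sketch of the target value:

```python
# Memory the NodeManager advertises (from yarn-site.xml above).
node_memory_mb = 32768
target_containers = 16

# Smallest container YARN will grant; containers per node can't exceed
# node_memory_mb / minimum_allocation_mb.
min_allocation_mb = node_memory_mb // target_containers
print(min_allocation_mb)  # 2048
```

Setting `yarn.scheduler.minimum-allocation-mb` to 2048 therefore bounds the node at 16 containers.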
You suspect that your NameNode is incorrectly configured, and is swapping memory to disk.
Which Linux commands help you to identify whether swapping is occurring?
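For reference, the standard diagnostics (a sketch; `sar` requires the sysstat package to be installed):

```shell
# si/so columns report pages swapped in/out per second.
vmstat 1 5

# Total, used, and free swap space in megabytes.
free -m

# Kernel swap counters since boot (Swap* lines in /proc/meminfo too).
grep -E 'pswp(in|out)' /proc/vmstat

# Historical swapping statistics, if sysstat is installed.
sar -W
```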
Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts
property in the NameNode’s configuration file. What results?
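Context for the question: `dfs.hosts` points at an include file listing the hosts permitted to register as DataNodes; when it is unset, any host can connect. A sketch of the property in hdfs-site.xml (the file path is hypothetical):

```xml
<property>
  <name>dfs.hosts</name>
  <!-- Example path; one allowed hostname per line in the file. -->
  <value>/etc/hadoop/conf/dfs.hosts.allow</value>
</property>
```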
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB
file into a previously empty directory using an HDFS block size of 64 MB. Just after this command
has finished writing 200 MB of the file, what would another user see when they look in the directory?
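The block arithmetic behind this question (a sketch; note that while `-put` is in progress, the data is being written to a temporary file whose name carries a `._COPYING_` suffix until the copy completes):

```python
import math

file_mb, block_mb = 300, 64

# Total blocks the finished file will occupy:
# four full 64 MB blocks plus one final 44 MB block.
blocks = math.ceil(file_mb / block_mb)
print(blocks)  # 5
```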
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10
plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will
run?
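The reasoning: with the default TextInputFormat, each HDFS block of each input file becomes one input split, and each split gets one map task.

```python
files, blocks_per_file = 10, 3

# One input split (and hence one map task) per HDFS block by default.
map_tasks = files * blocks_per_file
print(map_tasks)  # 30
```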
Which command does Hadoop offer to discover missing or corrupt HDFS data?
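For reference, the command in question is `hdfs fsck` (these invocations require a running cluster):

```shell
# List files with missing or corrupt blocks.
hdfs fsck / -list-corruptfileblocks

# Detailed per-file report: block IDs and their DataNode locations.
hdfs fsck /some/path -files -blocks -locations
```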
You are working on a project where you need to chain together MapReduce and Pig jobs. You also
need the ability to use forks, decisions, and path joins. Which ecosystem project should you use to
perform these actions?
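The project being described is Apache Oozie, whose workflow definitions support exactly these control-flow nodes. A minimal sketch of a workflow.xml (names and the decision predicate are illustrative; action bodies elided):

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="fork-node"/>

  <!-- Run the MapReduce and Pig actions in parallel. -->
  <fork name="fork-node">
    <path start="mr-action"/>
    <path start="pig-action"/>
  </fork>

  <action name="mr-action">
    <map-reduce><!-- job configuration elided --></map-reduce>
    <ok to="join-node"/>
    <error to="fail"/>
  </action>

  <action name="pig-action">
    <pig><!-- script and parameters elided --></pig>
    <ok to="join-node"/>
    <error to="fail"/>
  </action>

  <!-- Wait for both forked paths before continuing. -->
  <join name="join-node" to="decision-node"/>

  <decision name="decision-node">
    <switch>
      <!-- Illustrative predicate only. -->
      <case to="end">${wf:lastErrorNode() eq null}</case>
      <default to="fail"/>
    </switch>
  </decision>

  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
```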