Which three basic configuration parameters must you set to migrate your cluster from MapReduce
1 (MRv1) to MapReduce 2 (MRv2)?
Which two are features of Hadoop’s rack topology?
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you
decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python
using Hadoop Streaming.
Which data serialization system gives the flexibility to do this?
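For scale, the figures in this question can be worked through with a few lines of arithmetic. This is a rough sketch only: it takes "approximately 25 KB" as exactly 25 KiB, and the 128 MiB target size per grouped file is a hypothetical value chosen for illustration, not something the question states.

```python
NUM_IMAGES = 60_000_000
IMAGE_SIZE_BYTES = 25 * 1024            # assumption: "approximately 25 KB" taken as 25 KiB
TARGET_FILE_BYTES = 128 * 1024 * 1024   # hypothetical size of each grouped file

# Total volume of image data across all files.
total_bytes = NUM_IMAGES * IMAGE_SIZE_BYTES

# How many whole images fit in one grouped file of the target size.
images_per_file = TARGET_FILE_BYTES // IMAGE_SIZE_BYTES

# Number of grouped files needed (integer ceiling division).
grouped_files = -(-NUM_IMAGES // images_per_file)

print(f"total data:      {total_bytes / 1e12:.3f} TB")
print(f"images per file: {images_per_file}")
print(f"grouped files:   {grouped_files}")
```

Under these assumptions, grouping reduces tens of millions of tiny files to a count in the low tens of thousands, which is the motivation for step 1.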
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler,
tracking their status and monitoring for progress?
You are configuring a Linux cluster running HDFS and MapReduce version 2 (MRv2) on YARN.
How must you format the underlying filesystem of each DataNode?
Identify two features/issues that YARN is designed to address:
Your cluster has the following characteristics:
A rack aware topology is configured and on
Replication is set to 3
Cluster block size is set to 64 MB
Which describes the file read process when a client application connects into the cluster and
requests a 50MB file?
Which YARN daemon or service monitors a Container’s per-application resource usage (e.g.,
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB
file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has
finished writing 200 MB of this file, what would another user see when they look in the directory?
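The block arithmetic behind this scenario can be sketched as follows. The helper name `block_layout` is hypothetical, and the sketch only splits a byte count into full blocks plus a remainder; it says nothing about what a directory listing shows while the write is in progress.

```python
MB = 1024 * 1024

def block_layout(file_bytes: int, block_bytes: int) -> tuple[int, int]:
    """Return (number of full blocks, bytes in the trailing partial block)."""
    return divmod(file_bytes, block_bytes)

# The complete 300 MB file with a 64 MB block size.
full_blocks, tail_bytes = block_layout(300 * MB, 64 * MB)
total_blocks = full_blocks + (1 if tail_bytes else 0)

# The state after 200 MB has been written.
written_full, written_tail = block_layout(200 * MB, 64 * MB)

print(f"complete file: {total_blocks} blocks, last block {tail_bytes // MB} MB")
print(f"after 200 MB:  {written_full} full blocks, {written_tail // MB} MB into the next")
```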
Which is the default scheduler in YARN?