PrepAway - Latest Free Exam Questions & Answers

Why should you run the HDFS balancer periodically?

Choose three reasons why you should run the HDFS balancer periodically.

A.
To improve data locality for MapReduce tasks.

B.
To ensure that there is consistent disk utilization across the DataNodes.

C.
To ensure that there is capacity in HDFS for additional data.

D.
To ensure that all blocks in the cluster are 128MB in size.

E.
To help HDFS deliver consistent performance under heavy loads.

Explanation:
The balancer is a tool that balances disk space usage on an HDFS cluster when
some datanodes become full or when new empty nodes join the cluster. The tool is deployed as
an application program that the cluster administrator can run on a live HDFS cluster while
applications are adding and deleting files.
DESCRIPTION
The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The
threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each
datanode, the utilization of the node (ratio of used space at the node to total capacity of the node)
differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the
cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster
will become. It takes more time to run the balancer for small threshold values. Also, for a very small
threshold the cluster may not be able to reach the balanced state when applications write and
delete files concurrently.
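The balance criterion above can be sketched in a few lines of Python. This is only an illustration of the definition, not the balancer's actual code: the function name `is_balanced` and the `(used_bytes, capacity_bytes)` tuple layout are assumptions made for the example.

```python
def is_balanced(nodes, threshold_pct=10.0):
    """Return True if every datanode is within threshold_pct of cluster utilization.

    nodes: list of (used_bytes, capacity_bytes) tuples, one per datanode
           (a hypothetical layout chosen for this sketch).
    threshold_pct: threshold in percentage points; 10 is the documented default.
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    # Cluster utilization: used space in the cluster over total cluster capacity.
    cluster_util = 100.0 * total_used / total_capacity
    # Each node's utilization must differ from the cluster's by at most the threshold.
    return all(
        abs(100.0 * used / capacity - cluster_util) <= threshold_pct
        for used, capacity in nodes
    )


# Three nodes close to the 50% cluster average: balanced at the default threshold.
print(is_balanced([(50, 100), (55, 100), (45, 100)]))
# One nearly full node and one nearly empty node: not balanced.
print(is_balanced([(90, 100), (10, 100)]))
```

Lowering `threshold_pct` makes the check stricter, which matches the note that small thresholds take longer to satisfy.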
The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In
each iteration a datanode moves or receives no more than the lesser of 10 GB or the
threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each
iteration, the balancer obtains updated datanode information from the namenode.
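As a rough illustration of the per-iteration limit described above, the cap for a single datanode could be computed as follows. The function name `max_bytes_per_iteration` is hypothetical, and reading "10G" as 10 × 2^30 bytes is an assumption of this sketch.

```python
GIB = 1024 ** 3  # assumption: "10G" is read as binary gigabytes


def max_bytes_per_iteration(capacity_bytes, threshold_pct=10.0):
    """Upper bound on bytes a datanode may move or receive in one iteration.

    The bound is the lesser of 10 GB and the threshold fraction of the
    node's capacity, as stated in the description.
    """
    threshold_share = int(capacity_bytes * threshold_pct / 100)
    return min(10 * GIB, threshold_share)


# A 1 TiB node: 10% of capacity is far above 10 GB, so the 10 GB cap applies.
print(max_bytes_per_iteration(1024 * GIB) == 10 * GIB)
# A 50 GiB node: 10% of capacity is 5 GiB, which is the smaller bound.
print(max_bytes_per_iteration(50 * GIB) == 5 * GIB)
```

For typical multi-terabyte datanodes the fixed 10 GB bound is the one that applies, which is part of why rebalancing a large cluster takes many iterations.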
org.apache.hadoop.hdfs.server.balancer, Class Balancer
