How does the NameNode detect that a DataNode has failed?
The NameNode does not need to know that a DataNode has failed.
When the NameNode fails to receive periodic heartbeats from the DataNode, it considers the
DataNode as failed.
The NameNode periodically pings the DataNode. If the DataNode does not respond, the
NameNode considers the DataNode as failed.
When HDFS starts up, the NameNode tries to communicate with the DataNode and considers
the DataNode as failed if it does not respond.
The NameNode periodically receives a Heartbeat and a Blockreport from each of the
DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly.
A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not
received a Heartbeat from a DataNode within a certain amount of time, it marks that DataNode
as dead. Because the blocks that were stored on the dead DataNode are now under-replicated,
the system begins re-replicating them. The NameNode orchestrates the replication of data
blocks from one DataNode to another. The replication data transfer happens directly between
DataNodes, and the data never passes through the NameNode.
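The detection logic described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop's actual implementation: the class `HeartbeatMonitor`, its method names, the 30-second timeout, and the node and block identifiers are all hypothetical, chosen to show the idea of "no Heartbeat within the timeout means dead, then copy under-replicated blocks between live DataNodes".

```python
from collections import defaultdict


class HeartbeatMonitor:
    """Toy model of NameNode-side liveness tracking (hypothetical, not Hadoop code)."""

    def __init__(self, timeout_seconds, replication_factor):
        self.timeout_seconds = timeout_seconds
        self.replication_factor = replication_factor
        self.last_heartbeat = {}                 # node id -> time of last Heartbeat
        self.block_locations = defaultdict(set)  # block id -> nodes holding a replica

    def record_heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def record_block_report(self, node, blocks):
        # A Blockreport lists every block on the node, so replace the old view.
        for holders in self.block_locations.values():
            holders.discard(node)
        for block in blocks:
            self.block_locations[block].add(node)

    def dead_nodes(self, now):
        # A node is dead if its last Heartbeat is older than the timeout.
        return {
            node
            for node, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_seconds
        }

    def plan_re_replication(self, now):
        # Return (block, source, target) copies between live nodes only;
        # the monitor schedules the copies but never carries the data itself.
        dead = self.dead_nodes(now)
        live = set(self.last_heartbeat) - dead
        plan = []
        for block in sorted(self.block_locations):
            live_holders = self.block_locations[block] & live
            missing = self.replication_factor - len(live_holders)
            if missing <= 0 or not live_holders:
                continue  # fully replicated, or no live replica left to copy from
            source = min(live_holders)
            for target in sorted(live - live_holders)[:missing]:
                plan.append((block, source, target))
        return plan


# Usage: three nodes report at t=0; dn2 then goes silent.
monitor = HeartbeatMonitor(timeout_seconds=30, replication_factor=2)
reports = {"dn1": ["b1", "b2"], "dn2": ["b1"], "dn3": ["b2"]}
for node, blocks in reports.items():
    monitor.record_heartbeat(node, now=0)
    monitor.record_block_report(node, blocks)

monitor.record_heartbeat("dn1", now=40)
monitor.record_heartbeat("dn3", now=40)

dead = monitor.dead_nodes(now=40)
plan = monitor.plan_re_replication(now=40)
print(dead)
print(plan)
```

In this run dn2 is the only node whose last Heartbeat is older than the timeout, so block b1 drops to one live replica and the plan copies it from dn1 to dn3. Block b2 still has two live replicas and is left alone.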
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How
NameNode Handles data node failures?