What should you do?

You have a 20-node Hadoop cluster with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?

A. Add another master node to increase the number of nodes running the JournalNode, which increases the number of machines available to HA to create a quorum.

B. Set an HDFS replication factor that provides data redundancy, protecting against node failure.

C. Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure.

D. Run the ResourceManager on a different master from the NameNode in order to load-share HDFS metadata processing.

E. Configure the cluster’s disk drives with an appropriate fault-tolerant RAID level.

13 Comments on “What should you do?”

  1. b says:

    I don’t think D adds fault tolerance. It just reduces the load on a master node, and that is not really necessary on such a small cluster.

    Having more than two JournalNodes, however, adds more fault tolerance for the NameNode metadata, which is why A should be correct.
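
    For reference, the JournalNode quorum that option A refers to is pointed to from hdfs-site.xml. A minimal sketch, assuming placeholder hostnames and the default JournalNode port 8485, might look like:

      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <!-- hostnames are placeholders; list an odd number of JournalNodes -->
        <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
      </property>

    With N JournalNodes, the quorum tolerates the failure of (N - 1) / 2 of them, which is why an odd count of three or more is recommended.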

    1. chris gang says:

      I agree that A is the correct answer: “Add another master node to increase the number of nodes running the JournalNode which increases the number of machines available to HA to create a quorum.” However, it shouldn’t say “another master node”. What is a master node? We only have NameNodes and DataNodes. If the option said “JournalNode” instead of “master node”, it would be a perfect answer.

  2. Chris says:

    I think B is the only answer that makes any sense. Read the question carefully: you can have HA without setting a proper data replication factor, and data replication is directly related to potential data loss. Having a ResourceManager only relates to YARN functionality and is required in any YARN cluster anyway.
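
    For reference, the replication factor this comment refers to is set by dfs.replication in hdfs-site.xml. A minimal sketch (3 is the stock Hadoop default, shown here only for illustration):

      <property>
        <name>dfs.replication</name>
        <!-- illustrative value; 3 is the Hadoop default -->
        <value>3</value>
      </property>

    Replication of files that already exist can be changed afterwards with, e.g., hdfs dfs -setrep -w 3 /path/to/file.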

  3. chris gang says:

    I think E is the best answer. Even with NameNode HA configured, you still have to worry about the risk of losing data; the only way to protect against that is to use RAID.

    1. Emmanuel says:

      From the Hadoop documentation: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

      “Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.”

  4. Pramod Kumar says:

    As per the Hadoop documentation, a maximum of two NameNodes can be configured:
    https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

    dfs.ha.namenodes.[nameservice ID] – unique identifiers for each NameNode in the nameservice

    Configure with a list of comma-separated NameNode IDs. This will be used by DataNodes to determine all the NameNodes in the cluster. For example, if you used “mycluster” as the nameservice ID previously, and you wanted to use “nn1” and “nn2” as the individual IDs of the NameNodes, you would configure this as such:

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>

    Note: Currently, only a maximum of two NameNodes may be configured per nameservice.

    Hence “A” is not a valid option.

