You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the
network fabric. Which workloads benefit the most from faster network fabric?

A. When your workload generates a large amount of output data, significantly larger than the
amount of intermediate data
B. When your workload consumes a large amount of input data, relative to the entire capacity of
HDFS
C. When your workload consists of processor-intensive tasks
D. When your workload generates a large amount of intermediate data, on the order of the input
data itself
Answer is “D”
A seems more correct.
The question is about the network fabric, not about I/O-bound work, which is local.
Large data output means a large data shuffle across the network to the Reducers.
All the other answers point towards local I/O.
HTH
D looks to be correct:
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
“When we encounter applications that produce large amounts of intermediate data — outputting data on the same order as the amount read in — we recommend two ports on a single Ethernet card or two channel-bonded Ethernet cards to provide 2 Gbps per machine.”
D
Cloudera recommends:
Consider 10 Gb/sec in these cases:
– Clusters storing very large amounts of data
– Clusters in which typical MapReduce jobs produce large amounts of intermediate data.
Please note that intermediate data is transferred across the network to the Reducers.
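To put rough numbers on that, here is a back-of-envelope sketch in Python. It is not from Cloudera; the function name, the 20-node cluster and the 1 TB of intermediate data are made-up values for illustration. It assumes reducers are spread evenly over the nodes, every link runs at full line rate, and it ignores disk speed, protocol overhead and any overlap between the map and shuffle phases.

```python
def shuffle_seconds(intermediate_bytes, nodes, link_gbps):
    """Rough time for one node to send its share of map output over the network."""
    # With reducers spread evenly, about (nodes - 1) / nodes of each node's
    # map output has to go to a reducer on another machine.
    remote_fraction = (nodes - 1) / nodes
    # Each node holds roughly 1/nodes of the total intermediate data.
    bytes_sent_per_node = intermediate_bytes * remote_fraction / nodes
    # Convert link speed from gigabits per second to bytes per second.
    link_bytes_per_sec = link_gbps * 1e9 / 8
    return bytes_sent_per_node / link_bytes_per_sec


TB = 10**12  # hypothetical job: 1 TB of intermediate data on 20 nodes
for gbps in (1, 10):
    seconds = shuffle_seconds(1 * TB, nodes=20, link_gbps=gbps)
    print(f"{gbps:>2} Gbps link: about {seconds:.0f} s of pure network transfer")
```

Under these assumptions the transfer time scales linearly with link speed, so the more intermediate data a job produces, the more a 10 Gbps fabric saves. A job with little intermediate data sees almost no difference.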
D
D. Intermediate data is the bottleneck when considering the network in Hadoop.
I have the same idea. A