You have a data file that contains two trillion records, one record per line (comma separated).
Each record lists two friends and a unique message sent between them. The names do not contain
commas, although the message may.
Michael, John, Pabst, Blue Ribbon
Tiffany, James, BMX Racing
John, Michael, Natural Lemon Flavor
Analyze the pseudocode examples below and determine which set of mappers and reducers
computes the mean number of messages that each user sends per friend.
For example, Michael may have three friends to whom he sends 6, 10, and 200 messages,
respectively, so Michael's mean would be (6+10+200)/3 = 72. The solution may require a pipeline of
two MapReduce jobs.
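As a sanity check on the target metric, the following minimal Python sketch computes the same mean in memory. It assumes the first name in each record is the sender; the helper name `mean_messages_per_friend` and the friend names Ann, Bob, and Cat are hypothetical and exist only to reproduce the 6, 10, and 200 counts from the example.

```python
from collections import defaultdict
from statistics import mean


def mean_messages_per_friend(records):
    """Return {sender: mean number of messages per friend}.

    `records` is an iterable of (sender, receiver) pairs, one per message.
    Assumption: the first name in a record is the sender.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sender, receiver in records:
        counts[sender][receiver] += 1
    return {
        sender: mean(list(per_friend.values()))
        for sender, per_friend in counts.items()
    }


# Hypothetical data: Michael sends 6, 10, and 200 messages to three friends.
records = (
    [("Michael", "Ann")] * 6
    + [("Michael", "Bob")] * 10
    + [("Michael", "Cat")] * 200
)
result = mean_messages_per_friend(records)
print(result["Michael"])  # (6 + 10 + 200) / 3
```

This only works when all counts fit in memory on one machine; at two trillion records the same grouping has to be expressed as map and reduce steps.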
A.
def mapper1(line):
    # split on the first two commas only; the message itself may contain commas
    key1, key2, message = line.split(',', 2)
    emit((key1, key2), 1)
def reducer1(key, values):
    emit(key, sum(values))
def mapper2(key, value):
    key1, key2 = key  # unpack both friends' names into separate keys
    emit(key1, value)
def reducer2(key, values):
    emit(key, mean(values))
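To trace option A by hand, the sketch below runs its two jobs in a single Python process on the three sample records. The `run_job` helper is a hypothetical stand-in for a MapReduce framework (it only groups mapper output by key), generators take the place of `emit`, and stripping whitespace around the names is an added assumption.

```python
from collections import defaultdict
from statistics import mean


def run_job(mapper, reducer, inputs):
    """Hypothetical single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)  # shuffle: group values by key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def mapper1(line):
    # Split on the first two commas only; the message may contain commas.
    key1, key2, _message = [part.strip() for part in line.split(",", 2)]
    yield (key1, key2), 1


def reducer1(key, values):
    yield key, sum(values)


def mapper2(pair):
    (key1, _key2), count = pair  # keep only the first name of the pair
    yield key1, count


def reducer2(key, values):
    yield key, mean(values)


lines = [
    "Michael, John, Pabst, Blue Ribbon",
    "Tiffany, James, BMX Racing",
    "John, Michael, Natural Lemon Flavor",
]
job1 = run_job(mapper1, reducer1, lines)
job2 = dict(run_job(mapper2, reducer2, job1))
print(job2)
```

On the sample data each ordered pair occurs once, so every first-named user ends up with a mean of 1.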
B.
def mapper1(line):
    # split on the first two commas only; the message itself may contain commas
    key1, key2, message = line.split(',', 2)
    emit((key1, key2), 1)
    emit((key1, key2), 1)
def reducer1(key, values):
    emit(key, sum(values))
def mapper2(key, value):
    key1, key2 = key  # unpack both friends' names into separate keys
    emit(key1, value)
def reducer2(key, values):
    emit(key, mean(values))
C.
def mapper1(line):
    # split on the first two commas only; the message itself may contain commas
    key1, key2, message = line.split(',', 2)
    emit((key1, key2), 1)
    emit((key1, key2), 1)
def reducer1(key, values):
    emit(key, sum(values))
D.
def mapper1(line):
    # split on the first two commas only; the message itself may contain commas
    key1, key2, message = line.split(',', 2)
    key1, key2 = sorted((key1, key2))  # a given pair will always be sorted the same way
    emit((key1, key2), 1)
def reducer1(key, values):
    emit(key, sum(values))
def mapper2(key, value):
    key1, key2 = key  # unpack both friends' names into separate keys
    emit(key1, value)
    emit(key2, value)
def reducer2(key, values):
    emit(key, mean(values))
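For comparison, option D can be traced the same way. The sketch below is a single-process approximation, not a real MapReduce deployment: `run_job` is a hypothetical helper that groups mapper output by key, generators take the place of `emit`, and stripping whitespace around the names is an added assumption.

```python
from collections import defaultdict
from statistics import mean


def run_job(mapper, reducer, inputs):
    """Hypothetical single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)  # shuffle: group values by key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def mapper1(line):
    key1, key2, _message = [part.strip() for part in line.split(",", 2)]
    key1, key2 = sorted((key1, key2))  # same order for a given pair
    yield (key1, key2), 1


def reducer1(key, values):
    yield key, sum(values)


def mapper2(pair):
    (key1, key2), count = pair
    yield key1, count  # credit the pair's count to both users
    yield key2, count


def reducer2(key, values):
    yield key, mean(values)


lines = [
    "Michael, John, Pabst, Blue Ribbon",
    "Tiffany, James, BMX Racing",
    "John, Michael, Natural Lemon Flavor",
]
job1 = run_job(mapper1, reducer1, lines)
job2 = dict(run_job(mapper2, reducer2, job1))
print(job2)
```

Because the pair is sorted, the Michael-to-John and John-to-Michael records fall under one key, so the count of 2 is credited to both users and the direction of each message is no longer tracked.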
Answer: A