PrepAway - Latest Free Exam Questions & Answers

Author: seenagape

Which set of mappers and reducers in the below pseudo code snippets will solve for the mean number of messages

You have a data file that contains two trillion records, one record per line (comma separated).
Each record lists two friends and a unique message sent between them. Their names will not
contain commas.
Michael, John, Pabst, Blue Ribbon
Tiffany, James, BMX Racing
John, Michael, Natural Lemon Flavor
Analyze the pseudo code examples below and determine which set of mappers and reducers
will solve for the mean number of messages each user sends to all of his or her friends.
For example, Michael may have three friends to whom he sends 6, 10, and 200 messages,
respectively, so Michael’s mean would be (6+10+200)/3. The solution may require a pipeline of
two MapReduce jobs.
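
The two-job pipeline hinted at above can be sketched in plain Python. This is only an in-memory illustration, not one of the exam's answer options: it assumes the first field is the sender and the second is the recipient (the question does not state the direction), and `run_job`, `map_pair`, `reduce_count`, `map_sender`, `reduce_mean` and `mean_messages_per_friend` are hypothetical names chosen for this example. Job 1 counts messages per (sender, recipient) pair; job 2 averages those counts per sender.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


# Job 1: count the messages for each (sender, recipient) pair.
def map_pair(line):
    # Names contain no commas, so only the first two commas separate fields;
    # everything after them is the message, which may itself contain commas.
    # Assumption: field 1 is the sender, field 2 is the recipient.
    sender, recipient, _message = [f.strip() for f in line.split(",", 2)]
    yield (sender, recipient), 1


def reduce_count(pair, ones):
    yield pair, sum(ones)


# Job 2: average the per-friend counts for each sender.
def map_sender(item):
    (sender, _recipient), count = item
    yield sender, count


def reduce_mean(sender, counts):
    yield sender, sum(counts) / len(counts)


def mean_messages_per_friend(lines):
    per_pair = run_job(lines, map_pair, reduce_count)
    return dict(run_job(per_pair, map_sender, reduce_mean))
```

Calling `mean_messages_per_friend` on the three sample records gives each sender a mean of 1.0, since every sender appears once with a single friend. A second job is needed because the mean is taken over friends, so the per-pair counts must exist before they can be averaged.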

Which data serialization system gives you the flexibility to do this?

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files you
decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?
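
To get a feel for the scale behind the grouping step, here is a rough back-of-the-envelope calculation. It does not answer the question; it only estimates how many larger files the grouping might produce. It assumes each image is exactly 25 KiB and that each container file targets 128 MiB (a commonly used HDFS block size); both figures are assumptions for illustration, and any per-record overhead of the container format is ignored.

```python
import math

NUM_IMAGES = 60_000_000
IMAGE_BYTES = 25 * 1024               # assumption: "approximately 25 KB" taken as 25 KiB
CONTAINER_BYTES = 128 * 1024 * 1024   # assumption: 128 MiB target per grouped file

# Total raw image data, ignoring any container-format overhead.
total_bytes = NUM_IMAGES * IMAGE_BYTES

# Whole images that fit in one container file, and how many such files are needed.
images_per_container = CONTAINER_BYTES // IMAGE_BYTES
containers_needed = math.ceil(NUM_IMAGES / images_per_container)

print(f"total data: {total_bytes / 1e12:.2f} TB")
print(f"images per container: {images_per_container}")
print(f"containers needed: {containers_needed}")
```

Under these assumptions, tens of millions of small files collapse into a number of large files that is several thousand times smaller.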

You need to build a system to detect if the total dollar value of sales is outside the norm for each U.S

You are building a system to perform outlier detection for a large online retailer. You need to build
a system to detect if the total dollar value of sales is outside the norm for each U.S. state, as
determined from the physical location of the buyer for each purchase.
The retailer’s data sources are scattered across multiple systems and databases and are
unorganized with little coordination or shared data or keys between the various data sources.
Below are the sources of data available to you. Determine which three will give you the smallest
set of data sources but still allow you to implement the outlier detector by state.

