A company has 20 software engineers working on a project. Over the past week, the team
has fixed 100 bugs. Although the average number of bugs fixed per engineer is five, none of
the engineers fixed exactly five bugs last week.
One engineer points out that some bugs are more difficult to fix than others. What metric should
you use to estimate how hard a particular bug is to fix?
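The setup hints at a skewed distribution, where the mean and the median diverge. A minimal sketch with made-up per-engineer counts (not given in the question) shows how a few large values pull the mean away from the typical engineer:

```python
# Hypothetical bug-fix counts for the 20 engineers (illustrative data only).
from statistics import mean, median

fixes = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 6, 10, 12, 16, 18]

# 100 bugs / 20 engineers gives a mean of 5, yet nobody fixed exactly 5;
# the median reports the "typical" engineer's count instead.
print(mean(fixes))    # 5
print(median(fixes))  # 3.0
```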
In what way can Hadoop be used to improve the performance of Lloyd's algorithm for k-means
clustering on large data sets?
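For intuition: each iteration of Lloyd's algorithm parallelizes naturally over the data, which is what makes it a MapReduce fit. A single-machine sketch of one iteration in plain Python (not the Hadoop API; a driver would re-run the job until the centroids converge):

```python
# One Lloyd's iteration expressed as map / shuffle / reduce (illustrative only).
from collections import defaultdict

def mapper(point, centroids):
    # Map: emit (nearest centroid id, point).
    nearest = min(range(len(centroids)),
                  key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))
    return nearest, point

def reducer(cluster_id, points):
    # Reduce: average the assigned points to get the new centroid.
    n = len(points)
    return cluster_id, tuple(sum(dim) / n for dim in zip(*points))

points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.8, 8.2)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

grouped = defaultdict(list)  # the shuffle phase
for key, value in (mapper(p, centroids) for p in points):
    grouped[key].append(value)
new_centroids = dict(reducer(k, v) for k, v in grouped.items())
print(new_centroids)
```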
You have a data file that contains two trillion records, one record per line (comma separated).
Each record lists two friends and a unique message sent between them. Their names will not have
commas.
Michael, John, Pabst, Blue Ribbon
Tiffany, James, BMX Racing
John, Michael, Natural Lemon Flavor
Analyze the pseudo code snippets below and determine which set of mappers and reducers
will solve for the mean number of messages each user sends to all of their friends.
For example, Michael may have three friends to whom he sends 6, 10, and 200 messages,
respectively, so Michael’s mean would be (6+10+200)/3. The solution may require a pipeline of
two MapReduce jobs.
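The two-job shape can be sketched in plain Python (not real Hadoop code; the fourth record is made up to show repeated pairs, and splitting on the first two commas works because names contain no commas while messages may):

```python
# Illustrative two-stage pipeline: job 1 counts messages per (sender, recipient)
# pair; job 2 averages those counts per sender over their distinct friends.
from collections import defaultdict

records = [
    "Michael, John, Pabst, Blue Ribbon",
    "Tiffany, James, BMX Racing",
    "John, Michael, Natural Lemon Flavor",
    "Michael, John, Hello again",          # made-up extra message
]

# Job 1 -- map each line to ((sender, recipient), 1); reduce by summing.
pair_counts = defaultdict(int)
for line in records:
    sender, recipient = [f.strip() for f in line.split(",")[:2]]
    pair_counts[(sender, recipient)] += 1

# Job 2 -- map each pair count to (sender, count); reduce by averaging.
per_sender = defaultdict(list)
for (sender, _), count in pair_counts.items():
    per_sender[sender].append(count)
means = {s: sum(c) / len(c) for s, c in per_sender.items()}
print(means)  # Michael sent 2 messages to 1 friend -> mean 2.0
```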
You have just run a MapReduce job to filter user messages to only those of a selected
geographical region. The output for this job is in a directory named westUsers, located just below
your home directory in HDFS. Which command gathers these records into a single file on your
local file system?
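For reference, assuming a standard Hadoop installation and that westUsers sits directly under the HDFS home directory, the `hadoop fs -getmerge` shell command concatenates a directory's part files into one local file:

```shell
# Merge the MapReduce output parts in HDFS into a single local file.
hadoop fs -getmerge westUsers westUsers.txt
```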
Which two functions are convex?
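A convex function satisfies f((x+y)/2) ≤ (f(x)+f(y))/2 for all x, y in its domain, which suggests a quick (non-rigorous) numeric spot check for candidates:

```python
# Randomized midpoint-convexity check; a counterexample proves non-convexity,
# while passing all trials only suggests (does not prove) convexity.
import random

def looks_convex(f, lo=-10.0, hi=10.0, trials=1000):
    random.seed(0)  # deterministic sampling
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        if f((x + y) / 2) > (f(x) + f(y)) / 2 + 1e-9:
            return False
    return True

print(looks_convex(lambda x: x * x))   # x^2 is convex
print(looks_convex(lambda x: x ** 3))  # x^3 is not convex on [-10, 10]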
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you
decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?
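For intuition on step 1, a container format packs many small binary records into one large splittable file. The toy length-prefixed container below illustrates only the packing idea; it is not the wire format of any real serialization system (in practice an Avro data file or a Hadoop SequenceFile plays this role):

```python
# Toy length-prefixed container: pack (filename, bytes) records into one blob
# and recover them losslessly. Illustrative only -- not Avro/SequenceFile format.
import io
import struct

def pack(records):
    buf = io.BytesIO()
    for name, payload in records:
        key = name.encode()
        buf.write(struct.pack(">I", len(key)) + key)
        buf.write(struct.pack(">I", len(payload)) + payload)
    return buf.getvalue()

def unpack(blob):
    buf, out = io.BytesIO(blob), []
    while True:
        header = buf.read(4)
        if not header:
            return out
        key = buf.read(struct.unpack(">I", header)[0]).decode()
        payload = buf.read(struct.unpack(">I", buf.read(4))[0])
        out.append((key, payload))

images = [("img_001.jpg", b"\xff\xd8fake-jpeg-1"),
          ("img_002.jpg", b"\xff\xd8fake-jpeg-2")]
assert unpack(pack(images)) == images
```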
You have user profile records in an OLTP database that you want to join with web server logs
which you have already ingested into HDFS. What is the best way to acquire the user profile data for
use in HDFS?
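For context, bulk OLTP-to-HDFS ingestion is the pattern Apache Sqoop was built for. A hypothetical invocation (the host, database, table, and path names are made up):

```shell
# Bulk-import an OLTP table into HDFS over JDBC.
sqoop import \
  --connect jdbc:mysql://dbhost/users \
  --table profiles \
  --target-dir /user/me/profiles
```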
You are building a system to perform outlier detection for a large online retailer. You need to build
a system to detect if the total dollar value of sales is outside the norm for each U.S. state, as
determined from the physical location of the buyer for each purchase.
The retailer’s data sources are scattered across multiple systems and databases, with little
coordination and few shared keys between them.
Below are the sources of data available to you. Determine which three will give you the smallest
set of data sources but still allow you to implement the outlier detector by state.
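Whatever sources are chosen, the detector itself can be very simple. A minimal per-state sketch, assuming a z-score rule with an illustrative 3σ threshold and made-up historical totals:

```python
# Flag a state's sales total as an outlier if it falls more than k standard
# deviations from that state's historical mean (toy data, illustrative rule).
from statistics import mean, stdev

history = {"CA": [100.0, 110.0, 95.0, 105.0],
           "NY": [50.0, 55.0, 45.0, 52.0]}

def is_outlier(state, total, k=3.0):
    m, s = mean(history[state]), stdev(history[state])
    return abs(total - m) > k * s

print(is_outlier("CA", 104.0))  # close to CA's historical mean
print(is_outlier("CA", 500.0))  # far outside it
```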
How can the naiveté of the naive Bayes classifier be advantageous?
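The "naiveté" is the conditional-independence assumption: the model reduces to per-class, per-word counts, so training is a single counting pass. A toy multinomial-style sketch with made-up documents and add-one smoothing:

```python
# Tiny naive Bayes: independence means each word contributes its own factor,
# so the whole model is just counts (cheap to train, trivially parallel).
import math
from collections import Counter, defaultdict

docs = [("spam", ["buy", "pills", "now"]), ("spam", ["buy", "now"]),
        ("ham", ["meeting", "now"]), ("ham", ["project", "meeting"])]

class_counts = Counter(label for label, _ in docs)
word_counts = defaultdict(Counter)
vocab = set()
for label, words in docs:
    word_counts[label].update(words)
    vocab.update(words)

def predict(words):
    def log_prob(label):
        total = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / len(docs))
        for w in words:  # independence: per-word log factors just add
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_counts, key=log_prob)

print(predict(["buy", "pills"]))
print(predict(["project", "now"]))
```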
What are two defining features of RMSE (root-mean square error or root-mean-square deviation)?
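As a refresher, RMSE is the square root of the mean squared error, so it is expressed in the same units as the target and, because errors are squared before averaging, large misses dominate the score:

```python
# RMSE = sqrt(mean((actual - predicted)^2)); squaring makes one big error
# weigh far more than several small ones of the same total magnitude.
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [3.0, 3.0, 3.0, 3.0]
print(rmse(actual, [2.0, 4.0, 2.0, 4.0]))  # every error is 1 -> 1.0
print(rmse(actual, [3.0, 3.0, 3.0, 7.0]))  # one error of 4  -> 2.0
```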