
PrepAway - Latest Free Exam Questions & Answers

Category: DS-200

Exam DS-200: Data Science Essentials

Which is the best cut point for X if you want to discretize these values into two buckets in a way that minimi

seenagape, October 15, 2014

Consider the following sample from a distribution that contains a continuous X and label Y that is
either A or B:

Which is the best cut point for X if you want to discretize these values into two buckets in a way
that minimizes the sum of chi-square values?
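The sample table did not survive in this excerpt, so the criterion can only be illustrated on made-up data. A minimal sketch, assuming the "chi-square value" of a discretization is the Pearson statistic of the bucket-by-label contingency table; the `xs`, `ys`, and candidate cut points below are hypothetical, not the exam's data:

```python
from collections import Counter

def bucket_of(x, cuts):
    """Index of the bucket that x falls into, given sorted cut points."""
    return sum(x > c for c in cuts)

def chi_square(xs, ys, cuts):
    """Pearson chi-square statistic of the bucket-by-label table."""
    cuts = sorted(cuts)
    buckets = [bucket_of(x, cuts) for x in xs]
    observed = Counter(zip(buckets, ys))
    row_totals = Counter(buckets)
    col_totals = Counter(ys)
    n = len(xs)
    stat = 0.0
    for b, row_total in row_totals.items():
        for label, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(b, label)] - expected) ** 2 / expected
    return stat

# Hypothetical sample: X values with labels A or B.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Score each hypothetical candidate cut point; pass two cuts for three buckets.
for cut in (2.5, 4.5, 6.5):
    print(cut, chi_square(xs, ys, [cut]))
```

A larger statistic means the buckets separate the labels more strongly, so whichever direction the exam's criterion optimizes, the candidates can be ranked from the same scores.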

Which is the best choice of cut points for X if you want to discretize these values into three buckets that mi

seenagape, October 15, 2014

Consider the following sample from a distribution that contains a continuous X and label Y that is
either A or B:

Which is the best choice of cut points for X if you want to discretize these values into three buckets
in a way that minimizes the sum of chi-square values?

Which is the most efficient process to gather these web servers access logs into your Hadoop cluster for analy

seenagape, October 15, 2014

You want to understand more about how users browse your public website. For example, you want to
know which pages they visit prior to placing an order. You have a server farm of 200 web servers
hosting your website. Which is the most efficient process to gather these web servers' access logs
into your Hadoop cluster for analysis?

Which method will have the best runtime performance?

seenagape, October 15, 2014

You have a large file of N records (one per line) and want to randomly sample 10% of them. You
have two functions that are perfect random number generators (though they are a bit slow):
random_uniform() generates a uniformly distributed number in the interval [0, 1].
random_permutation(M) generates a random permutation of the numbers 0 through M - 1.
Below are three different functions that implement the sampling.
Method A
    for line in file:
        if random_uniform() < 0.1:
            print(line)
Method B
    i = 0
    for line in file:
        if i % 10 == 0:
            print(line)
        i += 1
Method C
    idxs = random_permutation(N)[:N // 10]
    i = 0
    for line in file:
        if i in idxs:
            print(line)
        i += 1

Which method will have the best runtime performance?
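To compare the three methods hands-on, here is a runnable sketch. It assumes Python's `random` module as a stand-in for the question's two generators (note that `random.random()` draws from [0, 1) rather than [0, 1]) and collects the sampled lines in a list instead of printing them:

```python
import random

def random_uniform():
    # Stand-in for the question's generator (assumption of this sketch).
    return random.random()

def random_permutation(m):
    # Stand-in: a random ordering of 0 .. m-1.
    return random.sample(range(m), m)

def method_a(lines):
    # One random draw per line.
    return [line for line in lines if random_uniform() < 0.1]

def method_b(lines):
    # No random draws at all; keeps every tenth line.
    return [line for i, line in enumerate(lines) if i % 10 == 0]

def method_c(lines):
    # Builds a permutation of all N indices up front, then does a
    # linear-scan membership test against a list for every line.
    n = len(lines)
    idxs = random_permutation(n)[: n // 10]
    return [line for i, line in enumerate(lines) if i in idxs]

lines = [f"record {i}" for i in range(1000)]
for method in (method_a, method_b, method_c):
    print(method.__name__, len(method(lines)))
```

Timing each function with `timeit` on a larger input shows how the per-line work differs: Method A pays for one generator call per line, Method B pays for none, and Method C pays for the permutation plus a list scan per line.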

Which method requires the most RAM?

seenagape, October 15, 2014

Which method requires the most RAM?
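One way to reason about memory is to measure it. The sketch below uses `tracemalloc` to compare the shape of Method B (a single counter) with the shape of Method C (a permutation of all N indices built before the loop starts). The function names are hypothetical, `range(n)` stands in for the file, and a set replaces Method C's list slice so the membership test stays fast; none of that is part of the original question:

```python
import random
import tracemalloc

def peak_bytes(fn, n):
    """Peak bytes allocated by Python while fn(n) runs, per tracemalloc."""
    tracemalloc.start()
    fn(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def stream_every_tenth(n):
    # Method B shape: nothing proportional to n is ever stored.
    kept = 0
    for i in range(n):
        if i % 10 == 0:
            kept += 1
    return kept

def stream_with_permutation(n):
    # Method C shape: the full permutation of n indices exists in memory
    # before the first record is examined.
    idxs = set(random.sample(range(n), n)[: n // 10])
    kept = 0
    for i in range(n):
        if i in idxs:
            kept += 1
    return kept

print("B-shaped peak:", peak_bytes(stream_every_tenth, 100_000))
print("C-shaped peak:", peak_bytes(stream_with_permutation, 100_000))
```

Method A has the same streaming shape as Method B here: it holds one random number at a time.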

Which method might introduce unexpected correlations?

seenagape, October 15, 2014

Which method might introduce unexpected correlations?
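A small illustration of how a fixed-stride sample can interact with structure in the input. The data is hypothetical: records are assumed to arrive in round-robin order from ten sources, and every tenth record is kept as in Method B:

```python
from collections import Counter

# Hypothetical input: 1000 records written round-robin by ten sources.
lines = [f"server-{i % 10}" for i in range(1000)]

# Method B's rule: keep the records whose index is a multiple of 10.
sample = [line for i, line in enumerate(lines) if i % 10 == 0]

counts = Counter(sample)
print(counts)  # Counter({'server-0': 100})
```

Because the stride matches the period of the input, the sample contains a single source. A rule that draws a random number per record, or picks random indices, does not depend on record order in this way.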

Which method is least likely to give you exactly 10% of your data?

seenagape, October 15, 2014

Which method is least likely to give you exactly 10% of your data?
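Methods B and C fix the sample size in advance (every tenth index, or exactly N/10 permuted indices), while Method A keeps each line independently with probability 0.1, so its sample size follows a Binomial(N, 0.1) distribution. A minimal sketch of how likely Method A is to land on exactly N/10 lines, assuming N is a multiple of 10; the function name is hypothetical:

```python
from math import comb, sqrt

def prob_exact_tenth(n):
    """P(Binomial(n, 0.1) == n // 10): chance that Method A keeps exactly 10%."""
    k = n // 10
    return comb(n, k) * 0.1 ** k * 0.9 ** (n - k)

for n in (10, 100, 1000):
    spread = sqrt(n * 0.1 * 0.9)  # standard deviation of the sample size
    print(n, round(prob_exact_tenth(n), 4), round(spread, 2))
```

The probability of an exact hit shrinks as N grows, even though the sample fraction itself concentrates around 10%.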

What would we expect the value of the revenue to be in Q1 of 2013?

seenagape, October 15, 2014

Assuming the trends shown in this chart continue, what would we expect the value of the revenue to be in Q1 of 2013?

What is the probability that they took Cloudera’s Introduction to Data Science: Building Recommender Systems

seenagape, October 15, 2014

From historical data, you know that 50% of students who take Cloudera’s Introduction to Data
Science: Building Recommender Systems training course pass this exam, while only 25% of
students who did not take the training course pass this exam. You also know that 50% of this
exam’s candidates also take Cloudera’s Introduction to Data Science: Building Recommender
Systems training course.
If we know that a person has passed this exam, what is the probability that they took Cloudera’s
Introduction to Data Science: Building Recommender Systems training course?
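The conditional probability can be worked through with natural frequencies. This sketch applies the three percentages from the question to a hypothetical cohort of 800 candidates (the cohort size is an assumption chosen so every count is a whole number) and then applies Bayes' rule as a ratio of counts:

```python
from fractions import Fraction

candidates = 800                     # hypothetical cohort size
trained = candidates // 2            # 50% take the training course
untrained = candidates - trained

trained_pass = trained // 2          # 50% of course takers pass
untrained_pass = untrained // 4      # 25% of the others pass

# P(took course | passed) = passers who took it / all passers
posterior = Fraction(trained_pass, trained_pass + untrained_pass)
print(posterior)  # 2/3
```

The cohort size cancels out of the ratio, so any size that keeps the counts whole gives the same result.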

What is the probability that any individual exam candidate will pass the data science exam?

seenagape, October 15, 2014

From historical data, you know that 50% of students who take Cloudera’s Introduction to Data
Science: Building Recommender Systems training course pass this exam, while only 25% of
students who did not take the training course pass this exam. You also know that 50% of this
exam’s candidates also take Cloudera’s Introduction to Data Science: Building Recommender
Systems training course.
What is the probability that any individual exam candidate will pass the data science exam?
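The overall pass rate follows from the law of total probability: weight each group's pass rate by the share of candidates in that group and add the results. A short sketch using exact fractions; the helper name `total_probability` is hypothetical:

```python
from fractions import Fraction

def total_probability(cases):
    """cases: (P(group), P(pass | group)) pairs whose group shares sum to 1."""
    return sum(p_group * p_pass for p_group, p_pass in cases)

p_pass = total_probability([
    (Fraction(1, 2), Fraction(1, 2)),  # took the course: 50% of candidates, 50% pass
    (Fraction(1, 2), Fraction(1, 4)),  # did not take it: 50% of candidates, 25% pass
])
print(p_pass, float(p_pass))  # 3/8 0.375
```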
