PrepAway - Latest Free Exam Questions & Answers

Category: DS-200

Exam DS-200: Data Science Essentials

Which is the most efficient process to gather these web server access logs into your Hadoop cluster for analysis?

You want to understand more about how users browse your public website. For example, you want to
know which pages they visit prior to placing an order. You have a server farm of 200 web servers
hosting your website. Which is the most efficient process to gather these web server access logs
into your Hadoop cluster for analysis?

Which method will have the best runtime performance?

You have a large file of N records (one per line), and want to randomly sample 10% of them. You
have two functions that are perfect random number generators (though they are a bit slow):
random_uniform() generates a uniformly distributed number in the interval [0, 1]
random_permutation(M) generates a random permutation of the numbers 0 through M - 1.
Below are three different functions that implement the sampling.
Method A
for line in file:
    if random_uniform() < 0.1:
        print(line)
Method B
i = 0
for line in file:
    if i % 10 == 0:
        print(line)
    i += 1
Method C
idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1

Which method will have the best runtime performance?
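
One way to reason about runtime is to run the three methods side by side. The sketch below is a minimal, runnable version under stated assumptions: `random.random()` and `random.sample()` stand in for the question's random_uniform and random_permutation, and an in-memory list of 2000 hypothetical records stands in for the file. Timings vary by machine, so no figures are quoted here.

```python
import random
import timeit

def random_uniform():
    # Stand-in (assumption): random.random() returns a float in [0.0, 1.0).
    return random.random()

def random_permutation(m):
    # Stand-in (assumption): a shuffled list of the integers 0 .. m-1.
    return random.sample(range(m), m)

def method_a(lines):
    # One random draw per line.
    return [line for line in lines if random_uniform() < 0.1]

def method_b(lines):
    # No random draws at all: keep every 10th line.
    return [line for i, line in enumerate(lines) if i % 10 == 0]

def method_c(lines):
    n = len(lines)
    # Builds a full permutation of n indices, then keeps the first 10%.
    idxs = random_permutation(n)[: n // 10]
    # idxs is a list, so each "in" test is a linear scan.
    return [line for i, line in enumerate(lines) if i in idxs]

lines = [f"record {i}" for i in range(2000)]
for method in (method_a, method_b, method_c):
    seconds = timeit.timeit(lambda: method(lines), number=5)
    print(f"{method.__name__}: {seconds:.4f} s for 5 runs")
```

Method B does the least work per line because it never calls a random number generator, while Method C pays both for the full permutation and for the list membership test.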

Which method requires the most RAM?

You have a large file of N records (one per line), and want to randomly sample 10% of them. You
have two functions that are perfect random number generators (though they are a bit slow):
random_uniform() generates a uniformly distributed number in the interval [0, 1]
random_permutation(M) generates a random permutation of the numbers 0 through M - 1.
Below are three different functions that implement the sampling.
Method A
for line in file:
    if random_uniform() < 0.1:
        print(line)
Method B
i = 0
for line in file:
    if i % 10 == 0:
        print(line)
    i += 1
Method C
idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1
Which method requires the most RAM?
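
Memory use can be compared by streaming the records and tracking peak allocation with the standard `tracemalloc` module. This is a rough sketch, not a benchmark: `read_lines` is a hypothetical generator standing in for the file, and `random.random()` / `random.sample()` are assumed stand-ins for the question's two generators.

```python
import random
import tracemalloc

def random_uniform():
    # Stand-in (assumption) for the question's random_uniform.
    return random.random()

def random_permutation(m):
    # Stand-in (assumption): a shuffled list of the integers 0 .. m-1.
    return random.sample(range(m), m)

def read_lines(n):
    # Hypothetical stand-in for streaming a file of n records.
    for i in range(n):
        yield f"record {i}"

def method_a(lines):
    for line in lines:
        if random_uniform() < 0.1:
            yield line

def method_b(lines):
    for i, line in enumerate(lines):
        if i % 10 == 0:
            yield line

def method_c(lines, n):
    # The whole permutation of n indices exists in memory before slicing.
    idxs = random_permutation(n)[: n // 10]
    for i, line in enumerate(lines):
        if i in idxs:
            yield line

def peak_bytes(make_sample):
    # Peak traced allocation while the sample is consumed and discarded.
    tracemalloc.start()
    for _ in make_sample():
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

N = 20_000
peak_a = peak_bytes(lambda: method_a(read_lines(N)))
peak_b = peak_bytes(lambda: method_b(read_lines(N)))
peak_c = peak_bytes(lambda: method_c(read_lines(N), N))
print(peak_a, peak_b, peak_c)
```

Methods A and B hold only the current line and a counter, whereas Method C's index structure grows with N.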

Which method might introduce unexpected correlations?

You have a large file of N records (one per line), and want to randomly sample 10% of them. You
have two functions that are perfect random number generators (though they are a bit slow):
random_uniform() generates a uniformly distributed number in the interval [0, 1]
random_permutation(M) generates a random permutation of the numbers 0 through M - 1.
Below are three different functions that implement the sampling.
Method A
for line in file:
    if random_uniform() < 0.1:
        print(line)
Method B
i = 0
for line in file:
    if i % 10 == 0:
        print(line)
    i += 1
Method C
idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1
Which method might introduce unexpected correlations?
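
Selecting on line position alone means any periodic structure in the file carries straight into the sample. The sketch below assumes a hypothetical file written round-robin by 10 servers (this layout is an illustration, not something the question states) and compares position-based selection with per-line random selection, using `random.random()` as an assumed stand-in for random_uniform.

```python
import random
from collections import Counter

# Hypothetical file: 1000 records written round-robin by 10 servers.
records = [f"server-{i % 10}" for i in range(1000)]

# Position-based selection (Method B): every 10th record.
sample_b = [r for i, r in enumerate(records) if i % 10 == 0]

# Per-line random selection (Method A), with a fixed seed for repeatability.
rng = random.Random(0)
sample_a = [r for r in records if rng.random() < 0.1]

print(Counter(sample_b))  # every sampled record comes from server-0
print(Counter(sample_a))
```

Because the sampling period of 10 matches the period of the data, the position-based sample sees only one server, while the random sample draws from across the servers.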

Which method is least likely to give you exactly 10% of your data?

You have a large file of N records (one per line), and want to randomly sample 10% of them. You
have two functions that are perfect random number generators (though they are a bit slow):
random_uniform() generates a uniformly distributed number in the interval [0, 1]
random_permutation(M) generates a random permutation of the numbers 0 through M - 1.
Below are three different functions that implement the sampling.
Method A
for line in file:
    if random_uniform() < 0.1:
        print(line)
Method B
i = 0
for line in file:
    if i % 10 == 0:
        print(line)
    i += 1
Method C
idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1
Which method is least likely to give you exactly 10% of your data?
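
The sample size each method produces can be checked by counting selected lines instead of printing them. In this sketch N = 1000 is an arbitrary choice, and `random.random()` / `random.sample()` are assumed stand-ins for the question's generators.

```python
import random

N = 1000  # arbitrary record count for illustration

def size_a(n):
    # Each line is kept independently with probability 0.1,
    # so the count varies from run to run.
    return sum(1 for _ in range(n) if random.random() < 0.1)

def size_b(n):
    # Every 10th line: the count is fixed by n alone.
    return sum(1 for i in range(n) if i % 10 == 0)

def size_c(n):
    # Exactly n // 10 distinct indices are chosen up front.
    idxs = random.sample(range(n), n)[: n // 10]
    return sum(1 for i in range(n) if i in idxs)

sizes_a = [size_a(N) for _ in range(20)]
print("Method A sizes over 20 runs:", sizes_a)
print("Method B size:", size_b(N))
print("Method C size:", size_c(N))
```

Methods B and C return a count determined by N, while Method A's count is a random quantity centred on N/10.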

What is the probability that they took Cloudera’s Introduction to Data Science: Building Recommender Systems training course?

From historical data, you know that 50% of students who take Cloudera’s Introduction to Data
Science: Building Recommender Systems training course pass this exam, while only 25% of
students who did not take the training course pass this exam. You also know that 50% of this
exam’s candidates also take Cloudera’s Introduction to Data Science: Building Recommender
Systems training course.
If we know that a person has passed this exam, what is the probability that they took Cloudera’s
Introduction to Data Science: Building Recommender Systems training course?
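
This is a direct application of Bayes' rule to the three figures given in the question. The sketch below works it through with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Figures stated in the question.
p_course = Fraction(1, 2)                # P(took course)
p_pass_given_course = Fraction(1, 2)     # P(pass | took course)
p_pass_given_no_course = Fraction(1, 4)  # P(pass | did not take course)

# Law of total probability: P(pass).
p_pass = (p_pass_given_course * p_course
          + p_pass_given_no_course * (1 - p_course))

# Bayes' rule: P(took course | pass).
p_course_given_pass = p_pass_given_course * p_course / p_pass

print(p_course_given_pass)  # 2/3
```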

What is the probability that any individual exam candidate will pass the data science exam?

From historical data, you know that 50% of students who take Cloudera’s Introduction to Data
Science: Building Recommender Systems training course pass this exam, while only 25% of
students who did not take the training course pass this exam. You also know that 50% of this
exam’s candidates also take Cloudera’s Introduction to Data Science: Building Recommender
Systems training course.
What is the probability that any individual exam candidate will pass the data science exam?
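
The overall pass rate is a weighted average of the two conditional pass rates. As a sanity check, the sketch below computes it directly and then compares it with a small simulation; the simulation, its trial count, and its seed are illustrative assumptions, not part of the question.

```python
import random

# Figures stated in the question.
P_COURSE = 0.5
P_PASS_GIVEN_COURSE = 0.5
P_PASS_GIVEN_NO_COURSE = 0.25

# Weighted average over the two groups of candidates.
exact = (P_COURSE * P_PASS_GIVEN_COURSE
         + (1 - P_COURSE) * P_PASS_GIVEN_NO_COURSE)

def simulate(trials, seed=0):
    # Draw each candidate's course attendance, then their exam result.
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        took_course = rng.random() < P_COURSE
        p_pass = P_PASS_GIVEN_COURSE if took_course else P_PASS_GIVEN_NO_COURSE
        if rng.random() < p_pass:
            passed += 1
    return passed / trials

estimate = simulate(100_000)
print(exact)  # 0.375
print(estimate)
```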

