Which method should the data scientist try first?
A data scientist is asked to implement an article recommendation feature for an on-line magazine. The
magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only
the style and subject matter of the current article is available for making recommendations. All of the
magazine’s articles are stored in a database in a format suitable for analytics.
Which method should the data scientist try first?
How are window functions different from regular aggregate functions?
How are window functions different from regular aggregate functions?
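As a rough illustration of the distinction the question is probing, the sketch below mimics both behaviours in plain Python rather than SQL. The rows, column meanings, and function names are hypothetical. A grouped aggregate returns one row per group, while a windowed aggregate keeps every input row and attaches the group-level value to it.

```python
from collections import defaultdict

# Hypothetical rows: (department, employee, salary)
rows = [
    ("sales", "ann", 100),
    ("sales", "bob", 200),
    ("ops", "cy", 300),
]

def aggregate_sum(rows):
    """Roughly SELECT dept, SUM(salary) ... GROUP BY dept: one result per group."""
    totals = defaultdict(int)
    for dept, _name, salary in rows:
        totals[dept] += salary
    return dict(totals)

def window_sum(rows):
    """Roughly SUM(salary) OVER (PARTITION BY dept): every input row is kept."""
    totals = aggregate_sum(rows)
    return [(dept, name, salary, totals[dept]) for dept, name, salary in rows]

grouped = aggregate_sum(rows)   # two groups, so two results
windowed = window_sum(rows)     # three input rows, so three results
```

The grouped result has two entries for three input rows, whereas the windowed result still has three rows, each carrying its department total.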
What is the confidence of the rule (hat, scarf) -> gloves?
Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (hat, scarf) -> gloves?
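The confidence of a rule A -> B is the number of itemsets containing both A and B divided by the number containing A. A minimal sketch of that arithmetic for the five itemsets above follows; the function name `confidence` is an illustrative choice, not part of the question.

```python
# The five itemsets listed in the question.
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

def confidence(antecedent, consequent, transactions):
    """Return count(antecedent and consequent) / count(antecedent)."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return n_both / n_antecedent

conf = confidence({"hat", "scarf"}, {"gloves"}, transactions)
print(round(conf, 3))  # 0.667
```

Three itemsets contain both hat and scarf, and two of those also contain gloves, which gives 2/3.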
What is the purpose of the Map Function?
In the MapReduce framework, what is the purpose of the Map Function?
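To make the role of the map step concrete, here is a toy, single-process word count written in the MapReduce style. It is only a sketch: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and a real framework such as Hadoop distributes these phases across machines. The map function turns each input record into intermediate key/value pairs, which are then grouped by key and handed to the reduce function.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit one (word, 1) pair per word in the input record."""
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key."""
    return (word, sum(counts))

def run_job(lines):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

result = run_job(["the cat", "The dog"])
```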
What should you deliver to the production team, along with your commented code?
You have completed your model and are handing it off to be deployed in production. What should you deliver to
the production team, along with your commented code?
Which tool would you recommend to this colleague?
While having a discussion with your colleague, this person mentions that they want to perform K-means
clustering on text file data stored in HDFS.
Which tool would you recommend to this colleague?
Which method is used to solve for coefficients b0, b1, ..., bn in your linear regression model?
Which method is used to solve for coefficients b0, b1, ..., bn in your linear regression model?
Y = b0 + b1x1 + b2x2 + ... + bnxn
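One standard way to fit a model of this form is least squares, which chooses b0, ..., bn to minimise the sum of squared residuals. The sketch below solves the normal equations (X^T X) b = X^T y in pure Python on made-up data. The helper names `solve` and `ols` and the sample points are assumptions for illustration, and a numerical library would normally be used instead.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def ols(xs, ys):
    """Return [b0, b1, ..., bn] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 is the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up, noise-free data generated from Y = 1 + 2*x1 + 3*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coeffs = ols(xs, ys)
```

Because the sample data is noise-free, the recovered coefficients match the generating values 1, 2, and 3 up to floating-point error.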
What describes a true limitation of the logistic regression method?
What describes a true limitation of the logistic regression method?
What should you do?
You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted,
it is not completing. What should you do?
Which action should the team recommend?
A disk drive manufacturer has a defect rate of less than 1.5% with 98% confidence. A quality assurance team
samples 1000 disk drives and finds 14 defective units. Which action should the team recommend?
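One way to reason about the sample is a one-sided test of a proportion. The sketch below assumes a normal approximation, a null hypothesis that the defect rate is at least 1.5%, and a 2% significance level read off the stated 98% confidence. The question does not spell out this framing, so treat it as one plausible reading; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_proportion_test(defects, n, p0, confidence):
    """Test H0: p >= p0 against H1: p < p0 with a normal approximation."""
    p_hat = defects / n
    se = math.sqrt(p0 * (1 - p0) / n)                # standard error under H0
    z = (p_hat - p0) / se
    z_crit = NormalDist().inv_cdf(1 - confidence)    # lower-tail critical value
    return p_hat, z, z_crit, z < z_crit

p_hat, z, z_crit, rate_shown_below_p0 = one_sided_proportion_test(
    defects=14, n=1000, p0=0.015, confidence=0.98
)
```

Under these assumptions the observed rate of 1.4% gives a z statistic of about -0.26, well short of the critical value of about -2.05, so this sample alone does not demonstrate a rate below 1.5% at that confidence level.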