Which type of join would you use for this table?
You have two tables of customers in your database. Customers in cust_table_1 were sent an e-mail promotion
last year, and customers in cust_table_2 received a newsletter last year. Each customer can be entered only
once per table. You want to create a table that includes all customers and any communications they
received last year. Which type of join would you use for this table?
In which lifecycle stage are initial hypotheses formed?
In which lifecycle stage are initial hypotheses formed?
How should you proceed?
You are given 10,000,000 user profile pages of an online dating site in XML files, and they are stored in
HDFS. You are assigned to divide the users into groups based on the content of their profiles. You have been
instructed to try K-means clustering on this data. How should you proceed?
What is the next step you should take?
The Marketing department of your company wishes to track opinion on a new product that was recently
introduced. Marketing would like to know how many positive and negative reviews are appearing over a given
period and potentially retrieve each review for more in-depth insight. They have identified several popular
product review blogs that historically have published thousands of user reviews of your company’s products.
You have been asked to provide the desired analysis. You examine the RSS feeds for each blog and
determine which fields are relevant. You then craft a regular expression to match your new product’s name and
extract the relevant text from each matching review.
What is the next step you should take?
Which word or phrase completes the statement?
Which word or phrase completes the statement? A Data Scientist would consider that an RDBMS is to a table
as R is to a ______________.
Which word or phrase completes the statement?
Which word or phrase completes the statement? Unix is to bash as Hadoop is to:
Which goals are suitable to be completed with MapReduce?
A call center for a large electronics company handles an average of 35,000 support calls a day. The head of
the call center would like to optimize the staffing of the call center during the rollout of a new product due to
recent customer complaints of long wait times. You have been asked to create a model to optimize call center
costs and customer wait times.
The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative to old
products.
3. Historically, what time of day does the call center need to be most heavily staffed?
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?
What will be your approach for loading data into the analytical sandbox for this analysis?
Consider the example of an analysis for fraud detection on credit card usage. You will need to ensure that higher-risk transactions that may indicate fraudulent credit card activity are retained in your data for analysis, and not
dropped as outliers during pre-processing. What will be your approach for loading data into the analytical
sandbox for this analysis?
What is another component?
Trend, seasonal, and cyclical are components of a time series. What is another component?
Which algorithm is the most appropriate for this study?
You are studying the behavior of a population, and you are provided with multidimensional data at the individual
level. You have identified four specific individuals who are valuable to your study, and would like to find all
users who are most similar to each individual. Which algorithm is the most appropriate for this study?