What are two benefits of using the five-number summary of sample percentiles to summarize a data set?
Given the following sample of numbers from a distribution:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
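The five-number summary (minimum, lower quartile, median, upper quartile, maximum) can be computed for the sample above with the standard library; the sketch below uses `statistics.quantiles` with the `inclusive` method, one of several common quartile conventions.

```python
# Five-number summary of the sample from the question:
# minimum, Q1 (25th percentile), median, Q3 (75th percentile), maximum.
from statistics import quantiles

data = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

q1, median, q3 = quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))
print(summary)  # (1, 2.5, 8.0, 27.5, 89)
```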
Given the following sample of numbers from a distribution:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
How do high-level languages like Apache Hive and Apache Pig efficiently calculate approximate
percentiles for a distribution?
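A common approach (e.g. behind Hive's `percentile_approx`) is to keep a bounded-size histogram rather than the full data: each value updates a set of (center, count) bins, and when the bound is exceeded the two closest bins are merged. Percentiles are then read off cumulative counts, so memory stays O(bins) regardless of data size. A toy sketch, with `max_bins` an assumed tuning parameter:

```python
# Toy streaming histogram in the spirit of approximate-percentile UDAFs.
def add(bins, value, max_bins=32):
    bins.append([value, 1])
    bins.sort(key=lambda b: b[0])
    if len(bins) > max_bins:
        # Merge the two adjacent bins whose centers are closest.
        i = min(range(len(bins) - 1), key=lambda j: bins[j + 1][0] - bins[j][0])
        c1, n1 = bins[i]
        c2, n2 = bins[i + 1]
        bins[i] = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
        del bins[i + 1]

def estimate(bins, p):
    """Approximate the p-th percentile (0..100) from the histogram."""
    total = sum(n for _, n in bins)
    target = p / 100 * total
    seen = 0
    for center, n in bins:
        seen += n
        if seen >= target:
            return center
    return bins[-1][0]

bins = []
for v in [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]:
    add(bins, v)
print(estimate(bins, 50))  # 8 — matches the exact median here
```

Because such histograms can also be merged with each other, each mapper can build one locally and a reducer can combine them, which is what makes the calculation efficient in a distributed setting.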
What is the best way to determine the learning rate parameters for stochastic gradient descent
when the distribution of the input data shifts over time?
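One consideration (not necessarily the question's official answer): a learning rate that decays to zero stops adapting once the distribution drifts, so a small constant rate or an adaptive scheme is often preferred under shift. A minimal sketch where a one-parameter model tracks a drifting target mean with a constant rate:

```python
# SGD with a constant learning rate tracking a distribution shift.
# The rate value 0.05 is an assumption for illustration.
import random

random.seed(0)
lr = 0.05          # constant learning rate
w = 0.0            # model estimate of the stream's mean

for t in range(4000):
    target = 0.0 if t < 2000 else 5.0   # distribution shifts halfway
    x = random.gauss(target, 1.0)
    w -= lr * (w - x)                   # SGD step on squared error

print(round(w, 1))  # w ends near the post-shift mean of 5.0
```

With a decaying rate the estimate would get stuck near the pre-shift mean; the constant rate keeps the effective memory window finite.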
Which two machine learning algorithms should you consider as likely to benefit from discretizing
continuous features?
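Discretization (binning) turns a continuous feature into a categorical one, which suits algorithms that model features as counts or discrete events, such as naive Bayes or association-rule mining. A minimal sketch with assumed bin edges:

```python
# Equal-width-style binning of a continuous "age" feature.
# The bin edges are an illustrative assumption.
from bisect import bisect_right

edges = [10, 20, 40]            # bins: <10, 10-19, 20-39, >=40
ages = [3, 12, 25, 41, 19, 40]
binned = [bisect_right(edges, a) for a in ages]
print(binned)  # [0, 1, 2, 3, 1, 3]
```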
You’ve built a model that has ten different variables with complicated independence relationships
between them, and both continuous and discrete variables that have complicated, multi-parameter
distributions.
Computing the joint probability distribution is complex, but it turns out that computing the
conditional probabilities for the variables is easy. What is the most computationally efficient
way of computing the expected value?
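The scenario described — joint distribution hard to evaluate, conditionals easy to sample — is the classic setting for Gibbs sampling: cycle through the variables, resampling each from its conditional, and average the draws. A toy two-variable sketch (a standard bivariate normal with correlation `rho`, where each conditional is a simple univariate normal):

```python
# Gibbs sampling to estimate E[X] when only the conditionals are easy.
import random

random.seed(1)
rho = 0.8
x, y = 0.0, 0.0
samples = []
for i in range(20000):
    # Conditionals of a standard bivariate normal with correlation rho:
    x = random.gauss(rho * y, (1 - rho**2) ** 0.5)
    y = random.gauss(rho * x, (1 - rho**2) ** 0.5)
    if i >= 1000:                 # discard burn-in
        samples.append(x)

mean_x = sum(samples) / len(samples)  # close to the true E[X] = 0
```

The same loop generalizes to ten variables: resample each in turn from its conditional given the current values of the rest, then average any function of the draws.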
What is one limitation encountered by all systems that employ collaborative filtering and use
preferences as input in order to output product recommendations to consumers?
Why is the naive Bayes classifier "naive"?
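The "naive" part is the assumption that features are conditionally independent given the class, so the class-conditional likelihood factorizes into a product of per-feature probabilities. A minimal sketch with made-up word probabilities:

```python
# Naive Bayes scoring: likelihoods multiply because of the
# conditional-independence assumption. All numbers are illustrative.
p_given_spam = {"free": 0.6, "meeting": 0.1}
p_given_ham = {"free": 0.05, "meeting": 0.4}
p_spam, p_ham = 0.3, 0.7

words = ["free", "meeting"]
score_spam = p_spam
score_ham = p_ham
for w in words:
    score_spam *= p_given_spam[w]   # product over features: the "naive" step
    score_ham *= p_given_ham[w]

posterior_spam = score_spam / (score_spam + score_ham)
print(posterior_spam)
```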
Which three metrics are useful in measuring the accuracy and quality of a recommender system?
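Commonly cited candidates (not necessarily the question's official answer) are rating-prediction errors such as MAE and RMSE plus a ranking metric such as precision at k. A minimal sketch with made-up numbers:

```python
# Three common recommender metrics on illustrative data.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 3.5, 2.5, 4.5]

mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
rmse = (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

recommended = ["a", "b", "c"]          # top-k recommendations
relevant = {"b", "c", "d"}             # items the user actually liked
precision_at_k = sum(r in relevant for r in recommended) / len(recommended)

print(mae, rmse, precision_at_k)
```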
You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have
no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by
setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you
start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS
blocks?
Your cluster’s mapred-site.xml includes the following parameters:
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
And your cluster’s yarn-site.xml includes the following parameters:
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
What is the maximum amount of virtual memory allocated for each map task before YARN will kill
its Container?
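YARN kills a container whose virtual memory exceeds its physical-memory allocation times the vmem-pmem ratio, so for the map tasks configured above the limit follows directly from the two properties:

```python
# Virtual-memory limit per map task under the configuration above.
map_memory_mb = 4096              # mapreduce.map.memory.mb
vmem_pmem_ratio = 2.1             # yarn.nodemanager.vmem-pmem-ratio

vmem_limit_mb = map_memory_mb * vmem_pmem_ratio
print(vmem_limit_mb)  # 8601.6 MB of virtual memory per map task
```

Note that `mapreduce.reduce.memory.mb` (8192 MB) only affects reduce tasks; the map-task limit uses the 4096 MB figure.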