Which command is guaranteed to produce the desired output if you have more than 20,000 files to process?
You have a directory containing a number of comma-separated files. Each file has three columns
and each filename has a .csv extension. You want to have a single tab-separated file (all.tsv) that
contains all the rows from all the files.
Which command is guaranteed to produce the desired output if you have more than 20,000 files to
process?
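The task in the question above can be sketched in Python, as an illustration only and not as one of the shell commands the question refers to. The function name `merge_csv_to_tsv` and its parameters are hypothetical. The point it shows is that listing the directory from inside a program never builds a long argument list, which is where a shell glob such as `*.csv` can fail once there are tens of thousands of files.

```python
import csv
import os


def merge_csv_to_tsv(src_dir, out_path):
    """Copy every row of every .csv file in src_dir into one tab-separated file."""
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        # The directory is listed inside Python, so the number of files is
        # not limited by the maximum length of a command line.
        with os.scandir(src_dir) as entries:
            names = sorted(
                e.name for e in entries if e.is_file() and e.name.endswith(".csv")
            )
        for name in names:
            with open(os.path.join(src_dir, name), newline="") as src:
                for row in csv.reader(src):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `merge_csv_to_tsv("data", "data/all.tsv")` (hypothetical paths) would write the combined rows and return how many were copied.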
What are three benefits of running feature selection analysis before fitting a classification model?
What are three benefits of running feature selection analysis before fitting a classification model?
How frequently should you update your estimate of the gradient?
When optimizing a function using stochastic gradient descent, how frequently should you update
your estimate of the gradient?
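For reference, a minimal sketch of stochastic gradient descent in its textbook form, where the gradient is re-estimated from each single training example before every weight update. The model (a one-parameter line `y = w * x`), the toy data, the learning rate, and the function name `sgd_fit_slope` are all assumptions chosen for illustration.

```python
import random


def sgd_fit_slope(xs, ys, lr=0.01, epochs=50, seed=0):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            # Gradient of 0.5 * (w*x - y)**2 with respect to w,
            # estimated from this one example only.
            grad = (w * x - y) * x
            w -= lr * grad  # update immediately, then move to the next example
    return w


# Toy data generated by y = 2x, so the fitted slope should approach 2.
w = sgd_fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Mini-batch variants average the gradient over a few examples per update instead of one; the loop structure stays the same.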
In what format are web server log files usually generated and how must you transform them in order to make them usable for analysis in Hadoop?
In what format are web server log files usually generated and how must you transform them in
order to make them usable for analysis in Hadoop?
Which recommender system technique is domain specific?
Which recommender system technique is domain specific?
How many points do you need in order to sample the complete 100-dimensional unit cube adequately?
You are about to sample a 100-dimensional unit cube. To adequately sample any single given
dimension, you need only capture 10 points. How many points do you need in order to sample the
complete 100-dimensional unit cube adequately?
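The arithmetic behind this question can be worked through under one assumption: "adequately" means keeping the same 10-point resolution along every axis, i.e. a regular grid. The number of grid points is then the per-dimension count raised to the number of dimensions. The helper name `grid_points` is hypothetical.

```python
def grid_points(points_per_dim, dims):
    """Number of points in a regular grid with points_per_dim samples on each axis."""
    return points_per_dim ** dims


# 10 points on one axis, 100 on a square, 1000 on a cube, and so on.
small_cases = [grid_points(10, d) for d in (1, 2, 3)]

# Python integers are arbitrary precision, so the 100-dimensional case is exact.
total = grid_points(10, 100)
```

The count grows exponentially with the number of dimensions, which is the usual illustration of the curse of dimensionality.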
Which process will accomplish all three objectives?
You have acquired a new data source of millions of customer records, and you’ve loaded this data into
HDFS. Prior to analysis, you want to change all customer registration dates to the same date format,
make all addresses uppercase, and remove all customer names (for anonymization). Which
process will accomplish all three objectives?
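The three objectives are all per-record transformations, which a short sketch can make concrete. Everything specific here is an assumption: the field names `name`, `registered`, and `address`, the list of input date formats, and the ISO output format are hypothetical, and the sketch does not say which Hadoop tool should run the function.

```python
from datetime import datetime

# Hypothetical input date formats; a real data set would need its own list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or the input unchanged if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


def clean_record(record):
    """Apply all three changes to one customer record (a dict)."""
    # Drop the name field for anonymization.
    cleaned = {k: v for k, v in record.items() if k != "name"}
    cleaned["registered"] = normalize_date(record["registered"])
    cleaned["address"] = record["address"].upper()
    return cleaned
```

Because each record is handled independently of every other record, a function like this needs no aggregation step and can be applied to the records in parallel.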
What is the best way to visualize the distribution of bug fixes per engineer?
A company has 20 software engineers working to fix bugs on a project. Over the past week, the team
has fixed 100 bugs. Although the average number of bugs fixed per engineer is five, none of the
engineers fixed exactly five bugs last week.
You want to understand how productive each engineer is at fixing bugs. What is the best way to
visualize the distribution of bug fixes per engineer?
What metric should you use to estimate how hard a particular bug is to fix?
A company has 20 software engineers working to fix bugs on a project. Over the past week, the team
has fixed 100 bugs. Although the average number of bugs fixed per engineer is five, none of the
engineers fixed exactly five bugs last week.
One engineer points out that some bugs are more difficult to fix than others. What metric should
you use to estimate how hard a particular bug is to fix?
In what way can Hadoop be used to improve the performance of Lloyd’s algorithm for k-means clustering on large data sets?
In what way can Hadoop be used to improve the performance of Lloyd’s algorithm for k-means
clustering on large data sets?
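One common way to frame Lloyd's algorithm for a MapReduce system is sketched below in plain Python, without any Hadoop API. The map step assigns each point to its nearest centroid and the reduce step averages the points of each cluster. The function names and the sample points are hypothetical, and a real job would distribute the map calls across nodes rather than loop over them.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )
    return nearest, point


def reduce_mean(points):
    """Reduce step: the new centroid is the coordinate-wise mean of its points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_iteration(points, centroids):
    """One iteration of Lloyd's algorithm expressed as map, group, reduce."""
    groups = defaultdict(list)
    for point in points:  # each map call is independent of the others
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids = lloyd_iteration(points, [(1.0, 1.0), (9.0, 9.0)])
```

Repeating `lloyd_iteration` until the centroids stop moving gives the full algorithm; only the small list of centroids has to be shared between iterations.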