PrepAway - Latest Free Exam Questions & Answers

How should you proceed?

You are given 10,000,000 user profile pages of an online dating site as XML files stored in HDFS.
You are assigned to divide the users into groups based on the content of their profiles, and you have been
instructed to try K-means clustering on this data. How should you proceed?


A.
Run MapReduce to transform the data, and find relevant key-value pairs.

B.
Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.

C.
Run a Naive Bayes classification as a pre-processing step in HDFS.

D.
Partition the data by XML file size, and run K-means clustering in each partition.
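
For readers unfamiliar with the transformation described in option A, the following is a minimal sketch of what a map step over one profile might look like. It is illustrative only: the `<profile>` element, its `id` attribute, and the function name `map_profile` are hypothetical, since the question does not describe the XML schema, and a real job would run inside a MapReduce framework rather than as a plain function.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def map_profile(xml_text):
    """Turn one profile XML document into a (user_id, term_counts) pair.

    Assumes a hypothetical schema: a root element carrying an `id` attribute.
    """
    root = ET.fromstring(xml_text)
    user_id = root.get("id")  # hypothetical attribute name
    # Gather the text of every element, then count lowercase alphabetic tokens.
    text = " ".join(root.itertext())
    tokens = [word.lower() for word in text.split() if word.isalpha()]
    return user_id, Counter(tokens)


# Hypothetical sample record, not taken from the question.
sample = (
    '<profile id="u1">'
    "<about>Likes hiking and jazz</about>"
    "<interests>hiking</interests>"
    "</profile>"
)
key, value = map_profile(sample)
```

Term counts keyed by user are one possible kind of key-value pair; K-means needs numeric feature vectors, so some step of this sort has to turn raw XML into numbers before clustering can run.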

