Briefing Cloudera Knowledge

What is the most likely problem?

You are building a k-nearest neighbor classifier (k-NN) on a labeled set of points in a highdimensional space. You determine that the classifier has a large error on the training data. What is
the most likely problem?

A.
High-dimensional spaces effectively make local neighborhoods global

B.
k-NN compotation does not coverage in high dimensions

C.
k was too small

D.
The VC-dimension of a k-NN classifier is too high