How do high-level languages like Apache Hive and Apache Pig efficiently calculate approximately perc

seenagape

11 years ago

Given the following sample of numbers from a distribution:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
How do high-level languages like Apache Hive and Apache Pig efficiently calculate approximate
percentiles for a distribution?

A.
They sort all of the input samples and then look up the sample for each percentile

B.
They maintain an index of the input data as it is loaded into HDFS and load it into memory

C.
They use pivots to assign each observation to the reducer that calculates each percentile

D.
They assign sample observations to buckets and then aggregate the buckets to compute the
approximations
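
To make the bucket-and-aggregate idea in option D concrete, here is a minimal Python sketch applied to the sample above. It is only an illustration under stated assumptions, not Hive's or Pig's actual implementation: the fixed bucket width, the function names (to_buckets, merge, approx_percentile) and the linear interpolation inside a bucket are all hypothetical choices made for this example.

```python
from collections import Counter

BUCKET_WIDTH = 10  # assumption: fixed-width buckets, chosen only for illustration


def to_buckets(samples, width=BUCKET_WIDTH):
    """Per-partition step: count how many samples fall into each bucket."""
    return Counter(int(x // width) for x in samples)


def merge(histograms):
    """Aggregation step: add up the bucket counts from all partitions."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def approx_percentile(hist, p, width=BUCKET_WIDTH):
    """Estimate the p-th percentile (0 <= p <= 100) from bucket counts only."""
    n = sum(hist.values())
    target = p / 100 * n  # rank we are looking for
    seen = 0
    for bucket in sorted(hist):
        count = hist[bucket]
        if seen + count >= target:
            # Assume samples are spread evenly inside the bucket.
            frac = (target - seen) / count
            return (bucket + frac) * width
        seen += count
    raise ValueError("empty histogram")


# The sample from the question, split into two hypothetical partitions.
part_a = [1, 1, 2, 3, 5, 8]
part_b = [13, 21, 34, 55, 89]

merged = merge([to_buckets(part_a), to_buckets(part_b)])
print(approx_percentile(merged, 50))
```

Only the small bucket counts are passed between steps, so no step has to sort or hold the full data set. The price is accuracy: the estimate can differ from the exact percentile (the exact median of this sample is 8), and the error depends on how the buckets are chosen.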