A new data structure for accurate on-line accumulation of rank-based
statistics such as quantiles and trimmed means. . See original paper:
"Computing extremely accurate quantiles using t-digest" by Ted Dunning
and Otmar Ertl for more details
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf.
Examples
>>> quantile 0.99 (tdigest [1..1000] :: TDigest 25)
Just 990.5
>>> quantile 0.99 (tdigest [1..1000] :: TDigest 3)
Just 989.0...
t-Digest is more precise in tails, especially median is imprecise:
>>> median (forceCompress $ tdigest [1..1000] :: TDigest 25)
Just 497.6...
Semigroup
This operation isn't strictly associative, but statistical variables
shouldn't be affected.
>>> let td xs = tdigest xs :: TDigest 10
>>> median (td [1..500] <> (td [501..1000] <> td [1001..1500]))
Just 802...
>>> median ((td [1..500] <> td [501..1000]) <> td [1001..1500])
Just 726...
The linear is worst-case scenario:
>>> let td' xs = tdigest (fairshuffle xs) :: TDigest 10
>>> median (td' [1..500] <> (td' [501..1000] <> td' [1001..1500]))
Just 750.3789...
>>> median ((td' [1..500] <> td' [501..1000]) <> td' [1001..1500])
Just 750.3789...