Countish implements two approximate counting algorithms outlined in “Approximate Frequency Counts over Data Streams”.
http://www.vldb.org/conf/2002/S10P03.pdf
Have you ever needed to do something like calculate the top URLs or top ips from an infinite stream? This package provides probabalistic frequency counters, with accuracy guarantees and low memory usage.
countish provides an extremely simple interface consisting of an “Observe” method and a “ItemsAboveThreshold” method.
Example:
cat urls.txt | go run ./cmd/countish/main.go -threshold .3
0.428671 /
3 counting implementations are provided.
- Naive: exact counts are held in a map
- Lossy: corresponding to “lossy counting”
- Sticky: corresponding to “sticky sampling”
counter := countish.NewLossyCounter(.01, .01)
for i:=0;i<100;i++ {
counter.Observe("value")
}
counter.Observe("another value")
// print all items which *might* occur more than 90% of the time,
// along with their estimated frequency
entries := counter.ItemsAboveThreshold(.9)
fmt.Println(entries)
[{value 1.00009900990099}]