Skip to content

shanemhansen/countish

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Approximate frequency counts over data streams for Go

Countish implements two approximate counting algorithms outlined in “Approximate Frequency Counts over Data Streams”.

http://www.vldb.org/conf/2002/S10P03.pdf

Use cases

Have you ever needed to do something like calculate the top URLs or top ips from an infinite stream? This package provides probabalistic frequency counters, with accuracy guarantees and low memory usage.

countish provides an extremely simple interface consisting of an “Observe” method and a “ItemsAboveThreshold” method.

Example:

cat urls.txt | go run ./cmd/countish/main.go -threshold .3
0.428671 /

3 counting implementations are provided.

  1. Naive: exact counts are held in a map
  2. Lossy: corresponding to “lossy counting”
  3. Sticky: corresponding to “sticky sampling”

Example:

counter := countish.NewLossyCounter(.01, .01)
for i:=0;i<100;i++ {
    counter.Observe("value")
}
counter.Observe("another value")
// print all items which *might* occur more than 90% of the time,
// along with their estimated frequency
entries := counter.ItemsAboveThreshold(.9)
fmt.Println(entries)
[{value 1.00009900990099}]

examples showing memory usage comparisons

About

Lossy Counting and Sticky Sampling

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages