streaming_algorithms

SIMD-accelerated implementations of various streaming algorithms.

This library is a work in progress. PRs are very welcome! Currently implemented algorithms include:

Count–min sketch
Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)
HyperLogLog
Reservoir sampling
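
To give a feel for the first item in the list above, here is a minimal, scalar count-min sketch using only the standard library. It is an illustrative sketch and not this crate's API or implementation: the `CountMin` type, its methods, and the choice of `DefaultHasher` seeded with the row index are all assumptions made for the example, and there is no SIMD here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, illustrative count-min sketch (not this crate's type).
pub struct CountMin {
    width: usize,
    rows: Vec<Vec<u64>>,
}

impl CountMin {
    /// `depth` independent rows of `width` counters each.
    pub fn new(depth: usize, width: usize) -> Self {
        assert!(depth > 0 && width > 0);
        CountMin { width, rows: vec![vec![0; width]; depth] }
    }

    /// Column for `key` in `row`; the row index acts as a per-row hash seed.
    fn index<K: Hash>(&self, row: usize, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.width as u64) as usize
    }

    /// Add `count` to the key's counter in every row.
    pub fn add<K: Hash>(&mut self, key: &K, count: u64) {
        for row in 0..self.rows.len() {
            let i = self.index(row, key);
            self.rows[row][i] += count;
        }
    }

    /// The minimum over rows: collisions can only inflate a counter,
    /// so the estimate never falls below the true total.
    pub fn estimate<K: Hash>(&self, key: &K) -> u64 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.index(row, key)])
            .min()
            .unwrap_or(0)
    }
}
```

Memory use is fixed at `depth * width` counters regardless of how many distinct keys are seen. The cost is that estimates can overshoot when keys collide.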

A goal of this library is to enable composition of these algorithms; for example Top k + HyperLogLog to enable an approximate version of something akin to SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k.
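
For reference, the exact result that such a composition would approximate can be computed as in the following sketch. The function name `top_k_by_distinct` and the `(&str, u64)` row shape are hypothetical and chosen only for illustration. This is a plain baseline, not this crate's API.

```rust
use std::collections::{HashMap, HashSet};

/// Exact baseline: the `k` keys with the most distinct values, with counts.
/// Hypothetical helper for illustration only.
pub fn top_k_by_distinct<'a>(rows: &[(&'a str, u64)], k: usize) -> Vec<(&'a str, usize)> {
    // GROUP BY key, collecting the set of distinct values per key.
    let mut distinct: HashMap<&'a str, HashSet<u64>> = HashMap::new();
    for &(key, value) in rows {
        distinct.entry(key).or_default().insert(value);
    }

    // COUNT(DISTINCT value) per key.
    let mut counts: Vec<(&'a str, usize)> = distinct
        .into_iter()
        .map(|(key, values)| (key, values.len()))
        .collect();

    // ORDER BY count DESC (ties broken by key for determinism), LIMIT k.
    counts.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    counts.truncate(k);
    counts
}
```

This baseline keeps every distinct (key, value) pair in memory. The approximate version described above would, roughly, swap each `HashSet` for a HyperLogLog and the `HashMap` for a Top k structure, trading exactness for bounded memory.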

To benefit from the SIMD acceleration, run your application with RUSTFLAGS="-C target-cpu=native" and the nightly feature enabled, like so:

RUSTFLAGS="-C target-cpu=native" cargo run --features "streaming_algorithms/nightly" --release

See this gist for a good list of further algorithms to be implemented. Other resources are Probabilistic data structures – Wikipedia, DataSketches – A similar Java library originating at Yahoo, and Algebird – A similar Scala library originating at Twitter.

As these implementations are often in hot code paths, unsafe is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.

License

Licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE.txt or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT.txt or http://opensource.org/licenses/MIT)

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
