Insert-Optimized Implementation of Streaming Data Sketches

This repository contains the implementations accompanying our paper "Insert-Optimized Implementation of Streaming Data Sketches."

The codebase includes our optimized implementations alongside baseline versions and Apache DataSketches implementations. You'll also find the microbenchmarking code, results from our experiments, and visualization scripts for analyzing the results.

Note: Due to licensing restrictions, implementations from compared papers are not included in this repository.

Build

To build the project, simply run:

./build.sh

This script uses CMake with GCC in Release mode by default and places all build artifacts in the cmake-build-release/ directory.

Run Benchmarks

After building, execute the following commands to run the benchmarks:

cmake-build-release/bm_insert --benchmark_out="results/bm_insert.json" --benchmark_min_time=10s

cmake-build-release/bm_hash --benchmark_out="results/bm_hash.json" --benchmark_min_time=10s

cmake-build-release/bm_hash_insert --benchmark_out="results/bm_hash_insert.json" --benchmark_min_time=10s
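The --benchmark_out and --benchmark_min_time flags above are Google Benchmark options, and the benchmark sources live in bench/. Conceptually, an insert benchmark reports the average cost of one sketch update. The following is a minimal plain std::chrono sketch of that measurement, not the repository's Google Benchmark code. It assumes only that a sketch exposes an insert(std::uint64_t) method, and the names ExactCounter, make_stream, and ns_per_insert are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in "sketch": an exact counter. Any type exposing
// insert(std::uint64_t) can be passed to ns_per_insert instead.
struct ExactCounter {
    std::unordered_map<std::uint64_t, std::uint64_t> counts;
    void insert(std::uint64_t key) { ++counts[key]; }
};

// Generates a reproducible stream of n keys drawn uniformly from [0, universe).
std::vector<std::uint64_t> make_stream(std::size_t n, std::uint64_t universe,
                                       std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> dist(0, universe - 1);
    std::vector<std::uint64_t> keys(n);
    for (auto& k : keys) k = dist(rng);
    return keys;
}

// Times a single pass of inserts and returns the mean cost in nanoseconds
// per insert (0.0 for an empty stream).
template <typename Sketch>
double ns_per_insert(Sketch& sketch, const std::vector<std::uint64_t>& keys) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t k : keys) sketch.insert(k);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return keys.empty() ? 0.0 : elapsed.count() / static_cast<double>(keys.size());
}
```

Unlike this single timed pass, Google Benchmark keeps repeating a benchmark until at least --benchmark_min_time has elapsed (10s in the commands above), which smooths out timer noise and warm-up effects.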

Plot

Execute the Jupyter notebook analysis.ipynb to generate the plots from the paper, which will be saved in the figures/ directory.

Benchmarking Your Own Sketch Implementation

1. Add your implementation to the src/ directory.
2. Integrate it into one of the existing benchmarks in bench/.
3. Ensure your implementation uses these sketch parameters:
   - Count Sketch: t = 2048, d = 5
   - SpaceSaving: k = 96
   - Karnin-Lang-Liberty: k = 200
4. Follow the build, run, and plot instructions above.
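As a point of reference for the Count Sketch parameters, reading t as the number of counters per row and d as the number of rows, the following is a minimal textbook Count Sketch with d = 5 rows of t = 2048 signed counters. It assumes 64-bit integer keys and a splitmix64-style hash with per-row seeds; the class name CountSketch, its insert/estimate methods, and the hashing scheme are hypothetical and are neither the repository's interface nor its insert-optimized implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal Count Sketch: d = 5 rows, t = 2048 counters per row.
class CountSketch {
public:
    static constexpr std::size_t kRows = 5;      // d
    static constexpr std::size_t kWidth = 2048;  // t (a power of two, so we can mask)

    CountSketch() : counters_(kRows * kWidth, 0) {}

    // Adds `weight` occurrences of `key`: each row picks one column and a
    // +1/-1 sign from the key's hash and updates that counter.
    void insert(std::uint64_t key, std::int64_t weight = 1) {
        for (std::size_t r = 0; r < kRows; ++r) {
            std::uint64_t h = mix(key ^ row_seed(r));
            std::size_t col = static_cast<std::size_t>(h & (kWidth - 1));
            std::int64_t sign = (h >> 63) ? 1 : -1;
            counters_[r * kWidth + col] += sign * weight;
        }
    }

    // Estimates the frequency of `key` as the median of the per-row estimates.
    std::int64_t estimate(std::uint64_t key) const {
        std::array<std::int64_t, kRows> ests{};
        for (std::size_t r = 0; r < kRows; ++r) {
            std::uint64_t h = mix(key ^ row_seed(r));
            std::size_t col = static_cast<std::size_t>(h & (kWidth - 1));
            std::int64_t sign = (h >> 63) ? 1 : -1;
            ests[r] = sign * counters_[r * kWidth + col];
        }
        std::nth_element(ests.begin(), ests.begin() + kRows / 2, ests.end());
        return ests[kRows / 2];
    }

private:
    // splitmix64 finalizer used as a fast 64-bit mixing function.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Hypothetical fixed per-row seed derived from the row index.
    static std::uint64_t row_seed(std::size_t r) {
        return mix(0x5bd1e995ULL + static_cast<std::uint64_t>(r));
    }

    std::vector<std::int64_t> counters_;  // row-major, kRows * kWidth counters
};
```

Storing all rows in one contiguous row-major vector keeps the d counter updates of an insert within a single allocation; the low hash bits select the column and the top bit selects the sign.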

Citation

Our paper is available here (TODO).

TODO

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
