This repository contains the implementations accompanying our paper "Insert-Optimized Implementation of Streaming Data Sketches."
The codebase includes our optimized implementations alongside baseline versions and Apache DataSketches implementations. You'll also find the microbenchmarking code, results from our experiments, and visualization scripts for analyzing the results.
Note: Due to licensing restrictions, implementations from compared papers are not included in this repository.
To build the project, simply run:
./build.sh
This script uses CMake with GCC in release mode by default and places all build artifacts in the cmake-build-release directory.
After building, execute the following commands to run the benchmarks:
cmake-build-release/bm_insert --benchmark_out="results/bm_insert.json" --benchmark_min_time=10s
cmake-build-release/bm_hash --benchmark_out="results/bm_hash.json" --benchmark_min_time=10s
cmake-build-release/bm_hash_insert --benchmark_out="results/bm_hash_insert.json" --benchmark_min_time=10s
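As a quick sanity check before plotting, the JSON files written by the commands above can be inspected with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes the files follow the standard Google Benchmark JSON layout (a top-level "benchmarks" list whose entries carry "name", "cpu_time", and "time_unit"), and the helper name summarize is hypothetical.

```python
import json
from pathlib import Path


def summarize(path):
    """Return (name, cpu_time, time_unit) for each per-iteration entry
    in a Google Benchmark JSON output file (assumed layout)."""
    data = json.loads(Path(path).read_text())
    rows = []
    for entry in data.get("benchmarks", []):
        # Skip aggregate rows (mean/median/stddev), which appear only
        # when benchmark repetitions are enabled.
        if entry.get("run_type", "iteration") != "iteration":
            continue
        rows.append((entry["name"], entry["cpu_time"], entry["time_unit"]))
    return rows


if __name__ == "__main__":
    # Print one line per benchmark for every result file in results/.
    for result_file in sorted(Path("results").glob("bm_*.json")):
        print(result_file.name)
        for name, cpu_time, unit in summarize(result_file):
            print(f"  {name}: {cpu_time:.2f} {unit}")
```

Run it from the repository root so that the relative results/ path resolves.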
Execute the Jupyter notebook analysis.ipynb to generate the plots from the paper, which will be saved in the figures/ directory.
To benchmark your own implementation alongside ours:
- Add your implementation to the src directory
- Integrate it into one of the existing benchmarks in bench/
- Ensure your implementation uses these sketch parameters:
  - Count Sketch: t = 2048, d = 5
  - SpaceSaving: k = 96
  - Karnin-Lang-Liberty: k = 200
- Follow the build, run, and plot instructions above
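To illustrate what the Count Sketch parameters above control, here is a small, unoptimized Python sketch. It is not the repository's implementation. It assumes that t is the number of counters per row and d is the number of rows, which this README does not state explicitly, and the class and method names are hypothetical.

```python
import random
from statistics import median


class CountSketch:
    """Illustrative Count Sketch with d rows of t signed counters.

    Assumption: t = counters per row, d = number of rows.
    """

    def __init__(self, t=2048, d=5, seed=0):
        self.t, self.d = t, d
        rng = random.Random(seed)
        # One pair of random salts per row: one for the bucket hash,
        # one for the sign hash.
        self.salts = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(d)]
        self.rows = [[0] * t for _ in range(d)]

    def _bucket_and_sign(self, row, item):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, item)) % self.t
        sign = 1 if hash((sign_salt, item)) & 1 else -1
        return bucket, sign

    def insert(self, item, count=1):
        # Each insert touches exactly one counter in every row.
        for row in range(self.d):
            bucket, sign = self._bucket_and_sign(row, item)
            self.rows[row][bucket] += sign * count

    def estimate(self, item):
        # The estimate is the median of the sign-corrected counters.
        values = []
        for row in range(self.d):
            bucket, sign = self._bucket_and_sign(row, item)
            values.append(sign * self.rows[row][bucket])
        return median(values)


sketch = CountSketch(t=2048, d=5)
for _ in range(7):
    sketch.insert(42)
print(sketch.estimate(42))
```

Under this reading, each insert updates d counters, so d drives the per-insert cost, while the total memory is t * d counters.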
Our paper is available here (TODO).
TODO
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.