My take on the 1 Billion Row Challenge in Rust.
I've made the `create_measurements.py` script available as a flake output, so you can generate the test input with:
$ nix run .#create-measurements -- 1_000_000_000 # you can specify smaller inputs, too
Note: the minimum input size for the script is 10_000.
For details, see the `build_test_data` function.
For detailed instructions on how to run these solutions, see the help text with:
$ nix run . -- --help
# Or, if you don't have Nix:
$ cargo run -- --help
Much like the official competition, each solution is run five times and the highest & lowest results are discarded. This particular table of solutions was run on a system with an AMD Ryzen 7 2700X processor and 16 GiB of memory. The 'Delta' column is the percentage change in a runner's runtime compared to the Baseline runner.
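As a rough illustration of that scoring, here is a minimal sketch of dropping the fastest and slowest run and computing the delta against a baseline. The function names `trimmed_mean` and `delta_percent` are hypothetical, and averaging the remaining runs is an assumption; the actual harness in this repository may summarise the kept runs differently.

```rust
use std::time::Duration;

/// Drop the single fastest and single slowest run, then average the rest.
/// Returns `None` when there are fewer than three samples to work with.
/// (Averaging the kept runs is an assumption made for this sketch.)
fn trimmed_mean(samples: &[Duration]) -> Option<Duration> {
    if samples.len() < 3 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Skip the first (lowest) and last (highest) entries.
    let kept = &sorted[1..sorted.len() - 1];
    let total: Duration = kept.iter().sum();
    Some(total / kept.len() as u32)
}

/// Percentage change of `runner` relative to `baseline`.
/// Positive values mean the runner was slower than the baseline.
fn delta_percent(baseline: Duration, runner: Duration) -> f64 {
    let base = baseline.as_secs_f64();
    (runner.as_secs_f64() - base) / base * 100.0
}
```

For example, calling `delta_percent` with 178.985s as the baseline and 187.849s as the runner gives roughly +4.95%.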
Runner | Runtime | Delta | Notes |
---|---|---|---|
Baseline | 178s 985ms ± 0s 041ms | N/A | Basic implementation; iterate through the file line-by-line |
BigBuf | 187s 849ms ± 0s 186ms | +4.95% | Use a larger BufReader buffer size |
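
The two runners above differ only in how the `BufReader` is constructed. The following is a minimal sketch of that idea, not the code from this repository: the `Stats` struct, the `aggregate` function, and the `capacity` parameter are all hypothetical names, and it assumes the challenge's `station;temperature` line format.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufReader, Read};

/// Running min / max / sum / count for one weather station (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

/// Read `station;temperature` lines one at a time and aggregate per station.
/// `capacity: None` uses `BufReader`'s default buffer size (the Baseline idea);
/// `Some(bytes)` requests a larger buffer (the BigBuf idea).
fn aggregate<R: Read>(input: R, capacity: Option<usize>) -> io::Result<HashMap<String, Stats>> {
    let reader = match capacity {
        Some(bytes) => BufReader::with_capacity(bytes, input),
        None => BufReader::new(input),
    };

    let mut stations: HashMap<String, Stats> = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        // Skip lines that do not look like `name;value`.
        let Some((name, value)) = line.split_once(';') else { continue };
        let Ok(value) = value.trim().parse::<f64>() else { continue };

        stations
            .entry(name.to_owned())
            .and_modify(|s| {
                s.min = s.min.min(value);
                s.max = s.max.max(value);
                s.sum += value;
                s.count += 1;
            })
            .or_insert(Stats { min: value, max: value, sum: value, count: 1 });
    }
    Ok(stations)
}
```

Passing a `File` as `input` with something like `Some(1 << 20)` would request a 1 MiB buffer. As the table suggests, a bigger buffer alone did not help here, which hints that the per-line `String` allocation and parsing cost more than the read calls themselves.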
- Use Bytehound to do some memory profiling?
- Do some sort of CPU & disk usage profiling
- Get a solution that runs in < 60s
- Get a solution that runs in < 10s