My take on the 1 Billion Row Challenge in Rust.
I've made the `create_measurements.py` script available as a flake output, so you can generate the test input with:

$ nix run .#create-measurements -- 1_000_000_000 # you can specify smaller inputs, too

Note: the minimum input size for the script is 10_000. For details, see the `build_test_data` function.
You can run my solutions with:
$ nix run . -- -r <runner> ./measurements.txt
# For additional information, run:
$ nix run . -- --help
Much like the official competition, results are taken by running each solution five times and discarding the highest and lowest runs. The table below was produced on a system with an AMD Ryzen 7 2700X processor and 16 GiB of memory. The 'Delta' column shows each runner's percentage change in runtime relative to the baseline.
| Runner | Runtime | Delta | Notes |
|---|---|---|---|
| Baseline | 195s 179ms ± 6s 423ms | N/A | Basic implementation: iterate through the file line by line |
| Chunks | 69s 971ms ± 0s 094ms | -64.1% | Reduce the number of I/O operations by loading the file into memory in large chunks |
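At the core of every runner is the same per-station min/mean/max aggregation. A minimal sketch of that idea (the `Stats` struct and `aggregate` function are illustrative names, not taken from the actual code):

```rust
use std::collections::HashMap;

// Running statistics for one weather station.
struct Stats {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

// Fold "<station>;<temperature>" lines into per-station stats.
fn aggregate<'a, I: Iterator<Item = &'a str>>(lines: I) -> HashMap<String, Stats> {
    let mut stats: HashMap<String, Stats> = HashMap::new();
    for line in lines {
        let (station, temp) = line.split_once(';').expect("malformed line");
        let temp: f64 = temp.parse().expect("malformed temperature");
        let entry = stats.entry(station.to_owned()).or_insert(Stats {
            min: f64::INFINITY,
            max: f64::NEG_INFINITY,
            sum: 0.0,
            count: 0,
        });
        entry.min = entry.min.min(temp);
        entry.max = entry.max.max(temp);
        entry.sum += temp;
        entry.count += 1;
    }
    stats
}

fn main() {
    let sample = "Hamburg;12.0\nHamburg;9.0\nBulawayo;25.2";
    let stats = aggregate(sample.lines());
    let h = &stats["Hamburg"];
    // prints "Hamburg 9.0/10.5/12.0"
    println!("Hamburg {:.1}/{:.1}/{:.1}", h.min, h.sum / h.count as f64, h.max);
}
```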
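The idea behind the Chunks runner can be sketched as reading large blocks into memory and splitting lines within each block, carrying any partial trailing line over to the next block. This is an assumption-laden sketch, not the actual implementation; the buffer size and names are illustrative:

```rust
use std::io::{BufRead, BufReader, Read};

// Stream lines out of `reader` using a large in-memory buffer, so the number
// of underlying read() calls stays small. Any incomplete line at the end of a
// block is carried into `carry` and completed by the next block.
fn for_each_line<R: Read>(reader: R, mut f: impl FnMut(&str)) -> std::io::Result<()> {
    let mut reader = BufReader::with_capacity(1 << 20, reader); // 1 MiB blocks (illustrative)
    let mut carry: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // EOF: flush a final line that had no trailing newline.
            if !carry.is_empty() {
                f(std::str::from_utf8(&carry).expect("valid UTF-8"));
            }
            return Ok(());
        }
        let consumed = buf.len();
        if let Some(last_nl) = buf.iter().rposition(|&b| b == b'\n') {
            // Complete lines end at the last newline in this block.
            carry.extend_from_slice(&buf[..=last_nl]);
            let rest = buf[last_nl + 1..].to_vec();
            for line in carry.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
                f(std::str::from_utf8(line).expect("valid UTF-8"));
            }
            carry = rest;
        } else {
            // No newline in this block; keep accumulating.
            carry.extend_from_slice(buf);
        }
        reader.consume(consumed);
    }
}

fn main() {
    let input = std::io::Cursor::new("Hamburg;12.0\nBulawayo;8.9\n");
    for_each_line(input, |line| println!("{line}")).unwrap();
}
```

The same per-line parsing then runs over each chunk, but the kernel now spends far less time crossing the I/O boundary.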
- Use Bytehound to do some memory profiling?
- Do some sort of CPU & disk usage profiling
- Get a solution that runs in < 60s
- Get a solution that runs in < 10s