Compare performance to h3ron-ndarray #1

Closed
nmandery opened this issue Jan 10, 2023 · 12 comments

@nmandery (Owner)

h3o shows quite a few performance improvements compared to libh3. As this crate is at this stage mostly a 1:1 port of h3ron-ndarray to h3o, with only a few - mostly insignificant - changes, this is a good occasion to compare the H3 C library to h3o.

nmandery self-assigned this Jan 10, 2023
@nmandery (Owner, Author)

Benchmark of h3ron-ndarray - using the H3 C implementation (best run out of five):

     Running benches/convert_dataset_r.rs (/home/nicodev/CLionProjects/h3ron/target/release/deps/convert_dataset_r-800d483b556ec67a)
Gnuplot not found, using plotters backend
Benchmarking raster conversion/convert_r_dataset_h3_res_11: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.5s.
raster conversion/convert_r_dataset_h3_res_11
                        time:   [476.78 ms 483.89 ms 492.18 ms]
                        change: [-4.2970% -2.0804% +0.0217%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmark of this crate - using the h3o implementation (best run out of five):

     Running benches/convert_dataset_r.rs (target/release/deps/convert_dataset_r-995ad78f597f52ae)
Gnuplot not found, using plotters backend
raster conversion/convert_r_dataset_h3_res_11
                        time:   [288.21 ms 293.16 ms 298.43 ms]
                        change: [-9.2978% -5.6400% -2.4256%] (p = 0.01 < 0.05)
                        Performance has improved.

The generated output files are identical.
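
For reference, the numbers above come from criterion benches. Below is a hypothetical sketch of how such a bench might be declared - the real benches/convert_dataset_r.rs may be structured differently, and the measured closure body is only a placeholder:

```rust
// Hypothetical sketch only; the real benches/convert_dataset_r.rs may differ.
// The group/function names mirror the criterion output shown above.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_raster_conversion(c: &mut Criterion) {
    let mut group = c.benchmark_group("raster conversion");
    group.sample_size(10); // matches the "10 samples" mentioned in the warning above
    group.bench_function("convert_r_dataset_h3_res_11", |b| {
        b.iter(|| {
            // placeholder: convert the test raster dataset to H3 cells at resolution 11
        });
    });
    group.finish();
}

criterion_group!(benches, bench_raster_conversion);
criterion_main!(benches);
```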

@grim7reaper: That is a very impressive improvement. Thanks for putting all the effort into h3o.

@grim7reaper

Ha, so cool to see such a speedup in a real use case that exercises a mix of h3o calls!
(unlike benchmarks, which are always a bit artificial and focused on a single function)

Was the migration from h3ron smooth?

@nmandery (Owner, Author)

nmandery commented Jan 11, 2023

The migration was pretty smooth. It is nice that fewer operations are fallible in the API than in the H3 API itself.

I could have used an Into<geo_types::Coord<f64>> implementation for LatLng. I can submit a PR for this if you see fit.
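
For illustration, a minimal sketch of the kind of conversion meant here - not actual h3o or h3ron-ndarray code. It is written as a free function because a downstream crate cannot implement a foreign trait for two foreign types; inside h3o it could become an `impl From<LatLng> for Coord<f64>`. It assumes `LatLng::lat()`/`lng()` return degrees.

```rust
// Sketch only, not actual h3o code. Assumes LatLng::lat()/lng() return degrees.
use geo_types::Coord;
use h3o::LatLng;

/// Convert an h3o LatLng into a geo-types coordinate (x = longitude, y = latitude).
fn latlng_to_coord(ll: LatLng) -> Coord<f64> {
    Coord {
        x: ll.lng(), // longitude in degrees
        y: ll.lat(), // latitude in degrees
    }
}
```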

@grim7reaper

I'm glad to hear that.

For the conversion from Coord, there is already code for that here, so it should work I think (though it uses radians instead of degrees).

@nmandery (Owner, Author)

Nice - I will use that then. It may be a bit confusing to users though, as most will - I would guess - expect degrees in geo-types types. Thanks.

@grim7reaper

If most users expect degrees, I can change these public impls to work with degrees (and then use private methods for my own conversions in the internals of h3o).
That would probably be better from a user's POV.
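
Roughly the shape I have in mind - a hedged sketch, not h3o's actual code; the internal radian-based fields are an assumption made only for this example:

```rust
// Sketch of the degrees-outside / radians-inside split discussed above.
// Not h3o's actual code; the radian-based fields are assumed for illustration.
use geo_types::Coord;

pub struct LatLng {
    lat_rad: f64, // stored in radians internally (assumption for this sketch)
    lng_rad: f64,
}

// Public conversion: expose degrees, which is what most geo-types users expect.
impl From<LatLng> for Coord<f64> {
    fn from(ll: LatLng) -> Self {
        Coord {
            x: ll.lng_rad.to_degrees(),
            y: ll.lat_rad.to_degrees(),
        }
    }
}

impl LatLng {
    // Private, radian-based conversion kept for internal use only.
    pub(crate) fn to_coord_radians(&self) -> Coord<f64> {
        Coord { x: self.lng_rad, y: self.lat_rad }
    }
}
```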

@nmandery (Owner, Author)

Sounds like a good solution 👍

@nmandery (Owner, Author)

After updating h3o v0.3.3 -> v0.3.5:

     Running benches/convert_dataset_r.rs (target/release/deps/convert_dataset_r-a0b70d132cef0920)
Gnuplot not found, using plotters backend
raster conversion/convert_r_dataset_h3_res_11
                        time:   [195.34 ms 197.21 ms 199.35 ms]
                        change: [-36.185% -35.028% -33.810%] (p = 0.00 < 0.05)
                        Performance has improved.

v0.3.5 contains performance improvements to the geometry-to-cells functionality.

@grim7reaper

Good to see the benefits propagate to the upper levels 🙂

Normally, the bigger the number of cells (big geometry and/or small cells), the bigger the speedup: the new algo is roughly O(#outline cells) vs O(#total cells) (I'm taking some shortcuts here), so it scales way better than before. The same goes for memory usage (which can't be seen in a criterion bench).
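
To make that scaling concrete with a toy example (illustration only, not h3o code): for a filled square region n cells on a side, the outline holds roughly 4n cells while the filled region holds n² cells in total.

```rust
// Toy illustration of O(#outline cells) vs O(#total cells); not h3o code.
fn work_estimate(n: u64) -> (u64, u64) {
    let outline_cells = 4 * n; // rough count of cells along the boundary
    let total_cells = n * n;   // every cell inside the filled region
    (outline_cells, total_cells)
}

fn main() {
    // For n = 1_000: roughly 4_000 boundary cells vs 1_000_000 cells in total.
    let (outline, total) = work_estimate(1_000);
    println!("outline ≈ {outline}, total = {total}");
}
```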

@nmandery (Owner, Author)

I am tiling the raster array and handling each tile separately in a thread pool - I suppose these smaller tiles somewhat limit the possible speedup here. Maybe I should revisit my implementation sometime, given the performance improvements you achieved.

@grim7reaper

Yeah, there is probably a sweet spot between the number of tiles (to benefit from parallelism) and the size of each tile (to have enough work per tile); with the new algo you can probably split less / go for larger tiles.

@nmandery (Owner, Author)

I experimented a bit with the tile size and found that increasing it leads to worse results.

This has its roots in the test image having lots of no-data values - pixels which are ignored in the conversion process. The tiling scheme discards tiles consisting only of such pixels before h3o even gets applied. So in the end, an increased tile size leads to more work for the h3o library.

As this library was originally built for earth observation data processed by machine learning models - data that often has large amounts of no-data pixels - I think I will leave the tiling scheme as it is for now.
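
For context, a minimal sketch of the tiling scheme described above - not this crate's actual implementation. It assumes an ndarray raster, rayon for the thread pool, a hypothetical NODATA value, and a placeholder convert_tile for the per-tile h3o conversion:

```rust
// Sketch only: illustrates the tiling + no-data filtering described above,
// not the crate's actual implementation. NODATA, TILE and convert_tile are
// placeholders for this example.
use ndarray::{s, Array2, ArrayView2};
use rayon::prelude::*;

const NODATA: u8 = 0;    // hypothetical no-data marker
const TILE: usize = 256; // tile side length in pixels (the tunable discussed above)

fn convert_tile(_tile: ArrayView2<u8>) -> Vec<u64> {
    // placeholder for the raster-tile -> H3 cell conversion done with h3o
    Vec::new()
}

fn convert_raster(raster: &Array2<u8>) -> Vec<u64> {
    let (rows, cols) = raster.dim();

    // Enumerate the tile windows covering the raster.
    let mut windows = Vec::new();
    for r in (0..rows).step_by(TILE) {
        for c in (0..cols).step_by(TILE) {
            windows.push((r, (r + TILE).min(rows), c, (c + TILE).min(cols)));
        }
    }

    windows
        .into_par_iter() // process tiles in a rayon thread pool
        .filter_map(|(r0, r1, c0, c1)| {
            let tile = raster.slice(s![r0..r1, c0..c1]);
            // Discard tiles consisting only of no-data pixels before h3o runs.
            if tile.iter().all(|&v| v == NODATA) {
                None
            } else {
                Some(convert_tile(tile))
            }
        })
        .flatten()
        .collect()
}
```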
