Improve performance of quadtree point-to-polyline join #362
… of quadtree_point_in_polygon to get a 5x perf boost
Mind iterating on this one's passing status when you get the chance, @trxcllnt?
@thomcom yeah, for sure. Edit: the new implementation, which doesn't use as much intermediate memory, is done in e79caa9 🎉
…quadtree-point-to-nearest-polyline-speed-boost
…mpatible with reduce_by_key
@harrism I re-requested your review for the new version of |
…quadtree-point-to-nearest-polyline-speed-boost
rerun tests |
I totally missed this. I'm sorry. |
Looks great. Performance looks great. Several comments, but only one absolutely required change (missing stream sync).
…quadtree-point-to-nearest-polyline-speed-boost
rerun tests |
Just a couple more fixes.
Co-authored-by: Mark Harris <mharris@nvidia.com>
@gpucibot merge |
I had questions about how the tests were updated due to the new sortedness of the outputs, and @trxcllnt answered those offline. Basically, users weren't guaranteed any sortedness prior to this change, and now we sort the output ascending, so this is not a breaking change. LGTM.
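For readers following the sortedness point above, here is a minimal CPU sketch of why a sort-then-reduce-by-key pattern yields output in ascending point order. It is not the code from this PR. It assumes the rewrite pairs each point with candidate polylines and keeps the minimum distance per point, which is only inferred from the commit log's mention of reduce_by_key. The `Candidate` struct and `nearest_per_point` function are hypothetical names, and `std::stable_sort` plus a loop stand in for the thrust sort and `thrust::reduce_by_key`, which reduces runs of consecutive equal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: one candidate pairing of a point with a polyline.
struct Candidate {
  std::uint32_t point;     // index of the query point (the reduction key)
  std::uint32_t polyline;  // index of a candidate polyline
  float distance;          // point-to-polyline distance
};

// CPU stand-in for "sort by key, then reduce by key with a min operator".
// Returns one entry per distinct point: its nearest candidate polyline.
std::vector<Candidate> nearest_per_point(std::vector<Candidate> candidates)
{
  auto by_point = [](Candidate const& a, Candidate const& b) { return a.point < b.point; };

  // A reduce-by-key only merges consecutive equal keys, so sort by point first.
  std::stable_sort(candidates.begin(), candidates.end(), by_point);

  std::vector<Candidate> result;
  for (auto const& c : candidates) {
    if (result.empty() || result.back().point != c.point) {
      result.push_back(c);  // first candidate seen for a new point
    } else if (c.distance < result.back().distance) {
      result.back() = c;    // closer polyline found for the same point
    }
  }

  // The sort makes the output ascending by point index as a side effect.
  assert(std::is_sorted(result.begin(), result.end(), by_point));
  return result;
}
```

Because the keys must be sorted before the reduction, ascending output order comes for free, which matches the observation that callers previously had no ordering guarantee and now get a sorted result.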
Rewrites quadtree_point_to_nearest_polyline via thrust in the style of quadtree_point_in_polygon to get a 5x speed boost.

Benchmarked locally with NYC taxi 169M float32 points (RMM pool mode enabled):
Timings for every routine in the benchmark after these changes: