Use caching allocators in Thrust #835
I was very pleasantly surprised to learn that using our vecmem memory resources in Thrust is trivial! You simply create a polymorphic allocator from them, and Thrust accepts it as-is. This should significantly reduce the number of memory allocations throughout traccc. With this change, I observe a 10-15% improvement in the full-application throughput of traccc.
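The expected saving comes from repeated temporary allocations (such as the scratch space a Thrust algorithm requests on every call) being served from a cache instead of reaching the underlying allocator each time. Below is a minimal host-only C++17 sketch of that mechanism using only `std::pmr`. `caching_resource` and `count_upstream_allocations` are hypothetical names. This is not vecmem's caching resource, and it involves neither Thrust nor CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory_resource>
#include <utility>
#include <vector>

// Hypothetical caching resource (NOT vecmem's implementation): freed blocks
// are kept in per-(size, alignment) free lists and handed back on the next
// matching request, so only cache misses reach the upstream resource.
class caching_resource : public std::pmr::memory_resource {
public:
    explicit caching_resource(std::pmr::memory_resource* upstream)
        : m_upstream(upstream) {}
    caching_resource(const caching_resource&) = delete;
    caching_resource& operator=(const caching_resource&) = delete;

    // Return every cached block to the upstream resource.
    ~caching_resource() override {
        for (auto& [key, blocks] : m_free) {
            for (void* p : blocks) {
                m_upstream->deallocate(p, key.first, key.second);
            }
        }
    }

    std::size_t upstream_allocations() const { return m_upstream_allocs; }

private:
    using key_type = std::pair<std::size_t, std::size_t>;  // (bytes, alignment)

    void* do_allocate(std::size_t bytes, std::size_t align) override {
        auto it = m_free.find(key_type{bytes, align});
        if (it != m_free.end() && !it->second.empty()) {
            void* p = it->second.back();  // cache hit: reuse a freed block
            it->second.pop_back();
            return p;
        }
        ++m_upstream_allocs;  // cache miss: ask the upstream resource
        return m_upstream->allocate(bytes, align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        m_free[key_type{bytes, align}].push_back(p);  // keep instead of freeing
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* m_upstream;
    std::map<key_type, std::vector<void*>> m_free;
    std::size_t m_upstream_allocs = 0;
};

// Simulates `calls` algorithm invocations that each need one temporary
// scratch buffer, and reports how many of them reached the upstream resource.
std::size_t count_upstream_allocations(int calls) {
    caching_resource cache(std::pmr::new_delete_resource());
    // The polymorphic allocator is only a thin handle around the resource.
    std::pmr::polymorphic_allocator<std::byte> alloc(&cache);
    for (int i = 0; i < calls; ++i) {
        std::byte* scratch = alloc.allocate(1024);
        alloc.deallocate(scratch, 1024);
    }
    return cache.upstream_allocations();
}
```

With ten simulated calls, only the first one reaches `new_delete_resource()`, so `count_upstream_allocations(10)` returns 1. A real caching resource would also need to bound the cache size and handle size classes rather than exact matches. This sketch keeps only the core idea.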
Branch updated from bc4d5e5 to 13cd6af.
I'm also fairly sure that we could avoid creating these policy objects separately on every single call. 🤔 Again, something to try.
I'm very happy though about the performance improvement that you found!
measurements_view.ptr() + n_measurements,
measurement_sort_comp());
thrust::sort(
thrust::cuda::par(std::pmr::polymorphic_allocator(&(m_mr.main)))
Strange that you needed to be so very explicit. 🤔 `std::pmr::polymorphic_allocator` certainly doesn't put `explicit` on its constructor. That's how we can create `vecmem::vector` objects without always writing out the full allocator name in their constructors. I guess I'll try `thrust::cuda::par(&(m_mr.main)).on(stream)` tomorrow, to see if that would also work...
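A possible explanation, assuming that the allocator overload of `thrust::cuda::par` is a function template that deduces the allocator type from its argument (as far as I know, this is how recent Thrust releases declare it, but I have not checked the version used here): implicit conversions only kick in when the parameter type is already fixed. Template argument deduction never applies user-defined conversions. The host-only C++17 sketch below contrasts the two cases. `accepts_pmr_allocator` and `deduces_raw_pointer` are hypothetical stand-ins, not Thrust or vecmem API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <type_traits>

// Fixed (non-template) parameter type: a memory_resource* converts
// implicitly through polymorphic_allocator's non-explicit constructor.
// This is the same mechanism that makes vecmem::vector convenient.
bool accepts_pmr_allocator(const std::pmr::polymorphic_allocator<std::byte>& alloc) {
    return alloc.resource() != nullptr;
}

// Deduced parameter type, shaped like an allocator-taking policy factory
// (hypothetical stand-in, NOT Thrust's code): Alloc is deduced from the
// argument as-is, so a raw pointer stays a raw pointer.
template <typename Alloc>
bool deduces_raw_pointer(const Alloc&) {
    return std::is_pointer_v<Alloc>;
}
```

If that assumption holds, `thrust::cuda::par(&(m_mr.main))` would hand Thrust a bare `memory_resource*` instead of an allocator. That would explain why the explicit `std::pmr::polymorphic_allocator(...)` wrapper was needed.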
Yeah, what a free lunch...