-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize pixelize_sph_kernel_projection. Fixes #2682 #2683
Parallelize pixelize_sph_kernel_projection. Fixes #2682 #2683
Conversation
e0146b6
to
55af94c
Compare
This is awesome! |
55af94c
to
2febb94
Compare
After switching to an inner loop parallelization:
|
2febb94
to
6c2a48f
Compare
6c2a48f
to
31397fd
Compare
I think it's worth leaving a comment in the code mentioning that there are two different regimes where we could apply parallelization. This was optimized for the regime where we have a not-very-large-number of reasonably large-compared-to-pixels particles. So in these, we have a good number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look good to me. I'm on OSX so I can't verify locally.
You mentioned in the PR that one of these changes is the inner loop parallelization? Or did that get overwritten in a push?
All three methods are now using parallelization on the inner loop. |
cdb48f2
to
7a1f12b
Compare
buff[xi, yi] += prefactor_j * kernel_func(q_ij) | ||
local_buff[xi + yi*xsize] += prefactor_j * kernel_func(q_ij) | ||
|
||
with gil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as this is the recommended way to do the reduce, i think it's okay, but i had to check indentation levels to make sure I understood what was happening and which scope this was in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's the recommended, but that was the only one that worked :)
2909cef
to
d19846d
Compare
c88cc4c
to
3e21909
Compare
@yt-fido test this please |
@Xarthisius I'd like this to go in, but the conflicts I see are not immediately obvious to me. Can you take another shot? |
3e21909
to
50adcbc
Compare
f220db8
to
f323b0a
Compare
Co-authored-by: Clément Robert <cr52@protonmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. My expertise in Cython is still extremely shallow but I think I can approve of the changes presented here.
@@ -1447,6 +1500,7 @@ def pixelize_sph_kernel_arbitrary_grid(np.float64_t[:, :, :] buff, | |||
kernel_func = get_kernel_func(kernel_name) | |||
|
|||
with nogil: | |||
# TODO make this parallel without using too much memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, is this a leftover that you forgot to address or are you planning to address it in the future ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to address it in the future, provided that I'll figure out how to do that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, just checking !
PR Summary
This PR parallelizes sph kernel projection using OpenMP. On my laptop the speed up is modarate (<0.5 for 4 cores), but it yields a significant improvement on testing infra: