
Parallelize pixelize_sph_kernel_projection. Fixes #2682 #2683

Merged

Conversation

Xarthisius
Member

PR Summary

This PR parallelizes the SPH kernel projection using OpenMP. On my laptop the speed-up is moderate (parallel efficiency below 0.5 with 4 cores), but it yields a significant improvement on the testing infrastructure:

$ time python doc/source/cookbook/image_resolution.py   # without this patch

real	16m6.577s
user	16m5.294s
sys	0m4.467s

$ time python doc/source/cookbook/image_resolution.py # with this patch, all cores (40)
real	3m11.529s
user	23m2.800s
sys	0m14.970s

$ export OMP_NUM_THREADS=8  # the default setting we use during a test suite build
$ time python doc/source/cookbook/image_resolution.py 

real	4m9.637s
user	14m26.263s
sys	0m2.226s
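For reference, the speed-ups implied by the quoted timings can be computed directly from the `real` (wall-clock) times. This small script only reproduces that arithmetic; the numbers are the ones reported above:

```python
def to_seconds(minutes, seconds):
    """Convert a `real` time of the form XmY.Zs to seconds."""
    return 60 * minutes + seconds

serial = to_seconds(16, 6.577)    # without the patch
cores40 = to_seconds(3, 11.529)   # with the patch, all 40 cores
threads8 = to_seconds(4, 9.637)   # with the patch, OMP_NUM_THREADS=8

speedup40 = serial / cores40      # ~5.0x
speedup8 = serial / threads8      # ~3.9x
efficiency8 = speedup8 / 8        # ~0.48 parallel efficiency

print(f"40 cores: {speedup40:.1f}x, 8 threads: {speedup8:.1f}x "
      f"(efficiency {efficiency8:.2f})")
```

So even at 8 threads the wall-clock win is close to 4x, which explains the large improvement on the test-suite builds.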

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch 3 times, most recently from e0146b6 to 55af94c Compare June 24, 2020 19:41
@chummels
Member

This is awesome!

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from 55af94c to 2febb94 Compare June 24, 2020 19:41
@Xarthisius
Member Author

Xarthisius commented Jun 24, 2020

After switching to an inner loop parallelization:

(blah) fido@c1af1189021c /tmp/yt-4 $ export OMP_NUM_THREADS=8
(blah) fido@c1af1189021c /tmp/yt-4 $ time python doc/source/cookbook/image_resolution.py 

real	2m38.511s
user	17m34.109s
sys	0m2.471s

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from 2febb94 to 6c2a48f Compare June 24, 2020 19:56
@munkm munkm added enhancement Making something better parallelism MPI-based parallelism yt core Core components and algorithms in yt labels Jun 24, 2020
@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from 6c2a48f to 31397fd Compare June 25, 2020 13:50
@matthewturk
Member

I think it's worth leaving a comment in the code noting that there are two different regimes where we could apply parallelization. This was optimized for the regime of a modest number of particles that are large compared to the pixels: each particle covers many yi values within an xi iteration, so there is real work to share in that inner loop. In the case of lots of itty bitty particles, this wouldn't be quite as efficient.
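The two regimes can be made concrete with a back-of-the-envelope model of how many pixels a particle's smoothing kernel spans. This is purely illustrative (the function and numbers are hypothetical, not from the PR):

```python
# Hypothetical model of the two parallelization regimes: a particle
# with smoothing length h on a grid with pixel width dx spans roughly
# 2*h/dx pixels along each axis.

def pixels_per_axis(h, dx):
    """Approximate pixels spanned along one axis by a particle of
    smoothing length h on a grid with pixel width dx."""
    return max(1, int(2 * h / dx))

dx = 1.0 / 512  # pixel width for a 512x512 image on a unit domain

# Regime 1: few large particles -> many yi iterations per xi column,
# so parallelizing the inner (yi) loop keeps all threads busy.
large = pixels_per_axis(h=0.05, dx=dx)   # ~51 pixels per axis

# Regime 2: many tiny particles -> each touches about one pixel, so
# the inner loop has almost no work to split between threads, and
# parallelizing over particles would be the better strategy.
tiny = pixels_per_axis(h=0.0005, dx=dx)  # 1 pixel

print(large, tiny)
```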

Member

@munkm munkm left a comment


These look good to me. I'm on OSX so I can't verify locally.

You mentioned in the PR that one of these changes is the inner loop parallelization? Or did that get overwritten in a push?

@Xarthisius
Member Author

You mentioned in the PR that one of these changes is the inner loop parallelization? Or did that get overwritten in a push?

All three methods are now using parallelization on the inner loop.

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from cdb48f2 to 7a1f12b Compare June 26, 2020 17:52
yt/utilities/lib/pixelization_routines.pyx
buff[xi, yi] += prefactor_j * kernel_func(q_ij)
local_buff[xi + yi*xsize] += prefactor_j * kernel_func(q_ij)

with gil:
Member


As long as this is the recommended way to do the reduce, I think it's okay, but I had to check indentation levels to make sure I understood what was happening and which scope this was in.

Member Author


I don't know if it's the recommended way, but it was the only one that worked :)
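The reduction pattern under discussion can be sketched in plain Python/NumPy: each thread accumulates into a private flat buffer indexed as `xi + yi*xsize` (as in the diff above), and the per-thread buffers are then summed into the 2D image, which is what the `with gil:` block does in the Cython/OpenMP code. The loop structure here is a serial stand-in, not the PR's actual implementation:

```python
import numpy as np

xsize, ysize = 4, 3
nthreads = 2
rng = np.random.default_rng(0)
# per-thread contributions to each pixel, shaped (thread, xi, yi)
contrib = rng.random((nthreads, xsize, ysize))

# what each thread holds: a flat private buffer, indexed xi + yi*xsize
local_buffs = np.zeros((nthreads, xsize * ysize))
for tid in range(nthreads):
    for xi in range(xsize):
        for yi in range(ysize):
            local_buffs[tid][xi + yi * xsize] += contrib[tid, xi, yi]

# the final reduce: sum every thread's buffer into buff[xi, yi]
buff = np.zeros((xsize, ysize))
for tid in range(nthreads):
    for xi in range(xsize):
        for yi in range(ysize):
            buff[xi, yi] += local_buffs[tid][xi + yi * xsize]

# the reduced image equals the total contribution per pixel
assert np.allclose(buff, contrib.sum(axis=0))
```

Private per-thread buffers avoid data races on `buff` during the parallel loop, at the cost of one extra image-sized array per thread.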

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from 2909cef to d19846d Compare June 30, 2020 17:34
@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch 3 times, most recently from c88cc4c to 3e21909 Compare July 23, 2020 00:53
@Xarthisius
Member Author

@yt-fido test this please

@matthewturk
Member

@Xarthisius I'd like this to go in, but the conflicts I see are not immediately obvious to me. Can you take another shot?

@Xarthisius Xarthisius force-pushed the 2682_parallel_sph_pixelization branch from f220db8 to f323b0a Compare September 24, 2020 14:53
Co-authored-by: Clément Robert <cr52@protonmail.com>
Member

@neutrinoceros neutrinoceros left a comment


This looks good to me. My expertise in Cython is still quite limited, but I think I can approve the changes presented here.

@@ -1447,6 +1500,7 @@ def pixelize_sph_kernel_arbitrary_grid(np.float64_t[:, :, :] buff,
kernel_func = get_kernel_func(kernel_name)

with nogil:
# TODO make this parallel without using too much memory
Member


Just to be clear, is this a leftover that you forgot to address, or are you planning to address it in the future?

Member Author


I plan to address it in the future, provided I can figure out how to do it...

Member


Alright, just checking!

@neutrinoceros neutrinoceros merged commit 4d9ab24 into yt-project:master Sep 24, 2020
@Xarthisius Xarthisius deleted the 2682_parallel_sph_pixelization branch September 24, 2020 20:27