SPH pixelization routine is slow #2682
Comments
I poked at it a little and got a minor speedup (maybe around 20%?) at the cost of some additional complexity with this diff:
But I'm not sure it's worth the additional complexity, and the speedup from parallelization is better anyway. This is one of those things that would be really well suited to GPU computation.
Just noting that this method went from parallel to serial in afc2b0c, and that commit also shows other places where parallelization should be restored.
@AshKelly, do you know whether the Cython issue in question has been fixed?
Looks like the issue is still open (cython/cython#2316), but I will check out this repo https://github.com/ngoldbaum/cython-issue and confirm whether it has been patched.
Oh, I didn't know OpenMP 4.5's reduction was a thing (not surprising, since the last time I used OpenMP it was at version 2.0 :P), and I think my implementation works around the issue.
Yeah, local buffers avoid it completely. We toyed with the idea at the time, but Nathan didn't like the extra memory overhead, and I think we hoped for a relatively fast fix on the Cython side. I think it's complicated to fix on their side, since they didn't specify a minimum OpenMP / gcc version.
As per Matt's suggestion, it's also viable to parallelize the inner loops over pixels. That makes writes to
Parallelize pixelize_sph_kernel_projection. Fixes #2682
Bug report
Bug summary
One of the cookbook recipes takes a long time on the testing box:
I think that pixelize_sph_kernel_projection could easily be parallelized to circumvent this.