Hafnian repeated omp #120
Codecov Report
@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   97.75%   97.72%   -0.03%
==========================================
  Files          12       12
  Lines         891      881      -10
==========================================
- Hits          871      861      -10
  Misses         20       20
Thanks @trevor-vincent! It looks like the windows build is failing with the following error:
I don't think I ever got OpenMP compiling with Windows/MSVC, so I had disabled it in the
Okay, thanks Josh. Let me see if I can add some preprocessor guards to get around this issue for the Windows build.
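For reference, one common way to keep a single code path building both with and without OpenMP is to guard on the standard `_OPENMP` macro, which compilers define only when OpenMP is enabled. The sketch below is an assumption about the general shape of such guards, not the code in this PR: `available_threads` and `parallel_sum` are hypothetical names, and the summation loop is only a stand-in for the real work loop. The loop index is a signed `int` because OpenMP 2.0 (the version MSVC's `/openmp` switch implements) requires a signed integral loop variable; whether that relates to the build error above is not known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>  // only included when the compiler has OpenMP enabled
#endif

// Hypothetical helper: maximum thread count for a parallel region,
// or 1 when the translation unit is built without OpenMP.
inline int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical stand-in for the work loop: sums the given terms.
// With OpenMP the iterations are shared between threads and combined
// through a reduction; without it the same loop simply runs serially.
inline double parallel_sum(const std::vector<double>& terms) {
    double total = 0.0;
    // Signed index, as required by OpenMP 2.0 loop rules.
    const int n = static_cast<int>(terms.size());
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (int i = 0; i < n; ++i) {
        total += terms[static_cast<std::size_t>(i)];
    }
    return total;
}
```

With this shape, a build that has OpenMP disabled compiles the same source and just runs on one thread.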
It seems that the
@trevor-vincent Could you share some benchmarks of the implementation in this PR versus the old one? Also, to get a basic idea of how embarrassingly parallel the computation is, could you run a non-negligible calculation with OMP_NUM_THREADS = 1, 2, 4?
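A scaling run of this kind can also be driven from inside a small C++ harness instead of the `OMP_NUM_THREADS` environment variable. The following is a minimal sketch under stated assumptions: `time_with_threads`, `speedup` and `efficiency` are hypothetical helper names, and the callable passed in would be whatever hafnian call is being benchmarked. `omp_set_num_threads` is the standard OpenMP runtime call; in a build without OpenMP the thread count is simply ignored.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

#ifdef _OPENMP
#include <omp.h>
#endif

// Runs `work` once with the requested OpenMP thread count and returns
// the elapsed wall-clock time in seconds. Without OpenMP the thread
// count is ignored and the work runs serially.
template <typename Work>
double time_with_threads(int num_threads, Work&& work) {
    assert(num_threads > 0);
#ifdef _OPENMP
    omp_set_num_threads(num_threads);
#else
    (void)num_threads;
#endif
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a p-thread run relative to the 1-thread run.
inline double speedup(double t_one_thread, double t_p_threads) {
    return t_one_thread / t_p_threads;
}

// Parallel efficiency: speedup divided by the thread count (1.0 is ideal).
inline double efficiency(double t_one_thread, double t_p_threads, int p) {
    return speedup(t_one_thread, t_p_threads) / p;
}
```

Calling `time_with_threads` with 1, 2 and 4 threads on the same input and feeding the results to `speedup` and `efficiency` gives the numbers asked for here.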
It is pretty much embarrassingly parallel. Here are some results for an n = 30 loop hafnian on a 4-core machine. The speedup would probably get even better as n grows.
loop hafnian / n = 30 / rpt = 11111... / mat = random
Edit: 4 cores will always give a bit less than a 4x speedup because the machine needs one core to deal with the operating system. The 3-core result is not as good as the 2-core one because the number of sum terms isn't divisible by three, so the inefficiency might be due to the uneven division of the sum between cores. Other than that, this is quite good scaling.
The old non-parallel version runs in 296.973 s. The new version is therefore 8.7% slower than the old version for n = 30 on 1 core, but this percentage should decrease as n gets larger because the only major changes are outside the main O(n*2^n) work loop. This very small slowdown comes from the necessary overhead of computing which sections of the sum each core will handle.
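To make the "which sections of the sum" step concrete, here is a minimal sketch of one way to split a sum of `total_terms` terms into contiguous index ranges, one per thread. It is an illustration only: `chunk_bounds` is a hypothetical helper, and the partitioning actually used in this PR may differ. In this scheme the remainder is spread over the first threads, so chunk sizes differ by at most one term.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Half-open range [begin, end) of term indices assigned to one thread.
// Hypothetical helper, not taken from the PR.
inline std::pair<std::uint64_t, std::uint64_t>
chunk_bounds(std::uint64_t total_terms, std::uint64_t num_threads,
             std::uint64_t thread_id) {
    assert(num_threads > 0 && thread_id < num_threads);
    const std::uint64_t base = total_terms / num_threads;
    // The first `extra` threads each take one additional term.
    const std::uint64_t extra = total_terms % num_threads;
    const std::uint64_t begin = thread_id * base + std::min(thread_id, extra);
    const std::uint64_t size = base + (thread_id < extra ? 1 : 0);
    return {begin, begin + size};
}
```

Each thread would call `chunk_bounds` once with its own id before entering the work loop, which is why this cost sits outside the main loop and does not grow with the number of terms.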
I think I can actually remove the overhead in the n=1 case, so I'll do that tomorrow.
The newer version on one core is now slightly faster than the older version on one core. For example, running it now on 1 core for n = 30 I get:
Thanks @trevor-vincent this looks great! I left very minor comments regarding some comments and docstrings, but didn't go through the new code logic.
Small question: is there a significant performance increase for the single threaded version when n < 30? Or is it negligible?
Okay great, thanks Josh, I'll try to fix these things. Regarding whether the speedup is negligible for n < 30, I'll try to make a plot, which will hopefully help clarify things. From recollection, the speedup is pretty meaningful for n > 24. For n < 24 the hafnian is computed so fast that the extra speedup isn't really needed. However, there is still a speedup on 4 cores even at n = 14, though I doubt it is close to 4x. At some small enough n there will probably be no speedup at all because the overhead of the extra cores is too large. All of this of course depends on the rpt vector, which can increase the workload at any n: the greater its values, the more work there is and the more useful the speedup becomes. For the tests I've been setting rpt = 111111111..., so it doesn't affect things much and the workload depends mostly on n.
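As a rough illustration of how rpt drives the workload, the sketch below counts sum terms under the assumption that the repeated-moment sum ranges over every integer vector s with 0 <= s_i <= rpt_i, giving prod(rpt_i + 1) terms. That counting rule is an assumption made for this example and may differ from the actual loop by a constant factor; `count_terms` is a hypothetical helper. With rpt all ones it reduces to 2^n, matching the O(n*2^n) scaling mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of terms in the sum, ASSUMING one term per integer vector s
// with 0 <= s_i <= rpt_i. Hypothetical helper for illustration only;
// it does not guard against 64-bit overflow for very large rpt.
inline std::uint64_t count_terms(const std::vector<int>& rpt) {
    std::uint64_t terms = 1;
    for (const int r : rpt) {
        assert(r >= 0);
        terms *= static_cast<std::uint64_t>(r) + 1;
    }
    return terms;
}
```

For example, under this assumption rpt = (2, 2, 2) already gives 27 terms at n = 3, compared with 8 terms for rpt = (1, 1, 1).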
Looks like it is ready to be merged!
Thanks Trevor! Looks good from my end, will just wait to see if @nquesada has any comments
Context:
The hafnian repeated moment algorithm was missing OpenMP support.
Description of the Change:
Add OpenMP support to repeated moment hafnian algorithm.
Benefits:
The code should speed up roughly in proportion to the number of cores available.
Possible Drawbacks:
None
Related GitHub Issues:
#46