Hafnian repeated omp #120

trevor-vincent · 2020-01-13T04:24:53Z

Context:

The repeated-moment hafnian algorithm was missing OpenMP support.

Description of the Change:

Add OpenMP support to the repeated-moment hafnian algorithm.

Benefits:

The speedup should scale roughly in proportion to the number of cores available.

Possible Drawbacks:

None

Related GitHub Issues:

#46

codecov · 2020-01-13T19:41:12Z

Codecov Report

Merging #120 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   97.75%   97.72%   -0.03%     
==========================================
  Files          12       12              
  Lines         891      881      -10     
==========================================
- Hits          871      861      -10     
  Misses         20       20

Impacted Files	Coverage Δ
thewalrus/fock_gradients.py	`100% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ba11956...f5c7214.

josh146 · 2020-01-13T20:34:15Z

Thanks @trevor-vincent! It looks like the Windows build is failing with the following error:

  include\repeated_hafnian.hpp(125): error C3861: 'omp_get_num_threads': identifier not found
  include\repeated_hafnian.hpp(126): error C3861: 'omp_get_thread_num': identifier not found

I don't think I ever got OpenMP compiling with Windows/MSVC, so I had disabled it in setup.py.
Unless you know how to enable OpenMP on Windows, it is probably sufficient to have the code check whether the compiler is MSVC and, if so, not use those functions.

include/repeated_hafnian.hpp

trevor-vincent · 2020-01-13T20:36:57Z

Okay, thanks Josh. Let me see if I can add some preprocessor code to get around this issue for the Windows build.

josh146 · 2020-01-13T21:06:40Z

It seems that the _OPENMP flag is defined on Windows for some reason 🤔 Another option is to use _MSC_VER (which should be defined)
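A minimal sketch of the kind of guard discussed above, assuming the goal is only to avoid calling the OpenMP runtime functions when building with MSVC. The macro `SKETCH_USE_OMP` and the helpers `sketch_num_threads` and `sketch_thread_id` are hypothetical names for illustration; they are not taken from `repeated_hafnian.hpp`.

```cpp
// Hypothetical guard: use the OpenMP runtime only when the compiler
// reports OpenMP support (_OPENMP) and is not MSVC (_MSC_VER).
#if defined(_OPENMP) && !defined(_MSC_VER)
#include <omp.h>
#define SKETCH_USE_OMP 1
#else
#define SKETCH_USE_OMP 0
#endif

// Number of threads in the current team; 1 when OpenMP is not used.
inline int sketch_num_threads() {
#if SKETCH_USE_OMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Index of the calling thread; 0 when OpenMP is not used.
inline int sketch_thread_id() {
#if SKETCH_USE_OMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
```

With this shape the calling code can use the two helpers unconditionally, and the MSVC build falls back to a single-threaded path without touching `omp.h`.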

nquesada · 2020-01-13T22:10:37Z

@trevor-vincent Could you share some benchmarks of the implementation in this PR versus the old one? Also, to get a basic idea of how embarrassingly parallel the computation is, could you run a non-negligible calculation with OMP_NUM_THREADS = 1, 2, 4?

trevor-vincent · 2020-01-13T23:04:52Z

> @trevor-vincent Could you share some benchmarks of the implementation in this PR versus the old one? Also, to get a basic idea of how embarrassingly parallel the computation is, could you run a non-negligible calculation with OMP_NUM_THREADS = 1, 2, 4?

It is pretty much embarrassingly parallel. Here are some results for an n = 30 loop hafnian on a 4-core machine. The speedup would probably improve further as n grows.

loop hafnian / n = 30 / rpt = 11111... / mat = random

| cores | time (s) | speedup |
|-------|----------|---------|
| 1     | 322.51   | 1.00    |
| 2     | 161.42   | 1.99    |
| 3     | 112.95   | 2.85    |
| 4     | 86.32    | 3.74    |

Edit: 4 cores will always give a bit less than a 4x speedup because the operating system also needs CPU time on one of the cores. Scaling with 3 cores is not as good as with 2, possibly because the number of sum terms isn't divisible by three, so the sum may be divided unevenly between the cores. Other than that, this is quite good scaling.
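To make the scaling numbers and the "uneven division" remark concrete, here is a small sketch. It assumes a simple static split of a linear range of sum terms over threads; the PR's actual partitioning scheme is not shown in this thread, and all function names below are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Half-open range [begin, end) of term indices given to thread `tid`
// when `total` terms are split as evenly as possible over `threads`.
// The first (total % threads) threads each receive one extra term.
inline std::pair<std::uint64_t, std::uint64_t>
sketch_chunk(std::uint64_t total, std::uint64_t threads, std::uint64_t tid) {
    const std::uint64_t base = total / threads;
    const std::uint64_t extra = total % threads;
    const std::uint64_t begin = tid * base + std::min(tid, extra);
    const std::uint64_t size = base + (tid < extra ? 1 : 0);
    return {begin, begin + size};
}

// Speedup relative to the single-thread time t1.
inline double sketch_speedup(double t1, double tp) { return t1 / tp; }

// Parallel efficiency: speedup divided by the number of threads.
inline double sketch_efficiency(double t1, double tp, int threads) {
    return sketch_speedup(t1, tp) / threads;
}
```

For example, `sketch_efficiency(322.51, 86.32, 4)` corresponds to the 4-core row of the table and comes out at roughly 0.93. With this particular split, chunk sizes differ by at most one term when the term count is not divisible by the thread count.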

trevor-vincent · 2020-01-14T15:55:02Z

The old non-parallel version runs in 296.973 s. The new version is therefore about 8.6% slower than the old version for n = 30 on 1 core (322.51 s versus 296.973 s), but this percentage should decrease as n gets larger because the only major changes are outside of the main O(n*2^n) work loop. This small slowdown is due to the necessary overhead of computing which sections of the sum each core will deal with.
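The commit list for this PR mentions a `lin_to_multi` helper, but its implementation is not shown in this thread. As a rough illustration of the per-thread setup work described above, the sketch below assumes that the sum runs over multi-indices nu with 0 <= nu[i] <= rpt[i], and that a thread converts the first linear index of its section into such a multi-index. The name `sketch_lin_to_multi` and the digit ordering are assumptions, not the PR's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a linear index into a mixed-radix multi-index nu, where digit i
// takes values 0..rpt[i] (radix rpt[i] + 1). Digit 0 varies fastest.
// Assumes every rpt[i] >= 0 and linear < product of (rpt[i] + 1).
inline std::vector<int> sketch_lin_to_multi(std::uint64_t linear,
                                            const std::vector<int>& rpt) {
    std::vector<int> nu(rpt.size());
    for (std::size_t i = 0; i < rpt.size(); ++i) {
        const std::uint64_t radix = static_cast<std::uint64_t>(rpt[i]) + 1;
        nu[i] = static_cast<int>(linear % radix);  // current digit
        linear /= radix;                           // move to the next digit
    }
    return nu;
}
```

This conversion costs one division and one remainder per entry of `rpt`, and it only needs to run once per thread, which is consistent with the overhead being outside the main work loop.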

trevor-vincent · 2020-01-16T04:32:45Z

I think I can actually remove the overhead in the single-core case, so I'll do that tomorrow.

trevor-vincent · 2020-01-16T22:09:10Z

The newer version on one core is now slightly faster than the older version on one core. For example, running it now on 1 core for n = 30 I get:
new version loop hafnian time = 297.621 s
old version loop hafnian time = 303.325 s

josh146

Thanks @trevor-vincent, this looks great! I left a few very minor remarks on comments and docstrings, but didn't go through the new code logic.

Small question: is there a significant performance increase for the single-threaded version when n < 30, or is it negligible?

include/repeated_hafnian.hpp

trevor-vincent · 2020-01-17T16:11:59Z

Okay great, thanks Josh, I'll try to fix these things. Regarding whether the speedup is negligible for n < 30, I'll try to make a plot, which will hopefully help clarify things. From recollection, the speedup is pretty meaningful for n > 24. For n < 24 the hafnian is computed so fast that the extra speedup isn't really needed. There is still a speedup with 4 cores even at n = 14, but I doubt it is close to 4x. At some small n there will probably be no speedup at all, because the overhead of the extra cores is too large.

All of this of course depends on the rpt vector, which can increase the workload at any n. The greater its values, the more work there is and the more useful the speedup. For the tests I've been setting rpt = 111111111..., so it doesn't affect things much and the workload depends mostly on n.
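One standard way to avoid paying thread start-up overhead on small problems is the OpenMP `if` clause. The sketch below is illustrative only: `sketch_sum`, the callable `f`, and the threshold `min_parallel_terms` are hypothetical, and a sensible threshold would have to be measured rather than guessed. Nothing here is taken from the PR's code.

```cpp
#include <cstdint>

// Sum f(k) for k in [0, terms). The OpenMP `if` clause keeps the loop on a
// single thread when `terms` is below `min_parallel_terms`, so small
// problems skip the cost of starting a thread team. When OpenMP is not
// enabled at compile time, the pragma is ignored and the loop runs serially.
template <typename F>
double sketch_sum(std::int64_t terms, F f, std::int64_t min_parallel_terms) {
    double total = 0.0;
#pragma omp parallel for reduction(+ : total) if (terms >= min_parallel_terms)
    for (std::int64_t k = 0; k < terms; ++k) {
        total += f(k);
    }
    return total;
}
```

The `reduction(+ : total)` clause gives each thread a private partial sum and combines them at the end, so the loop body needs no explicit locking.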

trevor-vincent · 2020-01-17T20:35:45Z

[Plot: speedup over different values of n (rpt = 11111...)]
I'm not sure yet what causes the randomness for cores = 4 at small n, but these times are really small, so the timings are probably very inaccurate there anyway.

trevor-vincent · 2020-01-17T20:52:35Z

[Plot: 1-core times for reference]

nquesada · 2020-01-17T21:22:32Z

Looks like it is ready to be merged!

josh146

Thanks Trevor! Looks good from my end; I'll just wait to see if @nquesada has any comments.

trevor-vincent and others added 4 commits January 12, 2020 23:14

Add openmp parallelization to repeated hafnian

24d3400

Fix compiler warnings

350083d

Fix bug in lin_to_multi

6fa8f44

update changelog

a8cce49

trevor-vincent requested review from josh146 and nquesada January 13, 2020 20:06

josh146 reviewed Jan 13, 2020

Trevor Vincent added 2 commits January 13, 2020 15:57

Fix windows issue

9055570

Revert clang-format doxygen changes

1deab6d

Fix windows problem attempt #2

d18c141

Remove lines that are not needed anymore

1fa6d72

josh146 added C++ review-ready labels Jan 16, 2020

Add minor optimizations for one core

d99d0f8

josh146 requested changes Jan 17, 2020

Add josh suggestions

ba11956

Merge branch 'master' into hafnian_repeated_omp

f5c7214

josh146 approved these changes Jan 17, 2020

nquesada merged commit f227351 into master Jan 17, 2020

nquesada mentioned this pull request Jan 18, 2020

OpenMP for the repeated-moment algorithm #46

Closed

nquesada deleted the hafnian_repeated_omp branch January 21, 2020 14:45
