
Parallelise probabilities #161

Merged
merged 19 commits into master from parallelise_probabilities on May 23, 2020

Conversation


@thisac thisac commented Apr 21, 2020

Context:
The probabilities function could be easily parallelised, potentially providing nice speed-ups.

Description of the Change:
Adds a parallelisation option to the probabilities function so that individual probabilities can be computed in parallel. Tests are also adapted to include running the probabilities function with both parallel=True and parallel=False.

Benefits:
The probability calculations can be parallelised and thus potentially completed much more quickly.

Possible Drawbacks:
Since OpenMP already utilises parallelisation in the background, the implementation in this PR does not necessarily provide any speedups, but rather gives the option to parallelise differently.

Related GitHub Issues:
N/A
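
For reference, a minimal usage sketch of the new option (the probabilities signature with mu, cov, cutoff and parallel is taken from this PR; the vacuum-state inputs below are only illustrative):

import numpy as np
from thewalrus.quantum import probabilities

n = 2
mu = np.zeros(2 * n)        # vacuum means vector
cov = np.identity(2 * n)    # vacuum covariance matrix (hbar = 2 convention)

# parallel=True dispatches the individual Fock-basis elements to Dask workers
probs = probabilities(mu, cov, cutoff=4, parallel=True)
print(probs.shape)  # (4, 4): one axis per mode, each of length cutoff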

@thisac thisac requested a review from nquesada April 21, 2020 17:23
@thisac thisac self-assigned this Apr 21, 2020
Comment on lines 1098 to 1100
probs = np.maximum(
0.0, np.real_if_close(dask.compute(*compute_list, scheduler="threads"))
).reshape([cutoff] * num_modes)
Member

Would be curious if this is faster in practice! Partly because density_matrix_element is already parallelized, right? IIRC it uses OpenMP to compute the hafnian corresponding to a single matrix element, so there is almost a double-parallelization happening.

Contributor Author

This might be correct. I noticed that it didn't seem to provide any speedups (but only after I pushed it to this PR), but I haven't really benchmarked it yet.


josh146 commented Apr 22, 2020

@thisac, @nquesada: It looks like the build is failing because llvmlite no longer provides wheels for Python 3.5 on the latest version.

You could try pinning llvmlite and numba to the previous versions that supported Python 3.5; however, it probably makes sense to deprecate Python 3.5 support.

Do you want to make another PR that officially removes Python 3.5 support? This would involve updating the setup.py, the readme, installation instructions, and travis/circle/appveyor configs.

@nquesada
Collaborator

I think we should stop supporting Python 3.5, as @josh146 suggests.

@josh146 josh146 left a comment

Looks good from my end, will have to wait on a PR that removes Python 3.5 support though.


codecov bot commented Apr 24, 2020

Codecov Report

Merging #161 into master will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #161   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        14           
  Lines         1002      1080   +78     
=========================================
+ Hits          1002      1080   +78     
Impacted Files Coverage Δ
thewalrus/quantum.py 100.00% <100.00%> (ø)
thewalrus/samples.py 100.00% <0.00%> (ø)
thewalrus/csamples.py 100.00% <0.00%> (ø)
thewalrus/symplectic.py 100.00% <0.00%> (ø)
thewalrus/fock_gradients.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf7a19e...8a519ad.

@nquesada
Collaborator

Hey @thisac: Could you add some screenshots of the improvements you found when tweaking OMP_NUM_THREADS here? Also, maybe it is worth mentioning that in the docstring?
For example, explain to the user when it is worth setting parallel=True and how to do it by exporting environment variables. Other than that, I think this is ready to be merged.


thisac commented Apr 30, 2020

Output from some simple benchmarks

Below are the results from some simple benchmarks comparing how the use of parallelisation can speed things up (or slow things down).

OpenMP uses parallelisation, which can be turned off by setting the environment variable OMP_NUM_THREADS=1, meaning that only a single thread will be used. By default it is set to use all threads (8 in this case).

Dask can use either the "threads" (uses multiple threads in the same process) or the "processes" (sends data to separate processes) scheduler. "threads" is bound by the GIL and is thus best to use with non-python objects while "processes" works best with pure python code (with a slight overhead). See https://docs.dask.org/en/latest/setup/single-machine.html for more information.
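
As a toy illustration of the difference between the two schedulers (not the PR code, just a standalone Dask sketch):

import dask

@dask.delayed
def work(x):
    # stand-in for an expensive per-element computation
    return x * x

tasks = [work(i) for i in range(8)]

# "threads": one process, shared memory, but limited by the GIL for pure-Python work
threaded = dask.compute(*tasks, scheduler="threads")

# "processes": separate worker processes, no GIL contention, but pickling/IPC overhead
multiproc = dask.compute(*tasks, scheduler="processes")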

As seen below, the fastest run is obtained by using only the Dask parallelisation in probabilities with the "processes" scheduler while turning off OpenMP parallelisation.

from thewalrus.quantum import probabilities as p
import numpy as np

n = 4
mu = np.random.random(2*n)
cov = np.random.random((2*n, 2*n))
cov += cov.conj().T

With OpenMP parallelisation and with the "threads" scheduler in Dask

print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)

print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
No parallel execution with Dask
419 ms ± 74.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With parallel execution with Dask
2.72 s ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Without OpenMP parallelisation and with the "threads" scheduler in Dask

%set_env OMP_NUM_THREADS=1

print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)

print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
env: OMP_NUM_THREADS=1

No parallel execution with Dask
632 ms ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With parallel execution with Dask
748 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Without OpenMP parallelisation and with the "processes" scheduler in Dask

%set_env OMP_NUM_THREADS=1

print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)

print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
env: OMP_NUM_THREADS=1

No parallel execution with Dask
605 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With parallel execution with Dask
302 ms ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


josh146 commented May 1, 2020

That's great @thisac! It might also be interesting to see how this scales as the number of modes/cutoff increases.

@thisac thisac requested a review from josh146 May 20, 2020 22:54
-0.25 * deltar @ si12 @ deltar
)
return f
# Copyright 2019 Xanadu Quantum Technologies Inc.
Member

@thisac, for some reason GitHub is showing this entire file as changed? It's difficult to tell what needs to be code reviewed here.

Has the file mode changed?

Contributor Author

It's due to trailing whitespace being deleted on every line. You can hide whitespace changes in the GitHub diff settings (or by appending ?w=1 to the GitHub URL).

Comment on lines 1116 to 1120
Parallization is already being done by OpenMP when calling ``density_matrix_element``.
To get a speed-up from using ``parallel=True`` it must be turned off by setting the
environment variable ``OMP_NUM_THREADS=1`` (forcing single threaded use). Remove the
environment variable or set it to ``OMP_NUM_THREADS=''`` to again use multithreading
with OpenMP.
Member

Attempted to make it slightly clearer, let me know what you think!

Suggested change
Parallization is already being done by OpenMP when calling ``density_matrix_element``.
To get a speed-up from using ``parallel=True`` it must be turned off by setting the
environment variable ``OMP_NUM_THREADS=1`` (forcing single threaded use). Remove the
environment variable or set it to ``OMP_NUM_THREADS=''`` to again use multithreading
with OpenMP.
Individual density matrix elements are computed using multithreading by OpenMP.
Setting ``parallel=True`` will further result in *multiple* density matrix elements
being computed in parallel.
When setting ``parallel=True``, OpenMP will be turned off by setting the
environment variable ``OMP_NUM_THREADS=1`` (forcing single threaded use for individual
matrix elements). Remove the environment variable or set it to ``OMP_NUM_THREADS=''``
to again use multithreading with OpenMP.

Contributor Author

Thanks Josh. It looks good. 👍 Just made some small changes to the last line.

Comment on lines 1153 to 1157
# restore env variable to value before (or remove if it wasn't set)
if OMP_NUM_THREADS:
os.environ["OMP_NUM_THREADS"] = OMP_NUM_THREADS
else:
del os.environ["OMP_NUM_THREADS"]
Member

Nice! I only have one (minor) concern --- this is modifying a global environment variable.

If the user is running multiple Python scripts at the same time, or maybe even any other program, then this code-block will cause side effects in the other running processes

Contributor Author

I agree that this could be an issue. I would say that this is OK if conveyed clearly to the user (e.g. by mentioning it in the docstring). Another option would be to simply avoid changing the environment variable in the function itself and only do it during testing. Then it would be up to the user to switch off OpenMP parallelisation before running with parallel=True, although this could still pose a similar issue, since programs might freeze or crash if this is not taken into account.

Member

The more I think about it, it definitely feels a bit strange for a Python library to be changing a users environment variables. If the calculation crashes midway, for instance, the environment variables will never be reset to the original values.
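
(For illustration only: a hypothetical guard along these lines would at least restore the variable if the calculation raises, though it would not remove the side effects on other running processes.)

import os

def run_with_single_omp_thread(fn, *args, **kwargs):
    # hypothetical helper: force OMP_NUM_THREADS=1 for the duration of the call
    # and always restore the previous value, even if fn raises
    previous = os.environ.get("OMP_NUM_THREADS")
    os.environ["OMP_NUM_THREADS"] = "1"
    try:
        return fn(*args, **kwargs)
    finally:
        if previous is None:
            del os.environ["OMP_NUM_THREADS"]
        else:
            os.environ["OMP_NUM_THREADS"] = previous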

Collaborator

Yeah, I think it makes sense to do it in the tests, but otherwise you tell the user how to do it in the docstring.

Member

I'm leaning towards this being something we communicate clearly in the documentation, but don't actually modify ourselves.

Unless this is a common occurrence? Are there examples of other Python libraries modifying OMP_NUM_THREADS?

Contributor Author
@thisac thisac May 22, 2020

I agree that it's a bit strange, and perhaps quite a bad thing to do. I'll remove it from the function then and simply change the environment variable in the tests (would this still be OK?), and also update the docstring.

I don't know of any other places modifying OMP_NUM_THREADS. 🤔


thisac commented May 22, 2020

I've moved the environment variable changes to the tests (so no changes are being made in the probabilities function itself), utilising monkeypatch.setenv (thanks @josh146 🥇).

I've also checked locally that OMP_NUM_THREADS=1 whenever parallel=True during the testing. The docstring is also slightly altered.
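
A minimal sketch of how such a test can be structured (the test name and state below are illustrative, not the exact lines added in the PR):

import numpy as np
import pytest

from thewalrus.quantum import probabilities


@pytest.mark.parametrize("parallel", [True, False])
def test_probabilities_parallel(parallel, monkeypatch):
    """Both code paths should return a normalised probability distribution."""
    if parallel:
        # force single-threaded OpenMP so that Dask does the parallelisation;
        # monkeypatch restores the environment variable after the test
        monkeypatch.setenv("OMP_NUM_THREADS", "1")

    n = 2
    mu = np.zeros(2 * n)        # vacuum state for a quick sanity check
    cov = np.identity(2 * n)

    probs = probabilities(mu, cov, cutoff=4, parallel=parallel)
    assert np.allclose(probs.sum(), 1.0)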

@@ -1023,6 +1038,36 @@ def test_update_with_noise_coherent_value_error():
update_probabilities_with_noise(noise_dists, probs)


# @pytest.mark.parametrize("test_env_var", [None, "1", "2"])
Collaborator

are you planning on leaving this here?

Contributor Author

Oh, no, good catch. Should've removed it. 🤦

@nquesada
Collaborator

One can set the number of OMP threads by doing os.environ["OMP_NUM_THREADS"] = "1"

@josh146 josh146 left a comment

I think this is a better approach @thisac 💯

Is this ready to be merged? I might have to override CodeCov, looks like it never reported the coverage to GitHub

Comment on lines +951 to +952
if parallel: # set single-thread use in OpenMP
monkeypatch.setenv("OMP_NUM_THREADS", "1")
Member

Nice!

@josh146 josh146 merged commit 830f74c into master May 23, 2020
@josh146 josh146 deleted the parallelise_probabilities branch May 23, 2020 23:58