
Rewrite parallel sampling using multiprocessing #3011

Merged (11 commits) on Jun 14, 2018

Conversation

@aseyboldt (Member) commented Jun 7, 2018

This PR gets rid of joblib altogether and replaces it with a custom implementation.
This should solve some long-standing problems:

  • Pickling models is not always possible, and with this approach we only need to do it on Windows.
  • Problems pickling large traces. Right now we send the finished trace to the main process by pickling it and sending it through a pipe. This causes problems if the trace is very large, as on many systems there is a limit on the size of a single pipe.send. Very large traces can't use multiprocessing right now.
  • joblib doesn't seem to be particularly reliable. I've seen lots of cases where processes got stuck and I had to kill them manually, or where interrupting sampling only works after sending several interrupts. (One caveat with the proposed method is that we can only interrupt traces between samples. So very slow samplers might be stuck for a few seconds. We can still kill them, though.)
  • More control over how exceptions are reported. We can format the remote exceptions nicely. This is not implemented yet, but there is code for it here: https://hg.python.org/cpython/rev/c4f92b597074
  • Better progress bars, as the main process always knows how far the individual processes are. (But there seem to be tqdm bugs that still mess that up...)
  • We can implement trace backends so that the main process writes all samples to a file as they come in; we never need to keep many of them in memory, and we can avoid parallel IO.

We start one process per chain, which communicates with the main process by sending messages through a pipe. Once a draw is ready, it is stored in shared memory, where the main process can access it. Then the sampler process is asked to write the next sample to that shared memory.
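The handshake described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: all names (`sample_chain`, `_chain_worker`, the string messages) are made up, and the `fork` start method assumed here is POSIX-only.

```python
import multiprocessing as mp

import numpy as np

def _chain_worker(remote, shared, n_draws, seed):
    """Runs in the child process: writes each draw into shared memory, then waits."""
    rng = np.random.RandomState(seed)
    buf = np.frombuffer(shared.get_obj())  # NumPy view onto the shared buffer
    for _ in range(n_draws):
        buf[:] = rng.randn(len(buf))       # stand-in for drawing one sample
        remote.send("draw_ready")          # tell the main process via the pipe
        if remote.recv() == "abort":       # wait until the draw has been copied
            break
    remote.send("done")

def sample_chain(n_draws=5, dim=3, seed=0):
    ctx = mp.get_context("fork")  # POSIX-only; Windows would need spawn + pickling
    shared = ctx.Array("d", dim)  # shared memory holding exactly one draw
    remote, local = ctx.Pipe()
    proc = ctx.Process(target=_chain_worker, args=(remote, shared, n_draws, seed))
    proc.start()
    draws = []
    while local.recv() != "done":
        # Copy the draw out before telling the worker to overwrite the buffer.
        draws.append(np.frombuffer(shared.get_obj()).copy())
        local.send("continue")
    proc.join()
    return np.array(draws)
```

Because the buffer only ever holds one draw at a time, the main process can stream each draw straight into a trace backend, instead of receiving one huge pickled trace at the end.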

Performance-wise it shouldn't be much different; the pipes seem to be fast enough to keep up with our samplers. It is possible that we gain a bit of speed for large models with cheap computations, as we have a dedicated process for sampling that doesn't also have to deal with storing data in the trace. For very small models it might be a bit slower, as there is some small constant overhead for sending messages.

Since I use a couple of py3-only features, the old code is still used on Python < 3.

TODO

  • Add warnings (they are not working yet)
  • Fix progress bar
  • Better formatting of exceptions
  • Test on windows
  • Optional: improve exception printing in Jupyter via set_custom_exc

@fonnesbeck (Member)

I was under the impression that it's simpler to use concurrent.futures instead of multiprocessing. Not true?

@aseyboldt (Member, Author)

For computations that don't need to communicate with the main process and are never aborted, I think that is true. But in our use case we benefit quite a bit from keeping a channel open to the main process: it makes status updates (the progress bar), writing results to files, and interrupting the sampler much easier. That kind of ongoing communication is not well supported by concurrent.futures.
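As a toy illustration of that difference (hypothetical code, not from the PR): with a persistent pipe the parent sees every completed step and can ask the worker to stop mid-run, whereas a concurrent.futures future only hands back the final result.

```python
import multiprocessing as mp
import time

def worker(conn, n_steps):
    """Reports progress after every step and checks for a stop request."""
    for i in range(n_steps):
        time.sleep(0.01)                   # stand-in for computing one sample
        conn.send(i)                       # incremental progress update
        if conn.poll() and conn.recv() == "stop":
            conn.send("stopped")           # acknowledge the early abort
            return
    conn.send("finished")

def run(n_steps=100, stop_after=5):
    ctx = mp.get_context("fork")           # POSIX-only start method
    parent, child = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child, n_steps))
    proc.start()
    completed = 0
    while True:
        msg = parent.recv()
        if msg in ("stopped", "finished"):
            break
        completed = msg + 1                # live progress-bar style update
        if completed == stop_after:        # e.g. the user hit Ctrl-C
            parent.send("stop")
    proc.join()
    return completed, msg
```

Here `run()` stops the worker after roughly 5 of 100 steps; a future-based API would have had to wait for all 100 or kill the process outright.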

@aseyboldt (Member, Author)

Another nice bonus: We can return a partial trace now if parallel sampling is interrupted with a KeyboardInterrupt.
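Schematically (illustrative only; `collect_draws` and the test helper below are made-up names), a partial trace falls out naturally because draws arrive in the main process one at a time:

```python
def collect_draws(draw_iter, n_draws):
    """Gather up to n_draws items, keeping what we have on Ctrl-C."""
    draws = []
    try:
        for _ in range(n_draws):
            draws.append(next(draw_iter))
    except KeyboardInterrupt:
        pass  # partial trace: keep the draws gathered so far
    return draws

def interrupting_iter(values, raise_after):
    """Test helper that simulates Ctrl-C after `raise_after` draws."""
    for i, v in enumerate(values):
        if i == raise_after:
            raise KeyboardInterrupt
        yield v
```

Since the interrupt lands between draws in the main process, everything already copied out of shared memory is intact and can be returned as-is.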

@aseyboldt force-pushed the multiproc branch 2 times, most recently from ec02da7 to c6876c8 on June 12, 2018 18:25
@twiecki (Member) commented Jun 14, 2018

One other option to entertain: since we plan to drop Python 2 support anyhow, we could already start by deprecating parallel sampling for Python 2. I suppose it's not really worth it, because we have something that sorta works and we would just take it away for seemingly no good reason. Yet the aggregate cost of dealing with it is quite high.

@junpenglao (Member)

What is our timeline on py27?

@aseyboldt (Member, Author)

There are still some tests on py2.7/float32 that are failing (due to issues with the sqlite and text backends), but I think that is just the unrelated #3018. So this is ready for review/merge.

@junpenglao (Member)

This is the one-progress-bar version, right (i.e., no snail race)?

@aseyboldt (Member, Author)

Yes

```python
shape_dtypes[var.name] = (shape, dtype)
return shape_dtypes

def stop_tuning(self):
```
Member:

Oh this is much better.

@junpenglao (Member) left a comment

LGTM, need release note.


```python
self._progress = None
if progressbar:
    self._progress = tqdm_(
```
Contributor:

You have to add the position argument here, in order to keep the tqdm bars from interfering with each other.

@aseyboldt (Member, Author):

There is only one progress bar now, but that progress bar counts samples from all chains.
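A sketch of that aggregation (hypothetical names; presumably the actual code feeds this running count into a single tqdm bar with total = chains × draws):

```python
def total_progress(events, n_chains, n_draws):
    """events: an iterable of chain ids, one per completed draw,
    in whatever interleaved order the worker processes report them."""
    done = {chain: 0 for chain in range(n_chains)}
    total = n_chains * n_draws
    for chain in events:
        done[chain] += 1
        # A single bar would show this aggregate count, not one bar per chain.
        yield sum(done.values()), total
```

With two chains of two draws each, interleaved reports advance one shared counter from 1/4 to 4/4.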

5 participants