Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dill to serialize logp functions in DensityDist #4053

Merged
merged 5 commits into from
Aug 16, 2020

Conversation

aseyboldt
Copy link
Member

@aseyboldt aseyboldt commented Aug 15, 2020

Fix #3844
When users pass functions to pm.DensityDist, then in most cases pickle will not be able to deal with those functions. The code will still work on Linux, because the default multiprocessing context is fork, and we don't even have to pickle the logp function. On Windows and Mac the default is spawn or forkserver however, and code using pm.DensityDist with a logp function that is not accessible in a module or protected by if __name__ == '__main__ will fail. This includes all functions defined in a notebook as far as I understand.
We can work around this issue by manually serializing the logp function with dill.

This PR introduces dill as a new required dependency to PyMC3, before it was optional.

Edit: I had some trouble with the cache of miniconda, so I also included a fix so that the cached environment is updated.

@aseyboldt aseyboldt force-pushed the fix-density-dist-pickle branch 2 times, most recently from 0da0c00 to 2ec28db Compare August 15, 2020 14:39
@aseyboldt aseyboldt force-pushed the fix-density-dist-pickle branch from 2ec28db to a92ce54 Compare August 15, 2020 14:44
@codecov
Copy link

codecov bot commented Aug 15, 2020

Codecov Report

Merging #4053 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4053      +/-   ##
==========================================
+ Coverage   86.82%   86.83%   +0.01%     
==========================================
  Files          88       88              
  Lines       14154    14165      +11     
==========================================
+ Hits        12289    12300      +11     
  Misses       1865     1865              
Impacted Files Coverage Δ
pymc3/distributions/distribution.py 94.61% <100.00%> (+0.14%) ⬆️
pymc3/variational/stein.py 100.00% <0.00%> (ø)
pymc3/model.py 88.95% <0.00%> (+0.01%) ⬆️

RELEASE-NOTES.md Outdated
@@ -5,6 +5,7 @@
### Maintenance
- Mentioned the way to do any random walk with `theano.tensor.cumsum()` in `GaussianRandomWalk` docstrings (see [#4048](https://github.com/pymc-devs/pymc3/pull/4048)).
- Fixed numerical instability in ExGaussian's logp by preventing `logpow` from returning `-inf` (see [#4050](https://github.com/pymc-devs/pymc3/pull/4050)).
- Use dill to serialize user defined logp functions in `DensityDist`. `dill` is now a required dependency. (see [#3844](https://github.com/pymc-devs/pymc3/issues/3844)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Use dill to serialize user defined logp functions in `DensityDist`. `dill` is now a required dependency. (see [#3844](https://github.com/pymc-devs/pymc3/issues/3844)).
- Use dill to serialize user-defined logp functions in `DensityDist`. `dill` is now a required dependency. (see [#3844](https://github.com/pymc-devs/pymc3/issues/3844)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add what this allows what wasn't possible before. did pickling just throw an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -24,13 +24,15 @@ ENVNAME="${ENVNAME:-testenv}" # if no ENVNAME is specified, use testenv
if [ -z ${GLOBAL} ]; then
if conda env list | grep -q ${ENVNAME}; then
echo "Environment ${ENVNAME} already exists, keeping up to date"
source activate ${ENVNAME}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think new style is conda activate (or mamba).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't work right now, but according to this and this it should be fixed by sourcing the conda functions again:

source $CONDA_PREFIX/etc/profile.d/conda.sh

trying...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's brittle probably best to keep it the way you had then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is now the way the conda docs specify it...
But I can of course change it back.

@aseyboldt aseyboldt force-pushed the fix-density-dist-pickle branch from f9b563d to 712e3fb Compare August 16, 2020 12:23
@twiecki twiecki merged commit e08ad0c into pymc-devs:master Aug 16, 2020
@twiecki
Copy link
Member

twiecki commented Aug 16, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel sampling hangs on macOS with Python 3.8 with pickling error
2 participants