Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SymbolicDistribution and Censored distributions #5169

Merged
merged 4 commits into from
Jan 8, 2022

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Nov 10, 2021

With aesara-devs/aeppl#22 we can now compute the logprob of censored Distributions.

This PR adds those to PyMC:

import pymc as pm

with pm.Model(rng_seeder=2021) as m:
    x = pm.Normal.dist()
    y = pm.Censored('y', x, -.1, .1)

print([float(y.eval()) for _ in range(5)])
# [0.1, 0.01990278315435156, 0.1, -0.1, -0.1]

y_val = m.rvs_to_values[y]
for y_test_val in [-.2, -.1, -.09, 0, 0.09, .1, .2]:
    print(y_test_val, m.logp({y_val: y_test_val}))
# -0.2 -inf
# -0.1 -0.7761545942065803
# -0.09 -0.9229885332046727
# 0 -0.9189385332046727
# 0.09 -0.9229885332046727
# 0.1 -0.7761545942065803
# 0.2 -inf

A large chunk of this PR concerns how to accommodate aeppl derived distributions with PyMC API seamlessly. To do this I started by duplicating pymc.Distribution into pymc.DerivedDistribution and changing things as necessary. Hopefully we will be able to use the same class in the end or at least refactor large chunks that are shared...

This may not be the best way... Suggestions are welcome

What works:

  • Defining size/shape and observed
  • Evaluating model logp
  • Prior predictive sampling
  • Sampling
  • Posterior predictive sampling

TODO:

  • Transforms
  • Dims
  • String representation
  • Integration with shape_utils.maybe_resize
  • fix log_likelihood in InferenceDataConverter
  • get_moment
  • tests
  • DOCSTRINGS
  • Open issue in pymc-examples to update examples involving censoring

@codecov
Copy link

codecov bot commented Nov 17, 2021

Codecov Report

Merging #5169 (4fc7dc9) into main (600fe90) will increase coverage by 0.01%.
The diff coverage is 86.39%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5169      +/-   ##
==========================================
+ Coverage   79.08%   79.09%   +0.01%     
==========================================
  Files          87       88       +1     
  Lines       14394    14516     +122     
==========================================
+ Hits        11383    11482      +99     
- Misses       3011     3034      +23     
Impacted Files Coverage Δ
pymc/distributions/distribution.py 90.07% <77.14%> (-5.16%) ⬇️
pymc/distributions/censored.py 92.45% <92.45%> (ø)
pymc/distributions/__init__.py 100.00% <100.00%> (ø)
pymc/distributions/bound.py 100.00% <100.00%> (ø)
pymc/distributions/shape_utils.py 98.87% <100.00%> (+<0.01%) ⬆️
pymc/distributions/simulator.py 86.76% <100.00%> (ø)
pymc/util.py 76.10% <100.00%> (+1.26%) ⬆️
pymc/parallel_sampling.py 86.33% <0.00%> (-1.00%) ⬇️

@ricardoV94 ricardoV94 force-pushed the censored_distributions branch 4 times, most recently from ec92fed to ed7e2b0 Compare November 17, 2021 17:32
pymc/distributions/distribution.py Outdated Show resolved Hide resolved
@twiecki
Copy link
Member

twiecki commented Jan 3, 2022

@ricardoV94 This is becoming a blocker for time series, how much is left to do here?

@ricardoV94
Copy link
Member Author

I am getting back to this one this week!

@ricardoV94
Copy link
Member Author

In the meantime, suggestions for the base class name? DerivedDistribution was my first suggestion because Aeppl derives the logprob terms for these graphs, but there is not much "derivation" going on in PyMC. These are just symbolic graphs that somewhere have a RandomVariable as a node / input. Other ideas:

  1. CompositeDistribution
  2. SymbolicDistribution
  3. ... ?

@twiecki
Copy link
Member

twiecki commented Jan 3, 2022

CustomDistribution? Otherwise I like SymbolicDistribution.

@brandonwillard
Copy link
Contributor

brandonwillard commented Jan 3, 2022

What purpose are these *Distribution classes serving? Remember, the original Distribution class in v4 was just an all static method-based stub/placeholder for (temporary) backwards compatibility.

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jan 3, 2022

They serve the same goal the "standard" Distribution class is doing right now: basically manage all non distribution kwargs: shape/dims, transformed, observed; and behind the scenes magic things like value variables and rngs

@brandonwillard
Copy link
Contributor

brandonwillard commented Jan 3, 2022

They serve the same goal the "standard" Distribution class is doing right now: basically manage all non distribution kwargs: shape/dims, transformed, observed; and behind the scenes magic things like value variables and rngs

The base Distribution class was only ever supposed to make the hand-off to the Model class, which would then handle all the generic tasks (e.g. value variables, RNGs, shape/dims, etc.), and the rest was just for backward compatibility (e.g. the now completely unnecessary Disribution.dist interface). Notice how instances of these classes are never made, and that only static/class/type-level functions are used.

By extending Distribution and adding more logic to those classes, it could become considerably more difficult to unwind those temporary backward compatibility-only choices, and they'll eventually become a permanent and confusing part of v4's design.

This isn't the time or place to start addressing all that, but these are the kinds of design considerations that need to go alongside changes/additions to the relevant areas of code (i.e. Distribution-related code).

@ricardoV94 ricardoV94 force-pushed the censored_distributions branch 2 times, most recently from 2329885 to 785ee15 Compare January 4, 2022 12:28
@ricardoV94 ricardoV94 marked this pull request as ready for review January 4, 2022 12:28
@ricardoV94 ricardoV94 force-pushed the censored_distributions branch 2 times, most recently from 728d816 to 766ac8d Compare January 4, 2022 13:41
* Fix invalid code example in docstrings
* Rename distribution parameter to dist
* Use `check_dist_not_registered` helper
@ricardoV94
Copy link
Member Author

@brandonwillard do you have other pointers for this PR? It is currently blocked by your old requested changes concerning docstrings

@brandonwillard brandonwillard dismissed their stale review January 8, 2022 17:11

Request is stale

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandonwillard do you have other pointers for this PR? It is currently blocked by your old requested changes concerning docstrings

Sorry, I dismissed the old review; otherwise, I don't have any more input for this PR right now.

@ricardoV94 ricardoV94 merged commit 664a447 into pymc-devs:main Jan 8, 2022
@ghost
Copy link

ghost commented Jan 8, 2022

Thanks @ricardoV94! Can't wait to try these out.

@twiecki twiecki changed the title Add Censored distributions Add SymbolicDistribution and Censored distributions Jan 9, 2022
@@ -354,11 +355,268 @@ def dist(
return rv_out


class SymbolicDistribution:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will those show up in the docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also didn't mention it in the release notes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They show in the docs (you can check the RTD job in this PR). I didn't see the SymbolicDist as being user facing, that's why it's not mentioned in the release notes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but it's such an important new addition that I think it deserves some more attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants