
Important Distribution-to-RandomVariable logic changes #4463

Closed · 13 tasks done
brandonwillard opened this issue Feb 5, 2021 · 10 comments

@brandonwillard (Contributor) commented Feb 5, 2021

PyMC's Distribution classes are being converted to RandomVariables in the v4 branch. A lot of core changes have already been made but a handful of important logic changes are still required in order to reinstate multiple PyMC features. This issue lists some points in the codebase where these changes are needed.

First, for anyone who's interested in helping out with the v4 branch, searching for the string XXX: will generally reveal parts of the logic that have been disabled and need to be refactored.

Here is a potentially outdated list of those parts accompanied by short summaries of the problem(s) and the work involved:

  • pymc3.model.Model.register_rv
    • I don't know what the whole dict-as-observed-data thing is about, so I couldn't finish refactoring this.
    • Marking as stale
  • pymc3.distributions.transforms.TransformedDistribution
    • The forward_val methods use draw_values, which has been removed. I think these methods can be removed entirely, because they only appear to be used by pymc3.sampling.sample_prior_predictive and I don't think that logic is relevant any longer.
    • Marking as stale
  • pymc3.gp.gp.[Marginal, MarginalKron]
    • These also use draw_values, and I think they just need to be replaced by the creation and reuse of a compiled theano.function. For example, draw_values([mu, cov], point=point) roughly translates to theano.function([model[n] for n in point.keys()], [mu, cov])(*point.values()), but we wouldn't want to compile that function every time a sample is needed, so we would need to do self.mu_cov_fn = theano.function([model[n] for n in point.keys()], [mu, cov]) somewhere (e.g. the constructor) and reuse self.mu_cov_fn in those methods (see the sketch after this list).
    • Addressed by PR v4 refactor for GP module #5055
  • pymc3.parallel_sampling._Process
    • The whole parallel sampling approach relies on a single set of fixed-shape shared memory arrays for each random variable, and the processes all appear to write to that set of memory. Unfortunately, that approach doesn't work when the random variables change shape during sampling. This doesn't need to be fixed immediately, but it is an unnecessary restriction caused by this specific approach to multi-processing.
    • Reminder issue: Allow step methods to work with variables that have dynamic sizes #5139
  • pymc3.variational.opvi.Group
    • The constructor for this class creates a DictToArrayBijection in a peculiar way and requires explicit shape information to do it. Just like all the other changes of this sort, we need to move the creation of the bijection(s) to the places where actual concrete samples are generated (and said bijection(s) are actually used/needed). I don't know enough about this code to do that, so someone who has worked on this should take a look.
    • Addressed by PR Make VI work on v4 #4582
  • pymc3.sampling.sample_posterior_predictive_w
  • pymc3.step_methods.elliptical_slice.EllipticalSlice.astep
  • pymc3.step_methods.metropolis
    • The first instance uses the old dsize field on the random variables to create a NumPy array that—in turn—is used to initialize the proposal distributions. Instead, we should use the initial sample values for each variable to do this, and perhaps update the proposal distributions when/if new samples are drawn that have new shapes.
    • As always, for all of these cases we can assume that the sizes of the initial values are representative of all future samples (i.e. shapes don't change), but we shouldn't rely on that assumption when it isn't necessary.
    • Besides the size/shape refactoring, there's also a draw_values that looks like it might be straightforward to fix.
    • Fixed in: 28fdda5
  • pymc3.step_methods.hmc.base_hmc.BaseHMC.__init__
  • pymc3.step_methods.gibbs.ElemwiseCategorical
    • Yet another use of the deprecated dshape property; it can always be replaced by the shapes of the initial sample point, but, as above, we shouldn't assume fixed shapes if avoidable.
    • Stepper was removed in 97dc4bd
  • pymc3.step_methods.sgmcmc.BaseStochasticGradient
  • pymc3.data
    • This code attempts to evaluate the shape, but I think the shape in question necessarily belongs to a (shared) constant variable, so it might not need to be changed; it's weird nonetheless.
    • Marking as stale
  • pymc3.model_graph.ModelGraph
    • This is another use of dshape; however, this one might not need to be changed, since the whole class could be replaced by the sample-space graphs provided by a Model object. The only unique feature I can immediately see is the Graphviz plate notation. Theano graphs, like the sample-space graphs, can already be converted to Graphviz Digraphs using functionality that's been in Theano for a while, but the plate notation may be a new thing that requires changes/additions.
    • Refactored in Refactor ModelGraph for v4 #4818

Originally posted by @brandonwillard in #4440 (comment)
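
As a rough illustration of the draw_values-to-theano.function replacement described in the GP bullet above, here is a minimal sketch. The class and attribute names (MarginalHelper, mu_cov_fn) are hypothetical, and it assumes the model's value variables can be looked up as model[name] for each key in point:

import theano

class MarginalHelper:
    def __init__(self, model, mu, cov, point):
        # Compile once (e.g. in the constructor) and reuse for every draw,
        # instead of effectively recompiling via draw_values([mu, cov], point=point)
        # each time a sample is needed.
        self._names = list(point.keys())
        self.mu_cov_fn = theano.function([model[n] for n in self._names], [mu, cov])

    def mu_cov(self, point):
        # Evaluate the cached function at a concrete point, keeping the argument
        # order consistent with the input order used at compile time.
        return self.mu_cov_fn(*[point[n] for n in self._names])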

@ricardoV94 (Member) commented:

> Instead, we should use the initial sample values for each variable to do this, and perhaps update the proposal distributions when/if new samples are drawn that have new shapes.

Can the Metropolis acceptance ratio be calculated between proposals with different shapes?

@brandonwillard (Contributor, Author) commented Feb 6, 2021

> Can the Metropolis acceptance ratio be calculated between proposals with different shapes?

Yes, but we can't delve into that here.

@ricardoV94 (Member) commented Feb 16, 2021

I am checking the V4 branch and this looks like an issue: when generating logpt graphs for model variables, it seems that tag.value_var is only being replaced for the parent/original variables but not in the variables that depend on them downstream. In the example below, __logp_y still depends on the RandomStateSharedVariable of x, even though __logp_x does not.

from aesara.printing import debugprint as dprint
import pymc3 as pm

with pm.Model() as m:
    x = pm.Uniform('x', lower=0, upper=1)
    y = pm.Uniform('y', lower=0, upper=x)
    
dprint(m.logpt)
Sum{acc_dtype=float64} [id A] '__logp'   
 |MakeVector{dtype='float64'} [id B] ''   
   |Sum{acc_dtype=float64} [id C] ''   
   | |Sum{acc_dtype=float64} [id D] ''   
   |   |Elemwise{mul,no_inplace} [id E] '__logp_x'   
   |     |Elemwise{switch,no_inplace} [id F] ''   
   |     | |Elemwise{mul,no_inplace} [id G] ''   
   |     | | |Elemwise{mul,no_inplace} [id H] ''   
   |     | | | |TensorConstant{1} [id I]
   |     | | | |Elemwise{mul,no_inplace} [id J] ''   
   |     | | |   |TensorConstant{1} [id K]
   |     | | |   |Elemwise{ge,no_inplace} [id L] ''   
   |     | | |     |x [id M]
   |     | | |     |TensorConstant{0.0} [id N]
   |     | | |Elemwise{mul,no_inplace} [id O] ''   
   |     | |   |TensorConstant{1} [id P]
   |     | |   |Elemwise{le,no_inplace} [id Q] ''   
   |     | |     |x [id M]
   |     | |     |TensorConstant{1.0} [id R]
   |     | |Elemwise{neg,no_inplace} [id S] ''   
   |     | | |Elemwise{log,no_inplace} [id T] ''   
   |     | |   |Elemwise{sub,no_inplace} [id U] ''   
   |     | |     |TensorConstant{1.0} [id R]
   |     | |     |TensorConstant{0.0} [id N]
   |     | |TensorConstant{-inf} [id V]
   |     |TensorConstant{1.0} [id W]
   |Sum{acc_dtype=float64} [id X] ''   
     |Sum{acc_dtype=float64} [id Y] ''   
       |Elemwise{mul,no_inplace} [id Z] '__logp_y'   
         |Elemwise{switch,no_inplace} [id BA] ''   
         | |Elemwise{mul,no_inplace} [id BB] ''   
         | | |Elemwise{mul,no_inplace} [id BC] ''   
         | | | |TensorConstant{1} [id BD]
         | | | |Elemwise{mul,no_inplace} [id BE] ''   
         | | |   |TensorConstant{1} [id BF]
         | | |   |Elemwise{ge,no_inplace} [id BG] ''   
         | | |     |y [id BH]
         | | |     |TensorConstant{0.0} [id BI]
         | | |Elemwise{mul,no_inplace} [id BJ] ''   
         | |   |TensorConstant{1} [id BK]
         | |   |Elemwise{le,no_inplace} [id BL] ''   
         | |     |y [id BH]
         | |     |uniform_rv.1 [id BM] 'x'   
         | |       |RandomStateSharedVariable(<RandomState(MT19937) at 0x7FF7A7954C40>) [id BN]
         | |       |TensorConstant{[]} [id BO]
         | |       |TensorConstant{11} [id BP]
         | |       |TensorConstant{0.0} [id N]
         | |       |TensorConstant{1.0} [id R]
         | |Elemwise{neg,no_inplace} [id BQ] ''   
         | | |Elemwise{log,no_inplace} [id BR] ''   
         | |   |Elemwise{sub,no_inplace} [id BS] ''   
         | |     |uniform_rv.1 [id BM] 'x'   
         | |     |TensorConstant{0.0} [id BI]
         | |TensorConstant{-inf} [id BT]
         |TensorConstant{1.0} [id BU]

As a consequence m.logp evaluations are stochastic:

m.logp({x: .9, y:.3})
array(0.26530097)

m.logp({x: .9, y:.3})
array(0.59186006)

@brandonwillard (Contributor, Author) commented Feb 16, 2021

The random variables definitely shouldn't be in there, but I think this case is covered by some of my unfinished changes to logpt (e.g. a simple rewrite step that replaces random variables with their value variables). Those changes are tied to other changes needed in order to support missing data, but I can try to decouple them.
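
For illustration, here is a minimal sketch of such a rewrite step (not the actual v4 implementation), assuming every random variable carries its value variable in rv.tag.value_var and that aesara.clone_replace is available:

import aesara

def replace_rvs_with_values(logp_graph, rvs):
    # Map each random variable to its registered value variable.
    replacements = {rv: rv.tag.value_var for rv in rvs}
    # Clone the graph so that every occurrence of an RV, including downstream
    # uses such as the upper=x bound of y above, points at its value variable.
    return aesara.clone_replace(logp_graph, replace=replacements)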

@brandonwillard (Contributor, Author) commented:

I pushed a change recently that should've fixed that issue in the log-likelihoods.

@ricardoV94 (Member) commented:

Another issue: The model logp is being constructed in terms of the untransformed variables.

master branch:

with pm.Model() as m: 
  x = pm.Uniform('x', 0, 1)

m.logp({'x': -1})  # TypeError: Missing required input: x_interval__ ~ TransformedDistribution

m.logp({'x_interval__': -1})                                            
array(-1.62652338)

V4 branch:

with pm.Model() as m: 
  x = pm.Uniform('x', 0, 1)

m.logp({'x': -1})
array(-inf)

m.logp({'x_interval__': -1})  # TypeError: Missing required input: x                                      

Which means NUTS is pretty much hopeless at the moment:

with m:
    trace = pm.sample(compute_convergence_checks=False)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
There were 997 divergences after tuning. Increase `target_accept` or reparameterize.
There were 999 divergences after tuning. Increase `target_accept` or reparameterize.
The chain contains only diverging samples. The model is probably misspecified.
The chain contains only diverging samples. The model is probably misspecified.

@ricardoV94 (Member) commented Feb 24, 2021

By the way, it would be really nice to keep generating the logp (under another method name) in terms of the untransformed variables. That way the graphs can be used directly for things that do not require or benefit from transformed variables (e.g., grid approximation); see the sketch below.
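
As a small illustration of that use case, here is a sketch of a grid approximation built on an untransformed logp; logp_untransformed is an assumed callable over the original (constrained) parameter space, not an existing PyMC method:

import numpy as np

def grid_approximation(logp_untransformed, grid):
    # Evaluate the log-density at each grid point in the original parameter
    # space (no interval transform or Jacobian terms involved) and normalize.
    logps = np.array([logp_untransformed({'x': value}) for value in grid])
    probs = np.exp(logps - logps.max())
    return probs / probs.sum()

For example, grid_approximation(logp_untransformed, np.linspace(0, 1, 101)[1:-1]) would approximate the normalized density of x over the interior of the unit interval.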

@brandonwillard (Contributor, Author) commented:

> The model logp is being constructed in terms of the untransformed variables.

Ah, yes, I was still in the process of determining how to handle transformations. I started by setting things up so that everything would work exactly as it currently does (i.e. always use the transformed variables when producing log-likelihoods), but, since it's just as easy to apply transforms to an existing log-likelihood graph (and doing so provides much more opportunity and flexibility), I stopped short of doing that.

I'll add the transform stuff shortly.
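
For concreteness, here is a minimal sketch (again, not the v4 implementation) of applying an interval transform to an existing logp graph. It assumes logp_x is an Aesara graph in terms of the value variable x_value and that aesara.clone_replace is available; the function name and the sigmoid-based backward transform are illustrative only:

import aesara
import aesara.tensor as at

def interval_transformed_logp(logp_x, x_value, lower, upper):
    # New unconstrained value variable to take the place of x_value.
    x_trans = x_value.type()
    # Backward (inverse) transform: real line -> (lower, upper).
    sig = 1 / (1 + at.exp(-x_trans))
    x_backward = lower + (upper - lower) * sig
    # Log-determinant of the Jacobian of the backward transform.
    log_jac = at.log(upper - lower) + at.log(sig) + at.log(1 - sig)
    # Substitute the backward-transformed value and add the Jacobian correction.
    logp_trans = aesara.clone_replace(logp_x, replace={x_value: x_backward}) + log_jac
    return logp_trans, x_trans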

@ricardoV94 (Member) commented:

I checked the bullet points that I know were already addressed, are stale, or have specific issues referring to them. We should revisit the open bullet points and check whether they still need to be worked on.

@ricardoV94 (Member) commented:

Closing this, as all points have been addressed: they are either stale, solved, or covered by more specific reminder issues.
