
Invalid logp expression if args aren't explicit in DensityDist #5155

Closed
aseyboldt opened this issue Nov 8, 2021 · 17 comments · Fixed by #5614


@aseyboldt (Member)

Description of your problem

On the latest pymc version (777622a) I get incorrect logp graphs that contain sampled random variables if I use model variables in a closure of a DensityDist:

import numpy as np
import pymc as pm

with pm.Model() as model:
    a = pm.Normal("a")
    pm.DensityDist("b", logp=lambda x: (x - a) ** 2, observed=np.array(3.))

The logp function of the model is no longer deterministic, because it uses a sampled version of a in the logp function of b instead of the value from the value variable. You can see this in the aesara graph (the normal_rv op should not appear here):

aesara.dprint(model.logpt)
Elemwise{add,no_inplace} [id A] '__logp'   
 |Elemwise{add,no_inplace} [id B] ''   
 | |Sum{acc_dtype=float64} [id C] ''   
 | | |TensorConstant{[]} [id D]
 | |Sum{acc_dtype=float64} [id E] ''   
 |   |Elemwise{mul,no_inplace} [id F] ''   
 |     |Assert{msg='sigma > 0'} [id G] 'a_logprob'   
 |     | |Elemwise{sub,no_inplace} [id H] ''   
 |     | | |Elemwise{sub,no_inplace} [id I] ''   
 |     | | | |Elemwise{mul,no_inplace} [id J] ''   
 |     | | | | |TensorConstant{-0.5} [id K]
 |     | | | | |Elemwise{pow,no_inplace} [id L] ''   
 |     | | | |   |Elemwise{true_div,no_inplace} [id M] ''   
 |     | | | |   | |Elemwise{sub,no_inplace} [id N] ''   
 |     | | | |   | | |a [id O]
 |     | | | |   | | |TensorConstant{0} [id P]
 |     | | | |   | |TensorConstant{1.0} [id Q]
 |     | | | |   |TensorConstant{2} [id R]
 |     | | | |Elemwise{log,no_inplace} [id S] ''   
 |     | | |   |TensorConstant{2.5066282746310002} [id T]
 |     | | |Elemwise{log,no_inplace} [id U] ''   
 |     | |   |TensorConstant{1.0} [id Q]
 |     | |All [id V] ''   
 |     |   |Elemwise{gt,no_inplace} [id W] ''   
 |     |     |TensorConstant{1.0} [id Q]
 |     |     |TensorConstant{0.0} [id X]
 |     |TensorConstant{1.0} [id Y]
 |Sum{acc_dtype=float64} [id Z] ''   
   |Elemwise{mul,no_inplace} [id BA] ''   
     |Elemwise{pow,no_inplace} [id BB] 'b_logprob'   
     | |Elemwise{sub,no_inplace} [id BC] ''   
     | | |TensorConstant{3.0} [id BD]
     | | |normal_rv{0, (0, 0), floatX, False}.1 [id BE] 'a'   
     | |   |RandomStateSharedVariable(<RandomState(MT19937) at 0x7F660912BC40>) [id BF]
     | |   |TensorConstant{[]} [id BG]
     | |   |TensorConstant{11} [id BH]
     | |   |TensorConstant{0} [id BI]
     | |   |TensorConstant{1.0} [id BJ]
     | |TensorConstant{2} [id BK]
     |TensorConstant{1.0} [id BL]

This can be fixed in user code by explicitly letting the DensityDist know about the parameter:

with pm.Model() as model2:
    a = pm.Normal("a")
    pm.DensityDist("b", a, logp=lambda x, a_val: (x - a_val) ** 2, observed=np.array(3.))

Finding a bug like this in an actual model took @ferrine and me a couple of hours. It would help a lot if we could either change pm.logp so that this just works, or raise an error if there are remaining RV ops in a logp graph.
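
The check suggested here could be a traversal over the logp outputs that fails when a random-variable op survives. A minimal sketch of the idea: the `Var`/`Apply` classes below are hypothetical stand-ins for aesara's `Variable`/`Apply` (in aesara itself one could walk `aesara.graph.basic.ancestors` of the logp output instead), and `contains_rv` is a made-up helper name:

```python
# Sketch: walk a logp graph and report whether any random-variable op
# is still present. Var/Apply are minimal mocks of aesara's classes.

class Apply:
    def __init__(self, op, inputs):
        self.op = op            # op name, e.g. "normal_rv" or "pow"
        self.inputs = inputs    # input Var nodes

class Var:
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner      # the Apply node that produced this Var

def contains_rv(outputs, rv_ops=("normal_rv",)):
    """Depth-first walk over the graph; True if any op in rv_ops remains."""
    stack, seen = list(outputs), set()
    while stack:
        var = stack.pop()
        if var.owner is None or id(var) in seen:
            continue
        seen.add(id(var))
        if var.owner.op in rv_ops:
            return True
        stack.extend(var.owner.inputs)
    return False

# Mimic the buggy graph above: b_logprob depends on a sampled `a`.
a_sample = Var("a", owner=Apply("normal_rv", []))
b_logprob = Var("b_logprob", owner=Apply("pow", [a_sample, Var("exp")]))
print(contains_rv([b_logprob]))  # → True: sampling leaked into the logp
```

With a check like this in place, the model above would error out instead of silently sampling inside the logp.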

(cc @ricardoV94)

@ricardoV94 (Member)

Hmm... but we do allow for models with RVs (when using Simulator variables)... We could check that they do not belong to the NoDistribution class, but I am not sure that's the best approach.

@ricardoV94 (Member)

ricardoV94 commented Nov 8, 2021

This is a bit of a weird case, because we try to be very flexible with the DensityDist... including inferring how many inputs the variable takes based on the function signature. If the user had to specify explicitly that the distribution expects one parameter, it would have failed earlier, when creating the RV. In fact, you can specify this by passing ndims_params=[0], I think.

@ricardoV94 (Member)

We could perhaps inspect the logp function signature inside DensityDist and confirm that the user passed the expected number of inputs?

@aseyboldt (Member, Author)

Checking the signature can't help, can it? The variables in the closure wouldn't be picked up by that.
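
For illustration, a plain-Python sketch (not pymc code) of why a signature check misses the capture: the signature only lists explicit parameters, although Python does record closed-over names on the code object, so a closure capture is not entirely invisible (a captured global, however, would be):

```python
import inspect

def make_logp():
    a = 42  # stand-in for the model variable captured by the closure
    return lambda x: (x - a) ** 2

logp = make_logp()

# A signature/arity check only sees the explicit parameters...
print(list(inspect.signature(logp).parameters))  # → ['x']

# ...but the code object does record the closed-over names:
print(logp.__code__.co_freevars)                 # → ('a',)
```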

Maybe this comes down to the api for pm.logp. During the retreat we discussed what should happen if some rvs aren't mentioned. If you always had to explicitly mention all rvs (either in the value var mapping or in an additional list of vars that should be sampled) this wouldn't happen, would it? And in a simulation context we'd know which variables should remain beforehand, so we can tell the pm.logp function.
So if anything is left that isn't explicitly marked as a sample rv, we'd just throw an error. That error message probably wouldn't be all that nice, but any error message is better than incorrect sampling.

@ricardoV94 (Member)

ricardoV94 commented Nov 9, 2021

Checking the signature can't help, can it? The variables in the closure wouldn't be picked up by that.

The signature of the logp function or the lambda tells you how many variables are expected. This is the number of arguments, plus one for the value variable. So in your case we would have noticed that the lambda takes 2 arguments but no variable was passed to DensityDist, and raised an error. That would have been enough.
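
A sketch of the arity check described here, using `inspect` (the helper name `check_densitydist_arity` is hypothetical, not an actual pymc function):

```python
import inspect

def check_densitydist_arity(logp_fn, n_dist_params):
    """Sketch: the logp callable should take one argument per
    distribution parameter passed to DensityDist, plus one for the value."""
    n_expected = n_dist_params + 1
    n_actual = len(inspect.signature(logp_fn).parameters)
    if n_actual != n_expected:
        raise TypeError(
            f"logp takes {n_actual} argument(s) but {n_dist_params} "
            f"distribution parameter(s) were passed; expected {n_expected}"
        )

# The corrected example: one parameter, two-argument lambda -> ok.
check_densitydist_arity(lambda x, a_val: (x - a_val) ** 2, 1)

# A two-argument lambda with no parameters passed would raise:
try:
    check_densitydist_arity(lambda x, a_val: (x - a_val) ** 2, 0)
except TypeError as e:
    print(e)
```

As noted below, this would not have caught the original example, whose lambda takes only one argument and captures the model variable through its closure.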

I understand the concern about having a RandomVariable in the graph, but the DensityDist is not a good example of this.

In our bug we had a global variable that was introduced in the graph when requesting the logp, but which happened to be part of the original graph already. This is not the case we discussed, where we fail to convert an RV or an RV input. We didn't convert a variable that was not part of the original "graph", because it was not used as an input to the DensityDist.

It really reads like an edge case, and it's the DensityDist API that's at fault here.

@ricardoV94 (Member)

ricardoV94 commented Nov 9, 2021

And in a simulation context we'd know which variables should remain beforehand, so we can tell the pm.logp function

The Simulator introduces new RVs in the logp so we would need special logic to say the graph of the simulator can contain RVs but other variables cannot.

We can also raise a warning later in compile_rv_inplace, where we compile the logp function. We already have to traverse the logp graph there to manage Simulator RVs, so we are paying that cost anyway. That's not great, though, because we use that method to compile predictive sampling as well.

@ricardoV94 (Member)

ricardoV94 commented Nov 9, 2021

Checking the signature can't help, can it? The variables in the closure wouldn't be picked up by that.

Ugh, I see now that the original example had x as the only input. Hmm... that is a thorny case. I have no good suggestion then.

It's unfortunate that this worked like that in V3 (it should never have worked) and does not fail immediately in V4...

I still don't see how aeppl or pm.logp should be policing that the output of the logp dispatching does not introduce new RandomVariables by accident (while ignoring the cases where they are allowed).

Right now Simulator only introduces RVs of type SimulatorRV, but in the future it might return a subgraph composed of vanilla RVs. How would we tell pm.logp that those can be accepted but others cannot?

We also don't know in advance what these RVs will be so we cannot inform pm.logp about it.

@aseyboldt (Member, Author)

I think we could change the signature of pm.logp so that anyone calling it has to account for all random variables in the graph somehow. I think I proposed this at the retreat as well. So something like:

def logp(
    vals: List[RandomVariable],
    rv_values: Dict[RandomVariable, ValueVariable],
    sampling_rvs: Optional[List[RandomVariable]]
)

so that vals + sampling_rvs has to cover all random variables in the graph. If they don't, we throw an exception, and we can also do a second pass over the final logp graph to make sure that it only contains RVs that are in sampling_rvs.
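
The coverage contract being proposed could look something like the following sketch (using strings as stand-ins for RandomVariables; `check_rvs_covered` is a hypothetical helper, not an existing pymc function):

```python
def check_rvs_covered(graph_rvs, rv_values, sampling_rvs=None):
    """Sketch of the proposed contract: every RV reachable in the graph
    must appear either in the value-variable mapping or in the explicit
    list of RVs that are allowed to remain and be sampled."""
    covered = set(rv_values) | set(sampling_rvs or ())
    missing = sorted(rv for rv in graph_rvs if rv not in covered)
    if missing:
        raise ValueError(f"RVs not accounted for: {missing}")

# "b" gets a value variable, "a" is explicitly marked for sampling: ok.
check_rvs_covered({"a", "b"}, rv_values={"b": "b_value"}, sampling_rvs=["a"])

# "a" is neither mapped to a value nor marked for sampling: error.
try:
    check_rvs_covered({"a", "b"}, rv_values={"b": "b_value"})
except ValueError as e:
    print(e)
```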

I can't think of a use case with SimulatorRVs where we would call the logp function without knowing, from the info in the model, which variables are supposed to be sampled. (Explicit is better than implicit :-) )

@ricardoV94 (Member)

ricardoV94 commented Nov 9, 2021

I can't think of a use case with SimulatorRVs where we would call the logp function without knowing, from the info in the model, which variables are supposed to be sampled. (Explicit is better than implicit :-) )

The Simulator can introduce new RVs that did not exist in the model prior to logp. Right now it introduces a new SimulatorRV that is completely absent from the original graph (it's a clone of itself). The SimulatorRV itself is a non-symbolic wrapper around RVs, but in the future it might return a purely symbolic random expression:

Example pm.logp(SimRV, value) -> at.random.normal(SimRV.owner.inputs[-1], 1) - value

There is no way to tell pm.logp to expect a new NormalRV in the logp graph. It did not exist before.

By the way, right now it works something like:
pm.logp(SimRV, value) -> SimRV.owner.op(*SimRV.owner.inputs) - value

In either case there is a brand new RV that did not exist before we called pm.logp.

@aseyboldt (Member, Author)

I don't get it yet. Who calls clone or creates the new RV? Inside pm.logp or before that?

@ricardoV94 (Member)

I don't get it yet. Who calls clone or creates the new RV? Inside pm.logp or before that?

Inside pm.logp:

sim_value = sim_op.make_node(rng, *sim_inputs[1:]).default_output()

@aseyboldt (Member, Author)

But couldn't pm.logp keep track of the created RV? We give it a list of SimulatorRVs when we call it; it creates the corresponding sim_values and later checks that only those sim_values are in the graph?

@aseyboldt (Member, Author)

Ah, wait, it happens inside the method, out of reach of pm.logp...

@ricardoV94 (Member)

I see two not very pleasing ways to patch things for now:

  1. Only allow SimulatorRVs / NoDistributions in the logp graph
  2. Force users to always specify the number of parameters in DensityDist

I am more inclined to 2 because it is more contained...

@aseyboldt (Member, Author)

I don't think this is just about the DensityDist. I'm afraid that bugs like this, where some sampling remains in the logp graph, could also come about in ways we don't expect. And I think this can lead to really terrible bugs: anything from models that silently return incorrect results to almost-impossible-to-debug convergence problems.

How about one of those instead?

  • We change the Simulator logp so that it adds a tag (derived_from or similar) to the created RV, which still allows us to check the logp graph for RVs that shouldn't be there.
  • Leave most of it as it is, but add an allow_rvs_in_graph flag to pm.logp that you have to set to True if you want to do ABC, and that is False by default (the 99% case).
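
The second option could be sketched roughly as follows (the `compile_logp` name and the `MockRV` stand-in are hypothetical; this only illustrates the fail-by-default behaviour being proposed, not an actual pymc API):

```python
class MockRV:
    """Stand-in for a RandomVariable node left over in a logp graph."""

def compile_logp(graph_nodes, allow_rvs_in_graph=False):
    """Sketch of the proposed flag: refuse to compile a logp graph that
    still contains RVs unless the caller explicitly opts in (e.g. ABC)."""
    leftover = [n for n in graph_nodes if isinstance(n, MockRV)]
    if leftover and not allow_rvs_in_graph:
        raise ValueError(
            f"{len(leftover)} random variable(s) remain in the logp graph; "
            "pass allow_rvs_in_graph=True if this is intentional"
        )
    return graph_nodes

compile_logp(["sub", "pow"])                              # ok: no RVs left
compile_logp(["sub", MockRV()], allow_rvs_in_graph=True)  # ok: explicit opt-in
```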

@ricardoV94 (Member)

ricardoV94 commented Nov 9, 2021

I don't think this is just about the DensityDist. I'm afraid that bugs like this, where some sampling remains in the logp graph, could also come about in ways we don't expect. And I think this can lead to really terrible bugs: anything from models that silently return incorrect results to almost-impossible-to-debug convergence problems.

This happened because of a globally defined variable used by accident inside the logp, plus a class that is defined dynamically, plus the forgiving DensityDist API, plus incorrect API use, plus the cleverness of cloudpickle. If you define a proper RandomVariable, where you have to be explicit about the number of inputs (we try to infer those in the DensityDist), this would not happen, because it would raise as soon as you define the variable in the Model.

Otherwise, you are suggesting we should be skeptical of how aeppl works because it is fundamentally "fragile", and I think that warrants some evidence beyond the DensityDist case in this issue.

RandomVariables are conceptually as valid an Op as any other you might find in a logp graph, including SharedVariables with auto-updates, which would also render the logp graph non-deterministic. If you want to reject RandomVariables, you must also reject any SharedVariables in a logp graph.

@aseyboldt (Member, Author)

Do you maybe have a couple of minutes for a call today? Maybe it's easier to discuss it that way? :-)
