Invalid logp expression if args aren't explicit in DensityDist #5155
Comments
Hmm... but we do allow for models with RVs (when using Simulator variables)... We could check that they do not belong to the …
This is a bit of a weird case, because we try to be very flexible with the …
We could perhaps inspect the logp function signature inside …
Checking the signature can't help, can it? The variables in the closure wouldn't be picked up by that. Maybe this comes down to the API for …
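The point that signature inspection misses closure variables can be shown in plain Python (the function names here are illustrative, not PyMC API):

```python
import inspect

def make_logp():
    a = 1.5  # stands in for a model variable captured by accident

    def logp(value, mu):
        # `a` comes from the enclosing scope, not from the signature
        return value + mu + a

    return logp

logp = make_logp()

# Signature inspection only reveals the explicit parameters:
print(list(inspect.signature(logp).parameters))  # ['value', 'mu']

# The captured variable only shows up on the function's code object:
print(logp.__code__.co_freevars)  # ('a',)
```

So any check based on `inspect.signature` alone would never notice the accidentally captured variable; one would have to look at `co_freevars` (and globals) as well.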
I understand the concern about having a RandomVariable in the graph, but the DensityDist is not a good example of this. In our bug we had a global variable that was introduced into the graph when requesting the logp, and which happened to be part of the original graph already. This is not the case we discussed, where we fail to convert a RV or a RV input. We didn't convert a variable that was not part of the original "graph", because it was not used as an input to the DensityDist. It really reads like an edge case, and it's the DensityDist API that's at fault here.
The Simulator introduces new RVs in the logp, so we would need special logic saying that the graph of the Simulator can contain RVs but other variables cannot.
Ugh, I see now the original example had … It's unfortunate that this worked in V3 like that (it should never have worked) and does not fail immediately in V4... I still don't see how aeppl or pm.logp should be policing that the output of logp dispatching does not introduce new RandomVariables by accident (while ignoring the cases where they are allowed). Right now Simulator only introduces RVs of type SimulatorRV, but in the future it might return a subgraph composed of vanilla RVs. How would we tell pm.logp that those can be accepted but others cannot? We also don't know in advance what these RVs will be, so we cannot inform pm.logp about them.
I think we could change the signature to

```python
def logp(
    vals: List[RandomVariable],
    rv_values: Dict[RandomVariable, ValueVariable],
    sampling_rvs: Optional[List[RandomVariable]],
)
```

so that … I can't think of a use case with SimulatorRVs where we would call the logp function without knowing, from the info in the model, which variables are supposed to be sampled. (Explicit is better than implicit :-) )
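A toy sketch of how that explicit signature could behave — all class and variable names here are hypothetical stand-ins for Aesara objects, not real PyMC API:

```python
from typing import Dict, List, Optional

# Hypothetical placeholders for graph variables (illustrative only).
class RandomVariable: ...
class ValueVariable: ...

def logp(
    vals: List[RandomVariable],
    rv_values: Dict[RandomVariable, ValueVariable],
    sampling_rvs: Optional[List[RandomVariable]] = None,
):
    """Toy logp with the proposed explicit signature.

    Every RV must either be mapped to a value variable or be declared
    up front (via sampling_rvs) as one that is meant to stay stochastic.
    """
    declared_sampled = set(sampling_rvs or [])
    for rv in vals:
        if rv not in rv_values and rv not in declared_sampled:
            raise ValueError("RV is neither valued nor declared as sampled")
    return "ok"  # a real implementation would build the logp graph here

a, b = RandomVariable(), RandomVariable()
value_of_a = ValueVariable()

logp([a, b], {a: value_of_a}, sampling_rvs=[b])  # accepted: b is declared
try:
    logp([a, b], {a: value_of_a})                # b is unaccounted for
except ValueError as e:
    print(e)
```

The design point being illustrated: with `sampling_rvs` explicit, any RV that shows up without being declared is an immediate, loud error instead of a silent sampling leak.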
The Simulator can introduce new RVs that did not exist in the model prior to logp. Right now it introduces a new SimulatorRV that is completely absent from the original graph (it's a clone of itself). The SimulatorRV itself is a non-symbolic wrapper around RVs, but in the future it might return a purely symbolic random expression. Example: … There is no way to tell pm.logp to expect a new NormalRV in the logp graph; it did not exist before. By the way, right now it works something like: … In either case there is a brand-new RV that did not exist before we called pm.logp.
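Why a clone defeats identity-based bookkeeping can be shown with a plain-Python stand-in (the class here is a toy, not PyMC's actual SimulatorRV):

```python
import copy

class SimulatorRV:
    """Toy stand-in for a SimulatorRV graph variable (illustrative only)."""

rv = SimulatorRV()
clone = copy.copy(rv)  # logp internally works on a clone of the RV

# The clone is a brand-new object; identity-based tracking, which is how
# graph variables are usually compared, does not recognize it:
print(clone is rv)          # False
known_rvs = {id(rv)}
print(id(clone) in known_rvs)  # False
```

This is the crux: even if the caller hands pm.logp a list of known RVs up front, the variable that actually ends up in the logp graph is a different object.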
I don't get it yet. Who calls …
Inside pm.logp: pymc/pymc/distributions/simulator.py, line 243 at bdd4d19.
But couldn't pm.logp keep track of the created RV? We give it a list of SimulatorRVs when we call it; it creates the corresponding sim_values and later checks that only those sim_values are in the graph?
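The "check that only the known variables appear in the graph" idea can be sketched on a toy graph. `Node`, `ancestors`, and `check_no_stray_rvs` below are illustrative stand-ins, not Aesara or PyMC API:

```python
class Node:
    """Toy stand-in for a graph node: an op name plus its inputs."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = list(inputs)

def ancestors(node):
    """Yield a node and all of its transitive inputs (depth-first)."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        yield n
        stack.extend(n.inputs)

def check_no_stray_rvs(logp_graph, allowed_rvs):
    """Raise if the graph contains a random op outside the allowlist."""
    allowed = {id(n) for n in allowed_rvs}
    for n in ancestors(logp_graph):
        if n.op.endswith("_rv") and id(n) not in allowed:
            raise ValueError(f"stray random op in logp graph: {n.op}")

sim_rv = Node("simulator_rv")                  # created during logp; allowed
ok_graph = Node("sum", sim_rv, Node("value"))
check_no_stray_rvs(ok_graph, [sim_rv])         # passes

bad_graph = Node("sum", Node("normal_rv"), Node("value"))
try:
    check_no_stray_rvs(bad_graph, [sim_rv])    # a normal_rv leaked in
except ValueError as e:
    print(e)
```

As the thread notes, the catch is that the allowlist would have to be populated from inside the method that creates the clone, which is out of reach of pm.logp as currently structured.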
Ah, wait it happens in the method out of reach of pm.logp... |
I see two not very pleasing ways to patch things for now: …
I am more inclined to 2 because it is more contained... |
I don't think this is just about the DensityDist. I'm afraid that bugs like this, where some sampling remains in the logp graph, could also come about in other ways that we don't expect. And I think this can lead to really, really terrible bugs: anything from models that just silently return incorrect results to almost-impossible-to-debug convergence problems. How about one of those instead?
This happened because of a globally defined variable used by accident inside the logp, plus a class that is defined dynamically, plus a forgiving DensityDist API, plus incorrect API use, plus the cleverness of cloudpickle. If you define a proper RandomVariable, where you have to be explicit about the number of inputs (we try to infer those in the DensityDist), this would not happen, because it would raise as soon as you define the variable in the Model. Otherwise you are suggesting we should be skeptical of how aeppl works because it is fundamentally "fragile", and I think that warrants some evidence beyond the DensityDist case in this issue. RandomVariables are conceptually as valid an Op as any other you might find in a logp graph, including SharedVariables with auto-updates, which would also render the logp graph non-deterministic. If you want to reject RandomVariables, you must also reject any SharedVariables in a logp graph.
Do you maybe have a couple of minutes for a call today? Maybe it's easier to discuss it that way? :-) |
Description of your problem
On the latest pymc version (777622a) I get incorrect logp graphs that contain sampled random variables if I use model variables in a closure of a DensityDist:
The logp function of the model is not deterministic now, because it uses a sampled version of `a` in the logp function of `b` instead of the value from the value variable. You can see this in the aesara graph (the normal_rv op should not be in here): …

This can be fixed in user code by explicitly letting the DensityDist know about the parameter: …

Finding a bug like this in an actual model took @ferrine and me a couple of hours, so it would help a lot if we could change `pm.logp` either so that this just works, or so that we get an error if there are remaining RV ops in a logp graph. (cc @ricardoV94)
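The difference between the buggy and the fixed usage can be mimicked in plain Python (a deliberately simplified analogue, not PyMC code):

```python
# `a` should be substituted at logp time, but a closure bakes in
# whatever `a` was bound to when the function was defined.
a = "sampled-draw"  # stands in for the model variable `a`

def logp_closure(value):
    return (value, a)        # BUG analogue: silently picks up the outer `a`

def logp_explicit(value, a):
    return (value, a)        # FIX analogue: `a` is an explicit input

# Substituting a "value variable" only works for the explicit version:
print(logp_closure(0.0))                   # the sampled draw leaks in
print(logp_explicit(0.0, a="value-var"))   # the caller controls `a`
```

This mirrors the fix above: once `a` is passed as an explicit parameter of the DensityDist, the machinery can replace it with the value variable, whereas the closure version leaves it frozen as a sampled variable.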