Populate InferenceData with out-of-sample prediction results from PyMC3 predictive samples #983
Conversation
Force-pushed from 7604ace to 6dd4c80.
A problem I am having with this is the division of variables into groups. Currently, we try to guess what is constant data by looking at each variable and checking whether it is in the predictive trace (not constant), in the posterior trace, or in the observations. If it is in none of these, we assume it is constant data. This really does not work, because in PyMC3 the user can specify the set of variables of interest to include in the predictive trace (this defaults to the set of observed random variables). So membership in (or absence from) the trace is not a good way to determine anything about a random variable: if the user is not interested in a random variable and omits it from the predictive trace, it gets misclassified as constant data. I really think it would be better to simply tell the translator what is constant data, but I don't have a great plan for how to do this, so if anyone does, please LMK.
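Roughly, the guessing heuristic described above amounts to something like the following (the names and structure here are illustrative, not the actual PyMC3Converter internals):

```python
def guess_constant_data(model_vars, predictive_trace, posterior_trace, observations):
    """Illustrative sketch only: treat anything not found in the traces or the
    observations as constant data, which is exactly the guess that breaks down
    when the user omits variables of interest from the predictive trace."""
    constant_data = {}
    for name, value in model_vars.items():
        if name in predictive_trace:  # sampled in the predictive trace -> not constant
            continue
        if name in posterior_trace:  # sampled in the posterior -> not constant
            continue
        if name in observations:  # observed data -> not constant
            continue
        constant_data[name] = value  # everything else is assumed to be constant data
    return constant_data
```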
Force-pushed from dc99580 to bcc44e5.
Huge help from Brandon Willard of PyMC3 and Symbolic PyMC enabled me to more accurately determine which variables belong in `constant_data`.
This won't pass tests until PyMC3 is fixed. See pymc-devs/pymc#3763
Force-pushed from f3a2b76 to bd4980f.
I really like all the improvements introduced here, from predictions to better constant data support. This is awesome work.
In addition to the comments below, I have one extra comment on API. In my personal case (which I think is quite common), I generate only one set of predictions per model and I generate it without thinning, in which case it makes sense to store all quantities in a single inference data object. Therefore I think it would be great to add an `idata` or `idata_orig` optional argument so that `predictions_from_pymc3` adds the new inference data (containing only predictions and predictions constant data) to the object passed as `idata` (this is done in my PR so it should not be much work).
The logic would be (see the sketch below):
- `idata_orig=None`: return a new idata object (which may have a thinned posterior)
- `idata_orig!=None`: return the original idata object concatenated with the predictions idata
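A minimal sketch of that return logic, assuming the merge is done with `az.concat` (the function and argument names here are illustrative, not the final implementation):

```python
import arviz as az

def attach_predictions(predictions_idata, idata_orig=None):
    """Illustrative only: return the predictions idata alone, or the original
    idata with the new predictions groups concatenated onto it."""
    if idata_orig is None:
        # new idata object, which may have a thinned posterior
        return predictions_idata
    # original idata object concatenated with the predictions idata
    return az.concat(idata_orig, predictions_idata, copy=True, inplace=False)
```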
```python
if predictions is not None:
    get_from = predictions
elif prior is not None:
    get_from = prior
elif posterior_predictive is not None:
    get_from = posterior_predictive
```
I would keep the priority order with posterior predictive before prior: `predictions`, `posterior_predictive`, `prior`. This is because the default in `pm.sample_posterior_predictive` is `samples=None`, which ends up with ndraws samples, whereas for `pm.sample_prior_predictive` the default is `samples=500`, which will generally not be equal to ndraws.
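In other words, the excerpt above would become something like this (the same fragment, with only the order of the branches changed):

```python
# suggested priority: predictions, then posterior_predictive, then prior, because
# sample_posterior_predictive defaults to ndraws samples while
# sample_prior_predictive defaults to samples=500
if predictions is not None:
    get_from = predictions
elif posterior_predictive is not None:
    get_from = posterior_predictive
elif prior is not None:
    get_from = prior
```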
I have a tentative version of the above and will push it momentarily. Note that I have augmented the `orig_idata` by hand instead of using `concat`. LMK if you think `concat` would be better.
arviz/data/io_pymc3.py (outdated)
```python
*,
prior=None,
posterior_predictive=None,
predictions=None,
```
I think having a `predictions` argument here will be quite confusing. I would only use the `predictions` argument in the `PyMC3Converter` class. Therefore it will not be visible to users in the `from_pymc3` docs, but it will still be available for internal usage in `predictions_from_pymc3` (which would then use the class instead of `from_pymc3`; the code is basically the same in both cases).
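Roughly, the idea would look like this inside `arviz/data/io_pymc3.py` (a sketch that assumes the converter grows `predictions` and `model` arguments as proposed in this PR, not the final code):

```python
def predictions_from_pymc3(predictions, posterior_trace, model, coords=None, dims=None):
    """Sketch: keep `predictions` out of from_pymc3's public signature and route it
    through the converter class, which is defined in this same module."""
    return PyMC3Converter(
        trace=posterior_trace,
        predictions=predictions,
        model=model,
        coords=coords,
        dims=dims,
    ).to_inference_data()
```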
arviz/data/io_pymc3.py (outdated)
```python
def predictions_from_pymc3(
    predictions, posterior_trace, model, coords=None, dims=None
```
I think that `posterior_trace` and `model` should default to `None`. I think that the most general use case will be calling this from inside the model context and without any thinning; in this case, the model can be obtained from the context and the trace is not really needed.
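For example, the signature and the model lookup could look roughly like this (only a sketch of the suggested defaults; the rest of the body is elided):

```python
import pymc3 as pm

def predictions_from_pymc3(
    predictions, posterior_trace=None, model=None, coords=None, dims=None
):
    """Sketch of the suggested defaults only."""
    # pm.modelcontext returns `model` when it is passed explicitly, otherwise the
    # model from the enclosing `with model:` block (and raises if there is neither)
    model = pm.modelcontext(model)
    ...
```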
Done.
@OriolAbril Could you LMK how you see this lining up with your work on #794? Is #794 going to be merged soon, so that I can rebase on top of it? Should this MR branch off of yours (I don't like that idea, because I think it will be error-prone)? Or do you think we could come up with some subset of #794 that could be merged to master that would be enough to harmonize my work with yours, but still allow you to keep working on yours separately?
I think both need to be thoroughly reviewed, and #794 has been waiting for quite a while, so it can wait until this has been merged. I don't mind rebasing on top of this once merged, and I actually think that they will merge quite well; here there are nearly no modifications to sample stats/likelihood handling. To be extra sure though, are you planning on adding some tests here (once the functionality is decided) or in a future PR? We could also wait for the tests PR if it is not too long. Does this sound good @rpgoldman?
Force-pushed from bd4980f to 2806557.
I'm not sure exactly how to improve the tests. In particular, the needed PyMC3 fix was only made on 7 Jan 2020 and is not in the released version of PyMC3.
@OriolAbril OK, this should have all the changes you requested. I'll check in with PyMC3 about maybe cutting a bug-fix release.
This looks really good, thanks!
Minor note: I think the `.pyi` file should be excluded from the repository.
arviz/data/io_pymc3.py (outdated)
```python
idata_orig.predictions = new_idata.predictions
idata_orig.predictions_constant_data = new_idata.predictions_constant_data
```
I would use concat for two reasons: first, it would allow using `**kwargs` and passing them on to `concat` (which already has both inplace and copy arguments); second, assigning the groups by hand does not update the `_groups` attribute (used to iterate over groups in several cases, such as saving or printing).
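A sketch of what that could look like (the helper name is made up; the point is just forwarding kwargs to `az.concat`):

```python
import arviz as az

def merge_predictions(idata_orig, new_idata, **kwargs):
    """Merge the predictions groups of new_idata into idata_orig via az.concat.

    kwargs (e.g. copy=..., inplace=...) are forwarded to az.concat; note that with
    inplace=True, concat modifies idata_orig in place and returns None.
    """
    return az.concat(idata_orig, new_idata, **kwargs)
```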
OK, I will change that and push.
@canyon289 is the expert, but maybe we can mark the failing test with some fixture on Python 3.5 so the CI build passes.
@OriolAbril Why do you think the `.pyi` file should be excluded?
Basically ignorance about type hinting. I think I had never seen a `.pyi` file before.
In that case, I will leave it. If anyone else uses type stubs, it may be useful to them.
By the way, I was not able to run black on the file.
I fetched the code from this PR and managed to run black on it. Maybe you have an old version of black?
Pretty sure not: I updated it after the first failure. I think you might be able to commit the reformatted file to this MR. If you can figure out how, please do. If you can't, please either email it to me or message it to me on Slack, and I will commit it.
I managed to push the black formatting to your branch.
I'm going to assume that it's OK not to have log likelihoods in this case. The reasoning would be that if you have done out-of-sample predictions that required modifying the trace, then you should already have another InferenceData object that holds the original posterior and its log likelihood. I just wanted to explain why this is happening.
Force-pushed from b7438f4 to 925cdbd.
I thought the only constraint is that if there are
I like both names; I don't have any strong opinion on this. Like this it looks fine.
Made a bunch of overloads to capture the behavior of concat() better. Format improvements, etc.
The order of priority in extracting the model was wrong -- explicit argument or model context must override extraction from the trace. Return value handling in predictions_from_pymc3() was wrong.
Incorrectly checked for sampling statistics, which are not supported (yet?) in out of sample predictions.
Force-pushed from aa84cd8 to 495e417.
OK, this all works for me, and if you check out PyMC3 with the fix referenced above (pymc-devs/pymc#3763), the tests should pass.
Force-pushed from 6767702 to d81220b.
Force-pushed from d81220b to c3f5c1f.
arviz/data/io_pymc3.py (outdated)
```python
# random variable object ...
Var = Any  # pylint: disable=invalid-name

# pylint: disable=line-too-long
```
What is the scope of this disable? (mainly curiosity)
I'm afraid I don't know it as well as I should. I'm going to try to leave it out and see what happens. If it's not necessary (after black reformatting), or if it can be more tightly scoped, I will cut or move it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cut it out -- I think it was there because my editor didn't look at the right .pylintrc, and was holding me to 80 characters. Passes pylint here without it. Assuming it passes in CI, I think we are good.
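For reference, the usual scoping rules for pylint disables look like this (a generic illustration, not code from this PR):

```python
# A disable comment on its own line applies from that line to the end of the
# enclosing scope; at module level, that means the rest of the module.
# pylint: disable=line-too-long


def example():
    # Inside a block, it applies to the rest of that block only.
    # pylint: disable=invalid-name
    X = 1
    return X


# Appended to a statement, it applies to that single line only.
y = example()  # pylint: disable=invalid-name
```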
@OriolAbril Assuming that this passes all the tests again, how should we merge it? The git history is cluttered, and I was thinking of, for example, using rebasing to squash together a bunch of the commits for lint and black compliance. How about I wait for this round of tests to pass, then clean up the history, and then, if it's still passing tomorrow with the cleaned-up history, you or I can merge it?
All merges in ArviZ are squash merges, so the PR history does not really matter. I can merge afterwards.
Oh, great. Then I'll skip the history editing. I will push my changelog entry then, and we're done!
@OriolAbril Thanks!
Populate InferenceData with out-of-sample prediction results from PyMC3 predictive samples (arviz-devs#983): adds from_pymc3_predictions to add predictions and predictions_constant_data groups to inference data objects. Co-authored-by: Oriol Abril <oriol.abril.pla@gmail.com>
In this approach, unlike the one outlined in the schema, `constant_data` contains the constant data used to generate the predictions, not the data used to generate the posterior trace. This could be modified, but I made this decision because the posterior trace used to generate the predictions in general CANNOT be the same as the posterior trace created by `pymc3.sample()`: variables whose shape depends on the shape of the constant data or the observations must be removed.
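Putting it together, an end-to-end use of the merged function might look like this (a toy example; it assumes the `from_pymc3_predictions` name and `idata_orig` argument discussed in this PR, and keeps the new data the same shape as the training data to sidestep the shape issue mentioned above):

```python
import numpy as np
import pymc3 as pm
import arviz as az

x_train = np.random.normal(size=100)
y_train = 3.0 * x_train + np.random.normal(size=100)
x_new = np.linspace(-2.0, 2.0, 100)  # same length as x_train

with pm.Model() as model:
    x = pm.Data("x", x_train)
    slope = pm.Normal("slope", 0.0, 5.0)
    pm.Normal("y", mu=slope * x, sigma=1.0, observed=y_train)
    trace = pm.sample(500, chains=2)
    idata = az.from_pymc3(trace)  # in-sample InferenceData

with model:
    pm.set_data({"x": x_new})  # swap in the out-of-sample predictors
    predictions = pm.sample_posterior_predictive(trace)
    # attaches the predictions and predictions_constant_data groups to idata
    idata = az.from_pymc3_predictions(predictions, idata_orig=idata)
```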