-
-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Generalized Extreme Value distribution example #241
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,899 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should say something about the divergences. They probably arise from the hard constraint between the support and/or parameters, which NUTS struggles with.
Since the support domain changes depending on whether xi
is negative, zero or positive, there is not much we can do (at least nothing obvious). Ideally we would parametrize the distribution (or apply a transformation) in such way that any proposed value would be in the supported domain of the distribution. This would likely alleviate divergences...
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - this is done now, with some recommendations for the user.
Thanks for your PR! I have added the hold for v4 tag to make sure it is merged after the beta release is out. This will ensure the notebook is reproducible from the watermark only (pymc version doesn't include commit info) as well as not having the "You are running the v4 development version of PyMC which currently still lacks key features." in our documentation examples. Also make sure to follow the style guide: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide. This will ensure the notebook is easily discoverable via tags, contains all the necessary info to rerun from the watermark and overall tries to set all the notebooks in the example gallery to a similar style so readers know what to expect. I also noticed you have not listed yourself as author, which is also something we are currently discussing to make sure we clearly credit everyone who contributes to the example gallery. See #198 for example if you are interested in this or https://pymc-examples--239.org.readthedocs.build/en/239/splines/spline.html (generated from #239) to see a proposal of how authorship attribution could look like. |
Thanks for the great comments @OriolAbril ! I think I've addressed everything; the splines example was super useful too! Please let me know of anything else. Edit: the notebook metadata and citations didn't render very well. I'm not sure if this is related to recent changes at #198 or something I've done, though the markup seems fine. |
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
{cite:p}`Coles2001`
will add the author and year of the citation automatically and make it a clickable link to the bibliography section, there is no need for the manual citation. In this case however, I think you want cite:t
to generate a textual citation instead of a citation between parenthesis. There will be also no need for the manual bibliography at the end, it will be automatically generated when building the docs.
I would also suggest using generalized extreme value everywhere, especially if aiming for a beginner notebook.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @OriolAbril - I was missing the colon; the citation should render ok now, and I have the biblio referenced.
GEV is standard terminology in extremes - the acronym is introduced at the top; my preference is to keep it as is for this reason. For example, here is an extract from the main reference:
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Black can be turned off at will, this is one of such cases where turning it off is recommended: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#pre-commit-and-code-formatting
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got this - thanks!
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; and the other refs too - thanks!
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use plot_posterior to generate the plot. I believe the code below should get a somewhat similar look
az.plot_posterior(prior_pc, group="prior", var_names="~gev", hdi_prob="hide", point_estimate=None)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so much better! Thank you! Done.
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't what we calculate below the maximum a posteriori even if we do compare it to the maximum likelihood estimates?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point - text updated to be more technically correct. Can you check this please?
Ref: https://mmuratarat.github.io/2019-08-24/mle_map_estimation
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,1049 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you consider opening an issue at ArviZ to better expose the mode calculation function?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super idea - I will!
I tried the ppc plot with priors but got the following error: /opt/anaconda3/lib/python3.8/site-packages/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, observed, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show) TypeError: |
@Dawar-812 This is probably an Arviz issue, but can you check that you have the group |
@ccaprani Here is the code: ` |
@Dawar-812 this PR is using the development version of pymc. To reproduce the results you need to install pymc from github which is still changing in preparation for the soon to be released 4.0.beta release and lacks some key features (as the warning on import shows also here on the notebook). Now, sampling methods by default return InferenceData objects, so you can take their output and use arviz functions straight away, for this to work in pymc 3.11.x you need to manually convert the output to inferencedata before calling plot_ppc |
examples/references.bib
Outdated
@@ -120,3 +120,56 @@ @inproceedings{salakhutdinov2008bayesian | |||
year={2008}, | |||
volume={25} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you sort the references? It doesn't matter much now, but it will be a nightmare to see which references we already have and which are not present yet when we reach the 70th notebook to update 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @OriolAbril , can you clarify what you mean please? All the references needed for this new workbook are now in the bibliography and render properly in the built version of the docs: https://pymc-examples--241.org.readthedocs.build/en/241/case_studies/GEV.html
. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bibliography in the website is completely automated now, this comment is about the references.bib file which (at least for now) is shared between all notebooks. And we expect to have many of the references cited in multiple notebooks so we need a way to for contributors to find which ids to use by searching on the bib file. Ref https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally understood the problem and have fixed it now, adding the keyword gev
to the bib entry keys - latest commit. Thanks for your patience!
examples/case_studies/GEV.ipynb
Outdated
@@ -0,0 +1,997 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be extra clear, prior_pc
and trace
are exactly the same type of object: InferenceData , the only difference is that they have different groups. I personally recommend adding all the groups to the same InferenceData so that you can take advantage of all plots and diagnostics and save everything in a single netcdf/zarr file.
You could do that with
idata = pm.sample_prior_predictive(...) az.plot_posterior(idata, group="prior") # plots prior distribution ... idata.extend(pm.sample(...)) az.plot_posterior(idata) # plots posterior which is the default group if present # but then you could also do things like az.plot_dist_comparison(idata)
in order to compare prior and posterior distributions: https://arviz-devs.github.io/arviz/api/generated/arviz.plot_dist_comparison.html
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @OriolAbril , it's definitely better to just have one idata
object, and the prior/predictive check is a great addition. I just didn't know how to do it, so thanks for the prompt!
FYI, It looks like the extend
method could be added to the relevant docs - took some digging!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the conversion docs needs a thorough update and improvement: arviz-devs/arviz#1489
Looks great and everything renders correctly now, thanks so much! The only reason I am not approving is to prevent someone too eager to merge before v4 is released 😄 |
The "Add PyMC3 classes used to tags" hook fails - I guess this should be removed or updated for v4? |
we are looking into it, we might replace it altogether with an alternative instead of updating, don't pay attention to it for now. We have also added a pre-commit check for the bibtex file to keep it tidy and sorted. Do you mind rebasing? I could also do it at some point next week if you prefer but if the notebook needs more changes you'll need to be careful pulling after my force push. |
We have now removed the pymc classes as tags and will use a different approach for them. You should remove the |
Can you remove all |
@OriolAbril @drbenvincent I'm surprisingly hopeful we can close this out at some point :-) I can't see there's anything?? Thanks. |
Will set the checks running, but from memory I think we don't need |
Just a few things:
Otherwise, looks good and can approve 👍🏻 |
Sorry, one more thing. Just noticed that there's a fail of the pre-commit checks relating to Not 100% sure why the test is failing, but if the branch is a bit old, you might need to rebase to incorporate more recent changes in this file. |
Thanks for that feedback and review @drbenvincent ! Changes made. We're investigating more on the reasons for divergences, but pushing the The pre-commit hooks pass locally. I think it might have needed the |
The remote pre-commit checks are failing. I suspect that the things may have changed since you created the branch to work on this. So you might need to rebase. I'm a bit trial and error with this, but could try
then re-run the pre-commit process and cross fingers. Sorry this is taking a while, I know how frustrating these pre-commit checks can be! |
Thanks for your patience @drbenvincent ! It had skipped the bibtex-tidy check the last time; but all good now - I think bibtex-tidy is sorting the refs by id; all sorted now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good. Checks pass. The rendered docs look fine. Thanks very much for the contribution!
Thanks for your help closing this out @drbenvincent 👍 |
This PR adds a notebook example to go alongside the addition of the Generalized Extreme Value distribution to pymc v4 opened in PR pymc-devs/pymc#5085