Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Generalized Extreme Value distribution example #241

Merged
merged 15 commits into from
Sep 27, 2022

Conversation

ccaprani
Copy link
Contributor

@ccaprani ccaprani commented Oct 18, 2021

This PR adds a notebook example to go alongside the addition of the Generalized Extreme Value distribution to pymc v4 opened in PR pymc-devs/pymc#5085

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,899 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should say something about the divergences. They probably arise from the hard constraint between the support and/or parameters, which NUTS struggles with.

Since the support domain changes depending on whether xi is negative, zero or positive, there is not much we can do (at least nothing obvious). Ideally we would parametrize the distribution (or apply a transformation) in such way that any proposed value would be in the supported domain of the distribution. This would likely alleviate divergences...


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this is done now, with some recommendations for the user.

@OriolAbril
Copy link
Member

Thanks for your PR! I have added the hold for v4 tag to make sure it is merged after the beta release is out. This will ensure the notebook is reproducible from the watermark only (pymc version doesn't include commit info) as well as not having the "You are running the v4 development version of PyMC which currently still lacks key features." in our documentation examples.

Also make sure to follow the style guide: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide. This will ensure the notebook is easily discoverable via tags, contains all the necessary info to rerun from the watermark and overall tries to set all the notebooks in the example gallery to a similar style so readers know what to expect.

I also noticed you have not listed yourself as author, which is also something we are currently discussing to make sure we clearly credit everyone who contributes to the example gallery. See #198 for example if you are interested in this or https://pymc-examples--239.org.readthedocs.build/en/239/splines/spline.html (generated from #239) to see a proposal of how authorship attribution could look like.

@ccaprani
Copy link
Contributor Author

ccaprani commented Oct 19, 2021

Thanks for the great comments @OriolAbril ! I think I've addressed everything; the splines example was super useful too!

Please let me know of anything else.


Edit: the notebook metadata and citations didn't render very well. I'm not sure if this is related to recent changes at #198 or something I've done, though the markup seems fine.

@@ -0,0 +1,1049 @@
{
Copy link
Member

@OriolAbril OriolAbril Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using

{cite:p}`Coles2001`

will add the author and year of the citation automatically and make it a clickable link to the bibliography section, there is no need for the manual citation. In this case however, I think you want cite:t to generate a textual citation instead of a citation between parenthesis. There will be also no need for the manual bibliography at the end, it will be automatically generated when building the docs.

I would also suggest using generalized extreme value everywhere, especially if aiming for a beginner notebook.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @OriolAbril - I was missing the colon; the citation should render ok now, and I have the biblio referenced.

GEV is standard terminology in extremes - the acronym is introduced at the top; my preference is to keep it as is for this reason. For example, here is an extract from the main reference:

image

@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black can be turned off at will, this is one of such cases where turning it off is recommended: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#pre-commit-and-code-formatting


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got this - thanks!

@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use bibtex too for caprani and obrien citation?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; and the other refs too - thanks!

@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use plot_posterior to generate the plot. I believe the code below should get a somewhat similar look

az.plot_posterior(prior_pc, group="prior", var_names="~gev", hdi_prob="hide", point_estimate=None)

Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so much better! Thank you! Done.

examples/case_studies/GEV.ipynb Outdated Show resolved Hide resolved
@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use credible interval, not ci


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't what we calculate below the maximum a posteriori even if we do compare it to the maximum likelihood estimates?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point - text updated to be more technically correct. Can you check this please?

Ref: https://mmuratarat.github.io/2019-08-24/mle_map_estimation

@@ -0,0 +1,1049 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you consider opening an issue at ArviZ to better expose the mode calculation function?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super idea - I will!

@Dawar-812
Copy link

I tried the ppc plot with priors but got the following error:
`TypeError Traceback (most recent call last)
in
----> 1 az.plot_ppc(data_pc, group="prior", figsize=(12, 6))
2 ax = plt.gca()
3 ax.set_xlim([2, 6])
4 ax.set_ylim([0, 2]);

/opt/anaconda3/lib/python3.8/site-packages/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, observed, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
208 for groups in (f"{group}_predictive", "observed_data"):
209 if not hasattr(data, groups):
--> 210 raise TypeError(f'data argument must have the group "{groups}" for ppcplot')
211
212 if kind.lower() not in ("kde", "cumulative", "scatter"):

TypeError: data argument must have the group "prior_predictive" for ppcplot`

@ccaprani
Copy link
Contributor Author

@Dawar-812 This is probably an Arviz issue, but can you check that you have the group 'prior' in your InferenceData prior_pc object? And that you're using Arviz v. 0.11.4?

@Dawar-812
Copy link

@ccaprani Here is the code:
`data_pc = pm.sample_prior_predictive(samples=1000, model=model_gev)
az.plot_ppc(data_pc, group="prior", figsize=(12, 6))
ax = plt.gca()
ax.set_xlim([2, 6])
ax.set_ylim([0, 2]);

`
Yes, I am using Arviz v. 0.11.4

@OriolAbril
Copy link
Member

@Dawar-812 this PR is using the development version of pymc. To reproduce the results you need to install pymc from github which is still changing in preparation for the soon to be released 4.0.beta release and lacks some key features (as the warning on import shows also here on the notebook). Now, sampling methods by default return InferenceData objects, so you can take their output and use arviz functions straight away, for this to work in pymc 3.11.x you need to manually convert the output to inferencedata before calling plot_ppc

@@ -120,3 +120,56 @@ @inproceedings{salakhutdinov2008bayesian
year={2008},
volume={25}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you sort the references? It doesn't matter much now, but it will be a nightmare to see which references we already have and which are not present yet when we reach the 70th notebook to update 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OriolAbril , can you clarify what you mean please? All the references needed for this new workbook are now in the bibliography and render properly in the built version of the docs: https://pymc-examples--241.org.readthedocs.build/en/241/case_studies/GEV.html. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bibliography in the website is completely automated now, this comment is about the references.bib file which (at least for now) is shared between all notebooks. And we expect to have many of the references cited in multiple notebooks so we need a way to for contributors to find which ids to use by searching on the bib file. Ref https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#references

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally understood the problem and have fixed it now, adding the keyword gev to the bib entry keys - latest commit. Thanks for your patience!

@@ -0,0 +1,997 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be extra clear, prior_pc and trace are exactly the same type of object: InferenceData , the only difference is that they have different groups. I personally recommend adding all the groups to the same InferenceData so that you can take advantage of all plots and diagnostics and save everything in a single netcdf/zarr file.

You could do that with

idata = pm.sample_prior_predictive(...)
az.plot_posterior(idata, group="prior") # plots prior distribution
...
idata.extend(pm.sample(...)) 
az.plot_posterior(idata)  # plots posterior which is the default group if present
# but then you could also do things like
az.plot_dist_comparison(idata)

in order to compare prior and posterior distributions: https://arviz-devs.github.io/arviz/api/generated/arviz.plot_dist_comparison.html


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @OriolAbril , it's definitely better to just have one idata object, and the prior/predictive check is a great addition. I just didn't know how to do it, so thanks for the prompt!

FYI, It looks like the extend method could be added to the relevant docs - took some digging!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the conversion docs needs a thorough update and improvement: arviz-devs/arviz#1489

@OriolAbril
Copy link
Member

Looks great and everything renders correctly now, thanks so much!

The only reason I am not approving is to prevent someone too eager to merge before v4 is released 😄

@ccaprani
Copy link
Contributor Author

ccaprani commented Feb 4, 2022

The "Add PyMC3 classes used to tags" hook fails - I guess this should be removed or updated for v4?

@OriolAbril
Copy link
Member

The "Add PyMC3 classes used to tags" hook fails - I guess this should be removed or updated for v4?

we are looking into it, we might replace it altogether with an alternative instead of updating, don't pay attention to it for now. We have also added a pre-commit check for the bibtex file to keep it tidy and sorted. Do you mind rebasing? I could also do it at some point next week if you prefer but if the notebook needs more changes you'll need to be careful pulling after my force push.

@OriolAbril
Copy link
Member

We have now removed the pymc classes as tags and will use a different approach for them. You should remove the pymc. tags and CI will pass again. You should also remove the table_of_contents files that are no longer being used and have been removed.

@drbenvincent
Copy link
Contributor

Can you remove all pymc.* or pymc3.* tags in the first code cell?

@ccaprani ccaprani reopened this Sep 26, 2022
@ccaprani
Copy link
Contributor Author

@OriolAbril @drbenvincent I'm surprisingly hopeful we can close this out at some point :-) I can't see there's anything?? Thanks.

@drbenvincent
Copy link
Contributor

Will set the checks running, but from memory I think we don't need examples/table_of_contents_examples.js any more. Taking a look now.

@drbenvincent
Copy link
Contributor

Just a few things:

  • Delete the file examples/table_of_contents_examples.js. It's not needed any more.
  • Should "Authored by Colin Caprani, October 2021" be updated in terms of the date?
  • Have you tried cranking up the target_accept even higher?

Otherwise, looks good and can approve 👍🏻

@drbenvincent
Copy link
Contributor

Sorry, one more thing. Just noticed that there's a fail of the pre-commit checks relating to references.bib. You might need to re-run the pre-commit checks twice, they can be a bit iffy.

Not 100% sure why the test is failing, but if the branch is a bit old, you might need to rebase to incorporate more recent changes in this file.

@ccaprani
Copy link
Contributor Author

Thanks for that feedback and review @drbenvincent !

Changes made. We're investigating more on the reasons for divergences, but pushing the target-accept higher does improve it. We don't observe this in our research problems, but for this data set it appears. C'est la vie!

The pre-commit hooks pass locally. I think it might have needed the myst.md file added to the commit?

@drbenvincent
Copy link
Contributor

The remote pre-commit checks are failing. I suspect that the things may have changed since you created the branch to work on this. So you might need to rebase.

I'm a bit trial and error with this, but could try

git remote add upstream https://github.com/pymc-devs/pymc-examples.git
git fetch upstream main
git rebase upstream/main

then re-run the pre-commit process and cross fingers.

Sorry this is taking a while, I know how frustrating these pre-commit checks can be!

@ccaprani
Copy link
Contributor Author

Thanks for your patience @drbenvincent ! It had skipped the bibtex-tidy check the last time; but all good now - I think bibtex-tidy is sorting the refs by id; all sorted now!

Copy link
Contributor

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good. Checks pass. The rendered docs look fine. Thanks very much for the contribution!

@drbenvincent drbenvincent merged commit 82c0e4e into pymc-devs:main Sep 27, 2022
@ccaprani
Copy link
Contributor Author

Thanks for your help closing this out @drbenvincent 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants