Added Generalized Extreme Value distribution example #241

ccaprani · 2021-10-18T07:57:20Z

This PR adds a notebook example to go alongside the addition of the Generalized Extreme Value distribution to pymc v4 opened in PR pymc-devs/pymc#5085

review-notebook-app · 2021-10-18T07:57:24Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ricardoV94 · 2021-10-18T12:09:36Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,899 @@
+{


We should say something about the divergences. They probably arise from the hard constraint between the support and/or parameters, which NUTS struggles with.

Since the support domain changes depending on whether xi is negative, zero or positive, there is not much we can do (at least nothing obvious). Ideally we would parametrize the distribution (or apply a transformation) in such way that any proposed value would be in the supported domain of the distribution. This would likely alleviate divergences...

Reply via ReviewNB

Thanks - this is done now, with some recommendations for the user.

OriolAbril · 2021-10-19T00:39:33Z

Thanks for your PR! I have added the hold for v4 tag to make sure it is merged after the beta release is out. This will ensure the notebook is reproducible from the watermark only (pymc version doesn't include commit info) as well as not having the "You are running the v4 development version of PyMC which currently still lacks key features." in our documentation examples.

Also make sure to follow the style guide: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide. This will ensure the notebook is easily discoverable via tags, contains all the necessary info to rerun from the watermark and overall tries to set all the notebooks in the example gallery to a similar style so readers know what to expect.

I also noticed you have not listed yourself as author, which is also something we are currently discussing to make sure we clearly credit everyone who contributes to the example gallery. See #198 for example if you are interested in this or https://pymc-examples--239.org.readthedocs.build/en/239/splines/spline.html (generated from #239) to see a proposal of how authorship attribution could look like.

ccaprani · 2021-10-19T02:25:36Z

Thanks for the great comments @OriolAbril ! I think I've addressed everything; the splines example was super useful too!

Please let me know of anything else.

Edit: the notebook metadata and citations didn't render very well. I'm not sure if this is related to recent changes at #198 or something I've done, though the markup seems fine.

OriolAbril · 2021-10-19T02:56:42Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


using

{cite:p}`Coles2001`

will add the author and year of the citation automatically and make it a clickable link to the bibliography section, there is no need for the manual citation. In this case however, I think you want cite:t to generate a textual citation instead of a citation between parenthesis. There will be also no need for the manual bibliography at the end, it will be automatically generated when building the docs.

I would also suggest using generalized extreme value everywhere, especially if aiming for a beginner notebook.

Reply via ReviewNB

Thanks @OriolAbril - I was missing the colon; the citation should render ok now, and I have the biblio referenced.

GEV is standard terminology in extremes - the acronym is introduced at the top; my preference is to keep it as is for this reason. For example, here is an extract from the main reference:

OriolAbril · 2021-10-19T02:56:42Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


Black can be turned off at will, this is one of such cases where turning it off is recommended: https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#pre-commit-and-code-formatting

Reply via ReviewNB

Got this - thanks!

OriolAbril · 2021-10-19T02:56:42Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


can you use bibtex too for caprani and obrien citation?

Reply via ReviewNB

Done; and the other refs too - thanks!

OriolAbril · 2021-10-19T02:56:43Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


you can use plot_posterior to generate the plot. I believe the code below should get a somewhat similar look

az.plot_posterior(prior_pc, group="prior", var_names="~gev", hdi_prob="hide", point_estimate=None)

Reply via ReviewNB

This is so much better! Thank you! Done.

examples/case_studies/GEV.ipynb

OriolAbril · 2021-10-19T02:56:43Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


Use credible interval, not ci

Reply via ReviewNB

OriolAbril · 2021-10-19T02:56:43Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


Isn't what we calculate below the maximum a posteriori even if we do compare it to the maximum likelihood estimates?

Reply via ReviewNB

Excellent point - text updated to be more technically correct. Can you check this please?

Ref: https://mmuratarat.github.io/2019-08-24/mle_map_estimation

OriolAbril · 2021-10-19T02:56:43Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,1049 @@
+{


would you consider opening an issue at ArviZ to better expose the mode calculation function?

Reply via ReviewNB

Super idea - I will!

Dawar-812 · 2021-10-19T14:13:27Z

I tried the ppc plot with priors but got the following error:
`TypeError Traceback (most recent call last)
in
----> 1 az.plot_ppc(data_pc, group="prior", figsize=(12, 6))
2 ax = plt.gca()
3 ax.set_xlim([2, 6])
4 ax.set_ylim([0, 2]);

/opt/anaconda3/lib/python3.8/site-packages/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, observed, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
208 for groups in (f"{group}_predictive", "observed_data"):
209 if not hasattr(data, groups):
--> 210 raise TypeError(f'data argument must have the group "{groups}" for ppcplot')
211
212 if kind.lower() not in ("kde", "cumulative", "scatter"):

TypeError: data argument must have the group "prior_predictive" for ppcplot`

ccaprani · 2021-10-19T14:34:44Z

@Dawar-812 This is probably an Arviz issue, but can you check that you have the group 'prior' in your InferenceData prior_pc object? And that you're using Arviz v. 0.11.4?

Dawar-812 · 2021-10-19T14:52:15Z

@ccaprani Here is the code:
`data_pc = pm.sample_prior_predictive(samples=1000, model=model_gev)
az.plot_ppc(data_pc, group="prior", figsize=(12, 6))
ax = plt.gca()
ax.set_xlim([2, 6])
ax.set_ylim([0, 2]);

`
Yes, I am using Arviz v. 0.11.4

OriolAbril · 2021-10-19T17:48:24Z

@Dawar-812 this PR is using the development version of pymc. To reproduce the results you need to install pymc from github which is still changing in preparation for the soon to be released 4.0.beta release and lacks some key features (as the warning on import shows also here on the notebook). Now, sampling methods by default return InferenceData objects, so you can take their output and use arviz functions straight away, for this to work in pymc 3.11.x you need to manually convert the output to inferencedata before calling plot_ppc

OriolAbril · 2021-10-22T21:23:42Z

examples/references.bib

@@ -120,3 +120,56 @@ @inproceedings{salakhutdinov2008bayesian
  year={2008},
  volume={25}
 }
+


Can you sort the references? It doesn't matter much now, but it will be a nightmare to see which references we already have and which are not present yet when we reach the 70th notebook to update 😅

Hi @OriolAbril , can you clarify what you mean please? All the references needed for this new workbook are now in the bibliography and render properly in the built version of the docs: https://pymc-examples--241.org.readthedocs.build/en/241/case_studies/GEV.html. Thanks.

The bibliography in the website is completely automated now, this comment is about the references.bib file which (at least for now) is shared between all notebooks. And we expect to have many of the references cited in multiple notebooks so we need a way to for contributors to find which ids to use by searching on the bib file. Ref https://github.com/pymc-devs/pymc/wiki/PyMC-Jupyter-Notebook-Style-Guide#references

I finally understood the problem and have fixed it now, adding the keyword gev to the bib entry keys - latest commit. Thanks for your patience!

OriolAbril · 2021-10-22T21:50:12Z

examples/case_studies/GEV.ipynb

@@ -0,0 +1,997 @@
+{


Just to be extra clear, prior_pc and trace are exactly the same type of object: InferenceData , the only difference is that they have different groups. I personally recommend adding all the groups to the same InferenceData so that you can take advantage of all plots and diagnostics and save everything in a single netcdf/zarr file.

You could do that with

idata = pm.sample_prior_predictive(...) az.plot_posterior(idata, group="prior") # plots prior distribution ... idata.extend(pm.sample(...)) az.plot_posterior(idata) # plots posterior which is the default group if present # but then you could also do things like az.plot_dist_comparison(idata)

in order to compare prior and posterior distributions: https://arviz-devs.github.io/arviz/api/generated/arviz.plot_dist_comparison.html

Reply via ReviewNB

Thanks again @OriolAbril , it's definitely better to just have one idata object, and the prior/predictive check is a great addition. I just didn't know how to do it, so thanks for the prompt!

FYI, It looks like the extend method could be added to the relevant docs - took some digging!

Yes, the conversion docs needs a thorough update and improvement: arviz-devs/arviz#1489

OriolAbril · 2021-10-22T21:51:54Z

Looks great and everything renders correctly now, thanks so much!

The only reason I am not approving is to prevent someone too eager to merge before v4 is released 😄

ccaprani · 2022-02-04T11:12:38Z

The "Add PyMC3 classes used to tags" hook fails - I guess this should be removed or updated for v4?

OriolAbril · 2022-02-05T14:59:12Z

The "Add PyMC3 classes used to tags" hook fails - I guess this should be removed or updated for v4?

we are looking into it, we might replace it altogether with an alternative instead of updating, don't pay attention to it for now. We have also added a pre-commit check for the bibtex file to keep it tidy and sorted. Do you mind rebasing? I could also do it at some point next week if you prefer but if the notebook needs more changes you'll need to be careful pulling after my force push.

OriolAbril · 2022-03-10T14:04:00Z

We have now removed the pymc classes as tags and will use a different approach for them. You should remove the pymc. tags and CI will pass again. You should also remove the table_of_contents files that are no longer being used and have been removed.

drbenvincent · 2022-06-10T13:57:38Z

Can you remove all pymc.* or pymc3.* tags in the first code cell?

ccaprani · 2022-09-26T14:34:12Z

@OriolAbril @drbenvincent I'm surprisingly hopeful we can close this out at some point :-) I can't see there's anything?? Thanks.

drbenvincent · 2022-09-26T14:51:21Z

Will set the checks running, but from memory I think we don't need examples/table_of_contents_examples.js any more. Taking a look now.

drbenvincent · 2022-09-26T15:00:22Z

Just a few things:

Delete the file examples/table_of_contents_examples.js. It's not needed any more.
Should "Authored by Colin Caprani, October 2021" be updated in terms of the date?
Have you tried cranking up the target_accept even higher?

Otherwise, looks good and can approve 👍🏻

drbenvincent · 2022-09-26T15:05:04Z

Sorry, one more thing. Just noticed that there's a fail of the pre-commit checks relating to references.bib. You might need to re-run the pre-commit checks twice, they can be a bit iffy.

Not 100% sure why the test is failing, but if the branch is a bit old, you might need to rebase to incorporate more recent changes in this file.

ccaprani · 2022-09-27T09:43:14Z

Thanks for that feedback and review @drbenvincent !

Changes made. We're investigating more on the reasons for divergences, but pushing the target-accept higher does improve it. We don't observe this in our research problems, but for this data set it appears. C'est la vie!

The pre-commit hooks pass locally. I think it might have needed the myst.md file added to the commit?

drbenvincent · 2022-09-27T12:54:10Z

The remote pre-commit checks are failing. I suspect that the things may have changed since you created the branch to work on this. So you might need to rebase.

I'm a bit trial and error with this, but could try

git remote add upstream https://github.com/pymc-devs/pymc-examples.git
git fetch upstream main
git rebase upstream/main

then re-run the pre-commit process and cross fingers.

Sorry this is taking a while, I know how frustrating these pre-commit checks can be!

ccaprani · 2022-09-27T14:24:19Z

Thanks for your patience @drbenvincent ! It had skipped the bibtex-tidy check the last time; but all good now - I think bibtex-tidy is sorting the refs by id; all sorted now!

drbenvincent

Everything looks good. Checks pass. The rendered docs look fine. Thanks very much for the contribution!

ccaprani · 2022-09-27T15:32:24Z

Thanks for your help closing this out @drbenvincent 👍

ccaprani mentioned this pull request Oct 18, 2021

Added Generalized Extreme Value distribution pymc-devs/pymc#5085

Closed

ricardoV94 reviewed Oct 18, 2021

View reviewed changes

OriolAbril added the hold_for_v4 label Oct 19, 2021

OriolAbril reviewed Oct 19, 2021

View reviewed changes

ccaprani mentioned this pull request Oct 19, 2021

Better API for obtaining posterior point estimates & more arviz-devs/arviz#1899

Open

OriolAbril reviewed Oct 22, 2021

View reviewed changes

ccaprani mentioned this pull request Feb 2, 2022

Adding Generalized Extreme Value Distribution pymc-devs/pymc-extras#4

Closed

OriolAbril removed the hold_for_v4 label Mar 20, 2022

ccaprani closed this Sep 26, 2022

ccaprani force-pushed the gev branch from fff9794 to 1656c38 Compare September 26, 2022 12:47

ccaprani added 7 commits September 27, 2022 00:18

Added GEV distribution example

59f5797

Added commentary on divergences

d833636

Updated for style and tags

ecdb997

Bibtex commas

0248e31

Remove black from data formatting

de30cda

Updates following detailed review

204ff96

Typos

8e4c012

ccaprani added 5 commits September 27, 2022 00:19

ref tweak

e6621d3

Minor improvements

5e835e3

Updated idata & plot_dist_comparison following review

383c19a

bib file conformance

82abb55

Updated for latest changes to repo

bd9946d

ccaprani reopened this Sep 26, 2022

Removed pymc tags in first cell

0eab77f

drbenvincent assigned drbenvincent and unassigned drbenvincent Sep 26, 2022

drbenvincent self-requested a review September 26, 2022 15:09

Updated nb per comments

2a3032c

Corrected bib file

2e37983

drbenvincent approved these changes Sep 27, 2022

View reviewed changes

drbenvincent merged commit 82c0e4e into pymc-devs:main Sep 27, 2022

ccaprani mentioned this pull request Sep 29, 2022

Genextreme pymc-devs/pymc-extras#84

Merged

Uh oh!

Added Generalized Extreme Value distribution example #241

Added Generalized Extreme Value distribution example #241

Uh oh!

Conversation

ccaprani commented Oct 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app bot commented Oct 18, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

OriolAbril commented Oct 19, 2021

Uh oh!

ccaprani commented Oct 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

OriolAbril Oct 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dawar-812 commented Oct 19, 2021

Uh oh!

ccaprani commented Oct 19, 2021

Uh oh!

Dawar-812 commented Oct 19, 2021

Uh oh!

OriolAbril commented Oct 19, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

OriolAbril commented Oct 22, 2021

Uh oh!

ccaprani commented Feb 4, 2022

Uh oh!

OriolAbril commented Feb 5, 2022

Uh oh!

OriolAbril commented Mar 10, 2022

Uh oh!

ccaprani commented Oct 18, 2021 •

edited

Loading

ccaprani commented Oct 19, 2021 •

edited

Loading

OriolAbril Oct 19, 2021 •

edited

Loading