
Added Generalized Extreme Value distribution #5085

Closed
wants to merge 7 commits

Conversation

ccaprani

This PR adds the Generalized Extreme Value (GEV) distribution to PyMC v4. An example notebook is also submitted as a PR to pymc-examples: pymc-devs/pymc-examples#241

Docstrings, an example, and tests are added; the code is formatted with black.

Given the complexity of the bounds for this distribution, a careful review of the tensor algebra used in the logp and logcdf functions would be very welcome.

Member

@ricardoV94 ricardoV94 left a comment


Thanks for opening the PR. I left a couple of comments below. The logp/logcdf expressions seem reasonable, provided xi == 0 is a valid parameter value (and not just a limiting case, as with sigma).

You seem to be missing tests for the RandomVariable, which should go into test_distributions_random.py. Here is a guide that explains how to write these tests, and a bit more: https://github.com/pymc-devs/pymc/blob/main/docs/source/developer_guide_implementing_distribution.md

You can also consider adding a moment to be used as a stable initialization point. This has not been documented in that guide yet, but you can find some examples for the normal distribution here:

def get_moment(value_var, size, mu, sigma):
    return at.full(size, mu, dtype=aesara.config.floatX)
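For context, the GEV's own mean (finite for xi < 1) is mu + sigma * (Γ(1 - xi) - 1) / xi for xi != 0, reducing to mu + sigma * γ (Euler-Mascheroni constant) at xi = 0, which would be a natural candidate for such a moment. A quick NumPy/SciPy check of that expression (illustrative only, not the PR's code; note that scipy's genextreme uses the opposite sign convention, c = -xi):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import genextreme

def gev_mean(mu, sigma, xi):
    # Mean of the GEV distribution; finite only for xi < 1.
    if np.isclose(xi, 0):
        return mu + sigma * np.euler_gamma  # Gumbel limit
    return mu + sigma * (gamma(1 - xi) - 1) / xi

# scipy parameterizes the GEV with c = -xi
print(gev_mean(0.0, 1.0, 0.5))                      # ~1.5449
print(genextreme.mean(c=-0.5, loc=0.0, scale=1.0))  # should match
```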

@codecov

codecov bot commented Oct 18, 2021

Codecov Report

Merging #5085 (a9b2bb8) into main (598dd9d) will decrease coverage by 3.00%.
The diff coverage is 72.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5085      +/-   ##
==========================================
- Coverage   75.10%   72.09%   -3.01%     
==========================================
  Files          87       87              
  Lines       14142    14175      +33     
==========================================
- Hits        10621    10220     -401     
- Misses       3521     3955     +434     
Impacted Files Coverage Δ
pymc/distributions/__init__.py 100.00% <ø> (ø)
pymc/distributions/continuous.py 83.63% <72.72%> (-12.20%) ⬇️
pymc/printing.py 22.22% <0.00%> (-63.64%) ⬇️
pymc/distributions/bound.py 40.00% <0.00%> (-60.00%) ⬇️
pymc/distributions/discrete.py 77.55% <0.00%> (-20.47%) ⬇️
pymc/distributions/multivariate.py 54.46% <0.00%> (-16.97%) ⬇️
pymc/distributions/dist_math.py 81.28% <0.00%> (-6.96%) ⬇️
pymc/initial_point.py 95.53% <0.00%> (-4.47%) ⬇️
pymc/distributions/distribution.py 91.71% <0.00%> (-2.77%) ⬇️
... and 4 more

@ccaprani
Author

ccaprani commented Oct 18, 2021

These are such great comments @ricardoV94 - thank you very much indeed.

The parameter xi should indeed be tested at xi = 0 (it's not a bound), so I've added that as well. It should return the same as the Gumbel distribution in that case.
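As a quick independent check of that equivalence (using scipy only; again scipy's genextreme uses c = -xi, so c = 0 is the Gumbel limit):

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r

x = np.linspace(-2.0, 5.0, 50)
# The GEV at xi = 0 (scipy c = 0) should coincide with the Gumbel density
gev = genextreme.logpdf(x, c=0.0, loc=1.0, scale=2.0)
gum = gumbel_r.logpdf(x, loc=1.0, scale=2.0)
print(np.allclose(gev, gum))  # True
```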

The guide was super helpful, and I should not have missed the bit about adding the tests to test_distributions_random.py - thanks again for pointing this out, along with my other misunderstandings.

I've made the changes and in running the new improved tests I am having trouble with the at.switch in logp - it does not seem to be properly recognizing at.isclose(xi,0) - is this something I'm doing, or a known problem? The docs are pretty clear.

@ricardoV94
Member

ricardoV94 commented Oct 18, 2021

I've made the changes and in running the new improved tests I am having trouble with the at.switch in logp - it does not seem to be properly recognizing at.isclose(xi,0) - is this something I'm doing, or a known problem? The docs are pretty clear.

What problem are you seeing? This works for me:

import numpy as np
import aesara.tensor as at
xi = at.vector('xi')
out = at.isclose(0, xi)
out.eval({xi: np.array([-1e-5, -1e-9, 0, 1e-9, 1e-5])})
# array([False,  True,  True,  True, False])

@ccaprani
Author

Thanks again @ricardoV94 - I found a missing minus sign that has helped things along nicely!

This is a numerically sensitive beast. Here's a short script for my own testing, along with the aesara and scipy warnings emitted for certain parameter combinations... I don't think there is much to be done about it, though? I could play around with the rtol and atol for at.isclose, but I'm not sure we want to be making decisions on behalf of users?

import pymc as pm
import aesara.tensor as at
from pymc.aesaraf import floatX
import scipy.stats.distributions as sp
import numpy as np
from itertools import product

μ_l = [-1, 0, 1]
σ_l = [0.1, 1, 10]
ξ_l = [-0.9, -0.5, 0, 0.5, 0.9]
x = np.linspace(-2, 5)

for μ, σ, ξ in product(μ_l, σ_l, ξ_l):
    print(f"Running: {μ=}, {σ=}, {ξ=}")
    mu = at.as_tensor_variable(floatX(μ))
    sigma = at.as_tensor_variable(floatX(σ))
    xi = at.as_tensor_variable(floatX(ξ))
    logp_pm = pm.GenExtreme.logp(x, mu, sigma, xi).eval()
    logp_sp = sp.genextreme.logpdf(x, c=-ξ, loc=μ, scale=σ)
    if not np.allclose(logp_pm, logp_sp):
        print(f"Fail on {μ=}, {σ=}, {ξ=}")

gives these warnings at certain parameter combinations:

Running: μ=-1, σ=0.1, ξ=0.5

/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in true_divide
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in double_scalars
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)

Running: μ=-1, σ=1, ξ=-0.5

/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2492: RuntimeWarning: invalid value encountered in subtract
  -pex2+logpex2-logex2)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in true_divide
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in double_scalars
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)


Running: μ=0, σ=1, ξ=0.5

/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2492: RuntimeWarning: invalid value encountered in add
  -pex2+logpex2-logex2)

Running: μ=0, σ=1, ξ=0.9

@ricardoV94
Member

ricardoV94 commented Oct 18, 2021

We can lower the threshold of the allclose test. We do this for many distributions with the select_by_precision that you might have seen in test_distributions.py. Otherwise can you choose parameters that do not lead to very extreme logp and see if that helps?
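The idea behind select_by_precision is roughly the following (an illustrative sketch, not pymc's actual helper, which reads aesara.config.floatX rather than taking the precision as an argument): pick a looser tolerance when running under float32.

```python
def select_by_precision(float64, float32, floatX="float64"):
    # Return the test tolerance matching the active float precision.
    # (Illustrative only: pymc's real helper inspects aesara.config.floatX.)
    return float64 if floatX == "float64" else float32

decimal = select_by_precision(float64=6, float32=3, floatX="float32")
print(decimal)  # 3
```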

By the way, are those failing allclose due to -inf/inf/nan?

Edit: regarding the rtol/atol in the logp/logcdf, if the distribution is indeed sensitive to those, we can provide them as optional parameters to users.

@ccaprani
Author

I've done quite a bit more exploration on this, and it's not really a problem with the allclose, but with evaluations of at.log1p(0) - the same happens in scipy, as you can see from some of the output below.

This is actually tricky to pin down - sometimes the warnings are emitted, other times not. Nevertheless, the bound function kicks in and returns sensible results (-inf) at the support limits. So, even though scipy issues warnings for the same problem, should we attempt to avoid these warnings by only computing inside the support? In other words, a much more nested logp function?

I think at this point, for performance, I'd rather leave it as is and see whether these edge-case warnings cause issues in the wild.

Running: μ=-1, σ=0.1, ξ=0.5

/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in true_divide
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in float_scalars
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)

Running: μ=-1, σ=1, ξ=-0.5

/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2492: RuntimeWarning: invalid value encountered in subtract
  -pex2+logpex2-logex2)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in true_divide
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
/home/ccaprani/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/tensor/math_opt.py:1054: RuntimeWarning: divide by zero encountered in float_scalars
  v = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
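The log1p boundary behaviour mentioned above can be reproduced with NumPy alone: as its argument reaches -1 at the edge of the support, log1p returns -inf and NumPy emits the same divide-by-zero RuntimeWarning.

```python
import numpy as np

with np.errstate(divide="ignore"):  # silence the RuntimeWarning for the demo
    val = np.log1p(-1.0)            # log(0) at the support boundary
print(val)  # -inf
```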

@twiecki
Member

twiecki commented Nov 3, 2021

@ricardoV94 what should we do here?

@ricardoV94
Member

ricardoV94 commented Nov 3, 2021

@ricardoV94 what should we do here?

From pymc-devs/pymc-examples#241, it seems this distribution leads to many sampling issues, arising from the fact that the support domain changes from [-inf, 0], to [-inf, inf] to [0, inf] given the parameters.
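The shifting support is easy to see with scipy's genextreme (again with the sign convention c = -xi; loc = 0 and scale = 1 here for illustration):

```python
from scipy.stats import genextreme

# xi = -0.5 (scipy c = 0.5): upper-bounded support
print(genextreme.support(c=0.5))   # (-inf, 2.0)
# xi = 0 (scipy c = 0): unbounded (Gumbel limit)
print(genextreme.support(c=0.0))   # (-inf, inf)
# xi = 0.5 (scipy c = -0.5): lower-bounded support
print(genextreme.support(c=-0.5))  # (-2.0, inf)
```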

I am not sure whether a custom transform that dynamically converts between lower-bounded, unbounded and upper bounded with a switch logic could fix some of these issues (and whether it is valid in the context of MCMC sampling).

Without an appropriate transformation this distribution seems to be of little practical use, and we might end up with a situation like one of the Cholesky distributions where we have a warning not to use it...

Perhaps we should include it only as a pymc-example?

@twiecki
Member

twiecki commented Nov 3, 2021 via email

@ccaprani
Author

ccaprani commented Nov 3, 2021

@ricardoV94 @twiecki there are a number of niche distributions already included in pymc, and in the modelling of extremes this distribution is the only way to determine the extremal tail behaviour (Gumbel, Weibull, or Frechet) directly from the data. Without the GEV nested model there is no Bayesian way to compare between them (in the frequentist setting, people compare the sums of log-likelihoods of the fits). While it probably does have a limited user base, as you can see from Discourse a number of users have implemented various versions of it in the past, and a benchmark pymc implementation is needed. As for the cases where warnings are emitted: as noted above, scipy does this too, so users will be expecting them. If including a warning is required, that's much better than not having the distribution at all. Thanks!

@ricardoV94
Member

@ricardoV94 @twiecki there are a number of niche distributions already included in pymc, and in the modelling of extremes this distribution is the only way to determine the extremal tail behaviour (Gumbel, Weibull, or Frechet) directly from the data. Without the GEV nested model there is no Bayesian way to compare between them (in the frequentist setting, people compare the sums of log-likelihoods of the fits).

There are probably other solutions to address this in a Bayesian context, including (marginalized?) mixture models and/or model comparison such as leave-one-out.

While it probably does have a limited user base, as you can see from Discourse a number of users have implemented various versions of it in the past, and a benchmark pymc implementation is needed. As for the cases where warnings are emitted: as noted above, scipy does this too, so users will be expecting them. If including a warning is required, that's much better than not having the distribution at all. Thanks!

It's important to note the different needs and goals of PyMC and SciPy. SciPy does not need to worry about the sampling properties of the distribution, since its job is merely to provide random samples and point evaluations. In contrast, PyMC tends to be concerned with best practices in the context of Bayesian modelling and MCMC sampling, and in this context not having a feature may sometimes be better than having it.

As I suggested, we could and should have a notebook that we can point users to, which explains how to implement such a distribution and warns about its limitations, since this seems to be a popular model overall.

It is also important to note that the effort of adding a distribution to the codebase can be very small compared to the effort of maintaining that distribution in the long term and addressing related user issues.


I am not saying we cannot add this distribution, of course. I just expressed my opinion and suggested work that may be necessary to make this distribution more useful/appropriate within the PyMC framework (i.e., explore whether a specialized transformation can alleviate some of the problems observed).

It goes without saying that I really appreciate the time and effort taken here @ccaprani. We should perhaps hear from other developers/users, if you still believe this is a worthwhile addition as is.

@ccaprani
Author

ccaprani commented Nov 3, 2021

It is also important to note that the effort of adding a distribution to the codebase can be very small compared to the effort of maintaining that distribution in the long term and addressing related user issues.

I really do understand that, and while I am active on discourse, especially on this distribution, I know that relying on one person is not ideal. I think other maintainers will emerge.

gave suggestion of work that may be necessary to make this distribution more useful / appropriate within the PyMC framework (i.e., explore whether a specialized transformation can alleviate some of the problems observed).

I'm actually hopeful that bringing this important distribution (in extremes, at least) to the wider Bayesian community might open up some opportunities like this. I noted in the notebook a Box-Cox transformation attempt, but it isn't good numerically.

As I suggested, we could and should have a notebook that we can point users to, which explains how to implement such distribution and warns about the limitations, since this seems to be a popular model overall.

This would certainly be a good second-best, and at least a 'meeting point' on Bayesian GEV inference.

It goes without saying that I really appreciate the time and effort taken here @ccaprani. We should perhaps hear from other developers/users, if you still believe this is a worthwhile addition as is.

Sounds good; and as ever all of the pymc devs' time is massively appreciated. This contribution - whether notebook only or not - is only a tiny bit in comparison...thank you!

@ccaprani
Author

@ricardoV94 I've been looking around the literature a bit more on the divergences problem for this distribution. Most of the work uses just MH and so the sampler is not yelling at them as loudly as HMC, for which there are few works. This one though limits the shape parameter away from heavy tails to deal with the problem, and also uses a log scaling on the scale parameter (mostly so they can assume normal priors). Whereas this early paper suggests using the knowledge of the physical problem as reparametrized priors, which will usually lead to reasonable parameter estimations, just like the example in the notebook.

So if you think it would improve things, I can add more to the notebook (and docstrings) noting that this distribution does require meaningful priors for the problem domain, be it hydrology, finance, engineering, and so on... Thanks!
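For instance, a prior-predictive style sanity check on the T-year return level, z_p = mu + (sigma/xi) * ((-log(1 - 1/T))^(-xi) - 1), can reveal whether priors are physically meaningful before any sampling. A hedged NumPy sketch (the priors and numbers below are purely illustrative, not from the PR or notebook):

```python
import numpy as np

rng = np.random.default_rng(42)

def return_level(mu, sigma, xi, T=100):
    # GEV T-year return level, valid for xi != 0
    p = 1.0 / T
    return mu + sigma * ((-np.log1p(-p)) ** (-xi) - 1.0) / xi

# Illustrative, domain-informed priors, e.g. annual-maximum river levels (m)
mu = rng.normal(3.0, 0.5, size=1_000)
sigma = np.abs(rng.normal(0.0, 0.3, size=1_000)) + 0.1
xi = rng.normal(0.1, 0.05, size=1_000)

z100 = return_level(mu, sigma, xi)
print(np.percentile(z100, [5, 50, 95]))  # implied 100-year levels
```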

@ricardoV94
Member

ricardoV94 commented Nov 18, 2021

@ricardoV94 I've been looking around the literature a bit more on the divergences problem for this distribution. Most of the work uses just MH and so the sampler is not yelling at them as loudly as HMC, for which there are few works. This one though limits the shape parameter away from heavy tails to deal with the problem, and also uses a log scaling on the scale parameter (mostly so they can assume normal priors). Whereas this early paper suggests using the knowledge of the physical problem as reparametrized priors, which will usually lead to reasonable parameter estimations, just like the example in the notebook.

So if you think it would improve things, I can add more to the notebook (and docstrings) noting that this distribution does require meaningful priors for the problem domain, be it hydrology, finance, engineering, and so on... Thanks!

That sounds great.

@fonnesbeck suggested we may want to add a PyMC module (or separate package) for more untested/experimental stuff, where we would be less worried about adding things like this distribution or more exotic samplers. Anyway, for the time being feel free to update this PR unless someone else wants to veto it.

@ricardoV94
Member

@ccaprani We have created a new repository for more experimental features at https://github.com/pymc-devs/pymc-experimental

Your contribution would be more than welcome over there if you are interested. The pymc-examples could then be updated to import it from there.

Let us know if you want to pick this up and thanks again for all the work already done here!

@ccaprani
Author

@ricardoV94 sounds like a good plan. I presume the proposed pmx.Distribution can be mixed with the usual pymc distributions in the same model context?

What are the logistics then? I see that the PR in the pymc-experimental repo is being pulled from pymc-main - how do we get this PR over there without needing to fully replicate pymc itself (including documentation)? I'm probably missing something obvious - any guidance appreciated!

@ricardoV94
Member

ricardoV94 commented Jan 31, 2022

@ricardoV94 sounds like a good plan. I presume the proposed pmx.Distribution can be mixed with the usual pymc distributions in the same model context?

Yes, that's the idea

What are the logistics then? I see that the PR in the pymc-experimental repo is being pulled from pymc-main - how do we get this PR over there without needing to fully replicate pymc itself (including documentation)? I'm probably missing something obvious - any guidance appreciated!

I have no idea what is going on with that PR. I would imagine you would fork the pymc-experimental repo and implement your distribution there under an analogous directory pymc-experimental/distributions/continuous.py (and make it accessible from the root)

Users can then:

import pymc as pm
import pymc_experimental as pmx

with pm.Model():
    x = pmx.GenExtreme(...)

@ccaprani
Author

That's wonderful! And the documentation relating to the distribution would go where then?

@ricardoV94
Member

That's wonderful! And the documentation relating to the distribution would go where then?

Right now just the docstrings. We still have to set up a documentation page for that repository.

@ccaprani
Author

ccaprani commented Feb 2, 2022

This PR has moved to pymc-experimental as pymc-devs/pymc-extras#4
