[ENH] roadmap of probability distributions to implement #22

Open
14 of 35 tasks
fkiraly opened this issue Aug 23, 2023 · 32 comments
Labels
feature request New feature or request good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to skpro module:probability&simulation probability distributions and simulators

Comments

@fkiraly
Collaborator

fkiraly commented Aug 23, 2023

It would be great to have a basic set of probability distributions implemented.

Umbrella issue for implementing sktime probability distributions.

Recipe: use the extension_templates/distribution.py extension template.
Examples:

  • Normal, for de-novo implementations or manual interfaces
  • Fisk, for interfacing scipy distributions - this is much easier than using the full template
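To illustrate why interfacing a scipy distribution is the lighter route, here is a minimal sketch of the delegation idea. The class `ScipyBackedDist` is hypothetical and is not the adapter shipped with the package; only `scipy.stats.fisk` and its frozen-distribution methods are real.

```python
from scipy import stats


class ScipyBackedDist:
    """Hypothetical wrapper around a frozen scipy distribution (illustration only)."""

    def __init__(self, scipy_dist, **params):
        # Freeze the scipy distribution with the given shape/loc/scale parameters.
        self._frozen = scipy_dist(**params)

    def mean(self):
        return self._frozen.mean()

    def var(self):
        return self._frozen.var()

    def pdf(self, x):
        return self._frozen.pdf(x)

    def cdf(self, x):
        return self._frozen.cdf(x)

    def ppf(self, p):
        return self._frozen.ppf(p)


# Fisk (log-logistic) with shape c and scale; its median equals the scale.
fisk = ScipyBackedDist(stats.fisk, c=3.0, scale=2.0)
median = fisk.ppf(0.5)
```

All standard methods come for free from scipy; only quantities scipy does not provide, such as the energy, would still need separate work.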

High priority:

mid priority:

low priority:

lower priority:

list of many more (lowest priority)
https://docs.scipy.org/doc/scipy/reference/stats.html#probability-distributions - can be interfaced via _ScipyDist adapter easily!
https://en.wikipedia.org/wiki/File:ProbOnto2.5.jpg

Mirrors sktime/sktime#4518
(for high and mid priority)

Contributions can be made to either repository, and should be copied over to the other once approved/merged, until the modules are merged into one.

@fkiraly fkiraly added good first issue Good for newcomers module:probability&simulation probability distributions and simulators implementing algorithms Implementing algorithms, estimators, objects native to skpro feature request New feature or request labels Aug 23, 2023
fkiraly added a commit that referenced this issue Aug 25, 2023
Adds empirical distribution.

Towards #22.

Mirror of sktime/sktime#5094
fkiraly added a commit that referenced this issue Aug 25, 2023
Implements mixture of distributions.

Towards #22, and required for
ensemble regressor.

Also adds a default implementation for `ppf` in the `BaseDistribution`,
using the bisection method to invert a `cdf`, if present.
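As a rough illustration of inverting a `cdf` by bisection, a minimal standalone sketch follows. The function name, bracket and tolerance are assumptions made for this example and need not match the default in `BaseDistribution`.

```python
import math


def ppf_by_bisection(cdf, p, lo=-1e6, hi=1e6, tol=1e-10, max_iter=200):
    """Invert a monotone cdf at probability p by bisection on [lo, hi]."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def std_normal_cdf(x):
    """cdf of the standard normal, used here only as a test case."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


q975 = ppf_by_bisection(std_normal_cdf, 0.975)
```

The bracket has to contain the quantile, so a real implementation would need to choose or widen it per distribution.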
fkiraly pushed a commit that referenced this issue Aug 27, 2023

#### Reference Issues/PRs

Mirror of `sktime` sktime/sktime#5050. Towards #22


#### What does this implement/fix? Explain your changes.

Add student's t-distribution.
@fkiraly fkiraly changed the title [ENH] (wish)list of probability distributions to implement [ENH] roadmap & (wish)list of probability distributions to implement Sep 13, 2023
@fkiraly fkiraly pinned this issue Sep 13, 2023
@fkiraly fkiraly changed the title [ENH] roadmap & (wish)list of probability distributions to implement [ENH] roadmap of probability distributions to implement Sep 13, 2023
@bhavikar04
Contributor

Hi, I'm interested in taking this up. Would you say priority of the distributions aligns with the difficulty in implementation? I'd like to do either multivariate normal or uniform continuous.

@fkiraly
Collaborator Author

fkiraly commented Mar 12, 2024

Hmmm, I'd say it is currently actually the opposite way. That is, the remaining low priority ones are easier to get started with, than the remaining high priority ones - simply since the easy higher priority ones are already done.

So, uniform continuous then? Parameterized by lower and upper.

I don't have a reference for energy and squared norm integrals, but these should not be too difficult to obtain. Let me know if you need input there, we can always start with the more common methods.

@an20805
Contributor

an20805 commented Mar 12, 2024

Hey @fkiraly, I have implemented uniform continuous distribution in my local branch. How do I proceed further?
I would also love to implement other distributions.

@fkiraly
Collaborator Author

fkiraly commented Mar 12, 2024

@an20805, nice! Let's not duplicate then, @bhavikar04 - how about beta?

The next step would be making a pull request to this repository, and a review cycle, then merge.

@fkiraly
Collaborator Author

fkiraly commented Mar 13, 2024

Re energy, for $X, Y\sim Unif(a, b)$, I get:

$\mathbb{E}[|X - y|] = |y - \frac{b+a}{2} |$ if $y$ lies outside $[a, b]$,
and $\mathbb{E}[|X - y|] = \frac{(b-y)^2}{2(b-a)}+ \frac{(a-y)^2}{2(b-a)}$ if inside,

and

$\mathbb{E}[|X - Y|] = \frac{1}{3} (b-a)$ - double checking appreciated.
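For the requested double check, a quick Monte Carlo comparison is one option. The sketch below simply codes the three expressions above and compares them against sample averages; the helper names, the seed and the sample size are choices made for this example.

```python
import numpy as np


def uniform_cross_energy(y, a, b):
    """E|X - y| for X ~ Unif(a, b), using the case split stated above."""
    if y < a or y > b:
        return abs(y - (a + b) / 2)
    return ((b - y) ** 2 + (a - y) ** 2) / (2 * (b - a))


def uniform_self_energy(a, b):
    """E|X - Y| for independent X, Y ~ Unif(a, b)."""
    return (b - a) / 3


rng = np.random.default_rng(0)
a, b, n = 1.0, 4.0, 1_000_000
x = rng.uniform(a, b, size=n)
x2 = rng.uniform(a, b, size=n)

mc_self = np.abs(x - x2).mean()      # compare with uniform_self_energy(a, b)
mc_inside = np.abs(x - 2.0).mean()   # y = 2 lies inside [a, b]
mc_outside = np.abs(x - 6.0).mean()  # y = 6 lies outside [a, b]
```

With these settings the sample averages should agree with the formulas to about two decimal places, which is consistent with the expressions being correct.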

@bhavikar04
Contributor

In that case I'll take up log normal distribution then.

@fkiraly
Collaborator Author

fkiraly commented Mar 13, 2024

Pinging @Alex-JG3 and @ivarzap who most recently implemented distributions, in case you have any general starter advice.

@bhavikar04
Contributor

bhavikar04 commented Mar 15, 2024

Hey,

So I'm a little unsure on what the energy will be for the log normal distribution and can't find much online, is there any literature you can point me to?

@fkiraly
Collaborator Author

fkiraly commented Mar 15, 2024

@bhavikar04, Appendix A.2 of "evaluating forecasts with scoringRules" has a few explicit formulae for the energy, including the log-normal distribution. The expression is hard to track in implementation, so I would advise comparing against the Monte-Carlo default if you implement it.

I would also suggest you try it on paper, there's a good chance of errors in rare calculations like these.
Further, Wolfram Alpha might also help. Whereas, ChatGPT and the like typically produce plausible garbage.

@bhavikar04
Contributor

Hey thank you so much, I'll try to chalk out a suitable implementation soon. ChatGPT was humble enough to admit it doesn't know enough ;)

@fkiraly
Collaborator Author

fkiraly commented Mar 16, 2024

Yes, I admit I also tried as computing integrals can get tiring: https://xkcd.com/2117/
Wolfram is not bad, it makes sense to double check though. As said, there is a default Monte Carlo implementation, so if you set the number of samples high, the matrices should be similar.
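A sketch of such a Monte Carlo comparison for the log-normal self-term is given below. The candidate closed form $2\,\mathbb{E}[X]\,\mathrm{erf}(\sigma/2)$ is an assumption of this example: it is derived here from the standard Gini coefficient of the log-normal, not taken from the thread or from the package, and should itself be double-checked.

```python
import math

import numpy as np


def lognormal_self_energy(mu, sigma):
    """Candidate closed form for E|X - X'| (assumption): 2 * E[X] * erf(sigma / 2)."""
    return 2.0 * math.exp(mu + sigma**2 / 2) * math.erf(sigma / 2)


def mc_self_energy(sampler, n):
    """Monte Carlo estimate of E|X - X'| from two independent samples of size n."""
    return np.abs(sampler(n) - sampler(n)).mean()


rng = np.random.default_rng(42)
mu, sigma = 0.0, 0.5

closed_form = lognormal_self_energy(mu, sigma)
estimate = mc_self_energy(lambda size: rng.lognormal(mu, sigma, size), 500_000)
```

If the two numbers differ by more than the sampling error, the closed form (or its implementation) is the first suspect.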

@sukjingitsit
Contributor

sukjingitsit commented Mar 23, 2024

I would like to work on implementing the chi-square distribution. To confirm, we have to follow the template of Laplace and Normal, where we implement the mean, var, pdf, cdf, logpdf and ppf alongside the energy, right?
To characterise chi-square, I assume, as standard practice, we will use the degrees of freedom, right?
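For reference, the methods listed above are all available from scipy for a chi-square parameterized by degrees of freedom. This is a sketch using `scipy.stats.chi2` directly, not the package's own class; the value `df = 4` and the evaluation points are arbitrary.

```python
from scipy import stats

df = 4  # degrees of freedom, the single parameter
chi2 = stats.chi2(df)

summary = {
    "mean": chi2.mean(),         # equals df
    "var": chi2.var(),           # equals 2 * df
    "pdf": chi2.pdf(3.0),
    "logpdf": chi2.logpdf(3.0),
    "cdf": chi2.cdf(3.0),
    "ppf": chi2.ppf(0.5),        # median, computed numerically by scipy
}
```

The energy is the one item in the list that scipy does not supply.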

@sukjingitsit
Contributor

sukjingitsit commented Mar 23, 2024

The current implementation of ppf wraps a scipy function directly, since there is no closed-form expression. Similarly, while the cross-energy can be derived mathematically, the self-energy is difficult to solve in closed form (and I could not find literature on it); the best options for it are numerical integration or sampling. Thus, energy hasn't been implemented yet.

@malikrafsan
Contributor

Ahh, I see, thank you so much for your guidance @fkiraly! Then my PRs are ready to be reviewed. I would very much love to hear your feedback. However, if you have any reference on the energy formula of those two distributions, then I would still very much like to implement it, thank you so much!

@fkiraly
Collaborator Author

fkiraly commented Apr 7, 2024

> if you have any reference on the energy formula of those two distributions

have you checked in the paper above? If not there, one would have to derive it.

@malikrafsan
Contributor

Do you mean this paper?

> Appendix A.2 of "evaluating forecasts with scoringRules" has a few explicit formulae for the energy, including the log-normal distribution.

Yes, I have checked the paper but I cannot find the formula. I can only find CRPS and CDF formulas. Does CRPS mean the energy? If so, I think I misunderstood your previous statements

@fkiraly
Collaborator Author

fkiraly commented Apr 9, 2024

Yes, CRPS is closely related, it is the cross-term minus half the self-term (compare definitions).

The unfortunate bit about the paper is that it only gives CRPS, but not the self-term or cross-term in isolation. However, it should not be too hard to back these out, using that shifting the distribution location by a constant leaves the self-term unchanged, but not the cross-term.

@fkiraly
Collaborator Author

fkiraly commented Apr 9, 2024

More precisely, a useful formula to use is

$$\lim_{y \rightarrow \infty} \left(\mbox{CRPS}(y) - y\right) = -\mathbb{E}[X] - \frac{1}{2}\mathbb{E}|X - X'|,$$

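i.e., you can obtain the self-term $\mathbb{E}|X - X'|$ by taking a limit, if you already know the expressions for CRPS and the expectation; the cross-term then follows as $\mbox{CRPS}(y) + \frac{1}{2}\mathbb{E}|X - X'|$.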

(the equation follows from observing that the absolute value in $\mathbb{E}\left|y - X \right|$ disappears in the limit)
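The limit identity can be sanity-checked on a case where everything is known. The sketch below uses the normal distribution, assuming the standard closed-form CRPS of $N(\mu, \sigma^2)$ and its known value $\mathbb{E}|X - X'| = 2\sigma/\sqrt{\pi}$; the function name and parameter values are chosen for this example only.

```python
import math


def normal_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


mu, sigma = 1.5, 2.0
y_far = mu + 40.0 * sigma  # far in the right tail, stands in for y -> infinity
limit = normal_crps(y_far, mu, sigma) - y_far

# Rearranging the identity: E|X - X'| = -2 * (limit + E[X])
recovered = -2.0 * (limit + mu)
expected = 2.0 * sigma / math.sqrt(math.pi)  # known value for the normal
```

For a distribution where only the CRPS is tabulated, the same rearrangement would be done symbolically rather than numerically.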

fkiraly pushed a commit that referenced this issue Apr 17, 2024
Towards #22

#### What does this implement/fix? Explain your changes.
Weibull probability distribution
fkiraly pushed a commit that referenced this issue Apr 18, 2024
Towards #22 

#### What does this implement/fix? Explain your changes.
Lognormal probability distribution
fkiraly pushed a commit that referenced this issue Apr 18, 2024
Towards #22

#### What does this implement/fix? Explain your changes.
Logistic probability distribution
fkiraly pushed a commit that referenced this issue Apr 25, 2024
Implemented Uniform Continuous Probability Distribution, towards
#22
fkiraly pushed a commit that referenced this issue Apr 25, 2024
Addresses #22 for chi-squared case
@malikrafsan malikrafsan mentioned this issue May 4, 2024
6 tasks
fkiraly pushed a commit that referenced this issue May 7, 2024
Towards #22

This PR implements a Beta distribution based on the Scipy Adapter
fkiraly pushed a commit that referenced this issue May 24, 2024
Implements Gamma distribution. Towards #22
@fkiraly fkiraly mentioned this issue May 25, 2024
5 tasks
fkiraly pushed a commit that referenced this issue May 25, 2024
#### What does this implement/fix? Explain your changes.
Implements Alpha Distribution. Towards #22
fkiraly pushed a commit that referenced this issue Jun 4, 2024
Implements Half Cauchy Distribution, towards #22
fkiraly pushed a commit that referenced this issue Jun 7, 2024
Towards #22, Implements Log Laplace Distribution
fkiraly pushed a commit that referenced this issue Jun 7, 2024
Towards #22, Implements Half Logistic distribution
@sukjingitsit
Contributor

I would like to work on the Pareto distribution if possible

@fkiraly
Collaborator Author

fkiraly commented Jun 14, 2024

all yours, @sukjingitsit!

fkiraly pushed a commit that referenced this issue Jun 22, 2024
Addresses #22 for the Pareto distribution
5 participants