[ENH] roadmap of probability distributions to implement #22
Adds empirical distribution. Towards #22. Mirror of sktime/sktime#5094
Implements mixture of distributions. Towards #22, and required for ensemble regressor. Also adds a default implementation for `ppf` in the `BaseDistribution`, using the bisection method to invert a `cdf`, if present.
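As a rough illustration of the bisection idea mentioned above, here is a minimal sketch of inverting a `cdf` numerically. This is not the code from the PR: the function name `ppf_by_bisection`, its parameters, and the bracket-growing step are assumptions made for the example, and it only handles a scalar `p` and a continuous, non-decreasing `cdf`.

```python
import math


def ppf_by_bisection(cdf, p, lo=-1.0, hi=1.0, tol=1e-10, max_iter=200):
    """Approximate x with cdf(x) = p by bisection (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    # Grow the bracket [lo, hi] until cdf(lo) <= p <= cdf(hi).
    step = 1.0
    while cdf(lo) > p:
        lo -= step
        step *= 2.0
    step = 1.0
    while cdf(hi) < p:
        hi += step
        step *= 2.0

    # Halve the bracket until it is narrower than tol.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def standard_normal_cdf(x):
    """Closed-form cdf of the standard normal, used only as a test case."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


median = ppf_by_bisection(standard_normal_cdf, 0.5)
upper = ppf_by_bisection(standard_normal_cdf, 0.975)
```

Bisection only needs `cdf` evaluations, which is why it can serve as a fallback when no closed-form quantile function is available; the cost is several dozen `cdf` calls per quantile.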
Mirror of `sktime` sktime/sktime#5050. Towards #22. Adds Student's t-distribution.
Hi, I'm interested in taking this up. Would you say the priority of the distributions aligns with the difficulty of implementation? I'd like to do either multivariate normal or uniform continuous.
Hmmm, I'd say it is currently the opposite. That is, the remaining low priority ones are easier to get started with than the remaining high priority ones, simply because the easy higher priority ones are already done. So, uniform continuous then? Parameterized by lower and upper. I don't have a reference for energy and squared norm integrals, but these should not be too difficult to obtain. Let me know if you need input there, we can always start with the more common methods.
Hey @fkiraly, I have implemented the uniform continuous distribution in my local branch. How do I proceed further?
@an20805, nice! Let's not duplicate then, @bhavikar04 - how about beta? The next step would be making a pull request to this repository, and a review cycle, then merge.
Re energy, for
and
In that case I'll take up the log normal distribution then.
Hey, so I'm a little unsure what the energy will be for the log normal distribution and can't find much online. Is there any literature you can point me to?
@bhavikar04, Appendix A.2 of "evaluating forecasts with I would also suggest you try it on paper, there's a good chance of errors in rare calculations like these.
Hey, thank you so much, I'll try to chalk out a suitable implementation soon. ChatGPT was humble enough to admit it doesn't know enough ;)
Yes, I admit I also tried, as computing integrals can get tiring: https://xkcd.com/2117/
I would like to work on implementing the chi-square distribution. To confirm, we have to follow the template of Laplace and Normal, where we implement the
The current implementation of ppf wraps a scipy function directly, since there is no closed mathematical form. Similarly, while the cross-energy can be derived mathematically, the self-energy is difficult to solve in closed form (nor could I find literature on it); the best options for it are integration or sampling. Thus, energy hasn't been implemented yet.
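For reference, the sampling option could look roughly like the sketch below. It assumes the self-energy means E|X - X'| for two independent draws X, X' from the same distribution; the function name `self_energy_by_sampling` and the sampler signature are made up for this example and are not part of the library. The normal distribution is used as a check only because its self-energy, 2σ/√π, is known in closed form.

```python
import numpy as np


def self_energy_by_sampling(sampler, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E|X - X'| (hypothetical helper).

    sampler(rng, n) must return n independent draws as a NumPy array.
    """
    rng = np.random.default_rng(seed)
    x = sampler(rng, n_samples)  # first set of independent draws
    y = sampler(rng, n_samples)  # second, independent set of draws
    return float(np.mean(np.abs(x - y)))


# Sanity check against a case with a known closed form: N(1, 2^2).
sigma = 2.0
estimate = self_energy_by_sampling(
    lambda rng, n: rng.normal(loc=1.0, scale=sigma, size=n)
)
exact = 2.0 * sigma / np.sqrt(np.pi)

# The same helper applies to a chi-square distribution (here 3 degrees of freedom).
chi2_estimate = self_energy_by_sampling(lambda rng, n: rng.chisquare(df=3, size=n))
```

The estimate carries Monte Carlo error that shrinks like one over the square root of `n_samples`, so it is a fallback rather than a replacement for a closed form.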
Ahh, I see, thank you so much for your guidance @fkiraly! Then my PRs are ready to be reviewed. I would very much love to hear your feedback. However, if you have any reference on the energy formula of those two distributions, then I would still very much like to implement it, thank you so much!
Have you checked in the paper above? If not there, one would have to derive it.
Do you mean this paper?
Yes, I have checked the paper but I cannot find the formula. I can only find CRPS and CDF formulas. Does CRPS mean the energy? If so, I think I misunderstood your previous statements.
Yes, CRPS is closely related: it is the cross-term minus half the self-term (compare definitions). The unfortunate bit about the paper is that it only gives CRPS, but not the self-term or cross-term in isolation. However, it should not be too hard to back these out, using that shifting the distribution location by a constant leaves the self-term unchanged, but not the cross-term.
More precisely, a useful formula to use is i.e., you can obtain the cross-term via taking a limit, if you know the expressions for CRPS and the expectation already. (the equation follows from observing that the absolute value in
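One way to read the limit argument, sketched numerically and not necessarily the exact formula from the comment above: with CRPS(y) = E|X - y| - (1/2) E|X - X'|, the cross-term E|X - y| approaches y - E[X] once y lies far to the right of the distribution, so the self-term is the limit of 2 (y - E[X] - CRPS(y)), and the cross-term at any y is CRPS(y) plus half the self-term. The helper names below are hypothetical; the normal distribution serves as a check because its CRPS and self-term are both known in closed form.

```python
import math


def normal_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def self_term_from_crps(crps, mean, y_far):
    """Approximate E|X - X'| by evaluating the limit at a large y_far."""
    return 2.0 * (y_far - mean - crps(y_far))


def cross_term_from_crps(crps, self_term, y):
    """Recover E|X - y| from CRPS(y) and the self-term."""
    return crps(y) + 0.5 * self_term


mu, sigma = 1.0, 2.0
crps = lambda y: normal_crps(y, mu, sigma)

# y_far sits 40 standard deviations to the right of the mean.
self_term = self_term_from_crps(crps, mu, y_far=mu + 40.0 * sigma)
cross_at_mean = cross_term_from_crps(crps, self_term, y=mu)

# Known closed forms for the normal distribution, used as a check.
exact_self = 2.0 * sigma / math.sqrt(math.pi)
exact_cross_at_mean = sigma * math.sqrt(2.0 / math.pi)
```

For a distribution where only the CRPS is published, the same limit can be taken symbolically on paper, which then yields closed forms for both terms.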
Towards #22. Weibull probability distribution.
Towards #22. Lognormal probability distribution.
Towards #22. Logistic probability distribution.
Implemented Uniform Continuous Probability Distribution, towards #22
Towards #22 This PR implements a Beta distribution based on the Scipy Adapter
Implements Gamma distribution. Towards #22
Implements Alpha Distribution. Towards #22
Implements Half Cauchy Distribution, towards #22
Towards #22, Implements Log Laplace Distribution
Towards #22, Implements Half Logistic distribution
I would like to work on the Pareto distribution if possible.
All yours, @sukjingitsit!
Addresses #22 for the Pareto distribution
It would be great to have a basic set of probability distributions implemented.
Umbrella issue for implementing `sktime` probability distributions.

Recipe: use the `extension_templates/distribution.py` extension template.

Examples:
- `Normal`, for de-novo implementations or manual interfaces
- `Fisk`, for interfacing `scipy` distributions - this is much easier than using the full template

High priority:

mid priority:

low priority:
- `Exponential` distribution #325

lower priority:

list of many more (lowest priority):
- https://docs.scipy.org/doc/scipy/reference/stats.html#probability-distributions - can be interfaced via `_ScipyDist` adapter easily!
- https://en.wikipedia.org/wiki/File:ProbOnto2.5.jpg

Mirrors sktime/sktime#4518 (for high and mid priority)
Contributions can be made to either repository, and should be copied over to the other once approved/merged, until the modules are merged into one.