[ENH] multiple quantile regression #108

Merged
merged 19 commits into from
Oct 20, 2023

Conversation

Ram0nB
Contributor

@Ram0nB Ram0nB commented Oct 5, 2023

Reference Issues/PRs

Fixes #107

What does this implement/fix? Explain your changes.

For quantile regression, often more than one quantile probability is of interest. However, existing Sklearn-compatible quantile regressors always fit and predict a single quantile probability. To the best of my knowledge, there is no standardized way to integrate multiple quantile regression with Sktime/Skpro probabilistic prediction methods such as predict_quantiles/predict_intervals. This PR adds a new Skpro regressor that wraps multiple quantile regressors and supports probabilistic predictions from the wrapped regressors.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

All added/changed files :)

Did you add any tests for the change?

No - I ran the Skpro tests locally without issues

Any other comments?

No

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Collaborator

@fkiraly fkiraly left a comment

Nice addition, thanks! This will be useful in sktime!

Blocking comments below.

Docs:

  • defaults should be stated in docstring
  • the exact algorithm should be stated in docstring (even if it is simple)

Algorithm:

  • can we implement something for predict_proba? I would simply put masses on the quantiles we predict, such that any quantiles closest to that will be supported at the same point.
    • let's say, we predict three quantiles, 0.1, 0.5, 0.9. Then, we put masses 0.3, 0.4, 0.3 on the respective predictions, and get a weighted empirical distribution with support at three points.
    • this will allow us to sidestep the "no regressors for alpha" problem, and give the correct predictions for the correct alpha.
    • if this predict_proba is implemented, the remaining methods are filled in automatically, although one might think about efficiency.
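A minimal sketch of how such masses could be derived from the quantile levels, assuming each level receives the part of [0, 1] that lies closer to it than to any other level. The function name `quantile_masses` is hypothetical and not taken from the PR.

```python
def quantile_masses(alphas):
    """Return one probability mass per quantile level, in the order given.

    Each level gets the part of [0, 1] that is closer to it than to any
    other level; cell boundaries are midpoints between sorted neighbours.
    """
    order = sorted(range(len(alphas)), key=lambda i: alphas[i])
    a = [alphas[i] for i in order]
    # cell boundaries: 0, midpoints between neighbouring levels, 1
    bounds = [0.0] + [(lo + hi) / 2 for lo, hi in zip(a, a[1:])] + [1.0]
    sorted_masses = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    # undo the sort so the masses line up with the caller's ordering
    masses = [0.0] * len(alphas)
    for pos, i in enumerate(order):
        masses[i] = sorted_masses[pos]
    return masses


masses = quantile_masses([0.1, 0.5, 0.9])
print(masses)  # approximately 0.3, 0.4, 0.3 (up to float round-off)
```

For the levels 0.1, 0.5, 0.9 the cells are [0, 0.3), [0.3, 0.7), [0.7, 1], which reproduces the masses from the example above.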

Interface requirements:

  • get_params should not be overwritten like this.
    • do you want to allow different quantile regressors per quantile point? If yes, we would need to use the heterogenous ensemble base class. Perhaps better in a separate PR since it's a bit fiddly?
  • get_test_params should return two or more test parameter settings.

@Ram0nB
Contributor Author

Ram0nB commented Oct 6, 2023

Many thanks for your very quick feedback, much appreciated!

  • get_params should not be overwritten like this.

    • do you want to allow different quantile regressors per quantile point? If yes, we would need to use the heterogenous ensemble base class. Perhaps better in a separate PR since it's a bit fiddly?

Per probability level in alpha I intend to use one regressor. Since an instance of a QuantileRegressor is passed in init, it likely already has a quantile probability level assigned to it. The value assigned to the quantile probability parameter is however overridden by the logic in fit: per quantile probability in alpha, I make a copy of the quantile regressor and set the corresponding probability with set_params. This initial quantile probability level would also be exposed to the user with get/set deep params, although changing it with those methods wouldn't make any difference. Therefore, I thought it would be best if the quantile probability of the QuantileRegressor instance passed in init wouldn't be exposed to the user with get/set params methods. Can you think of a better way to handle this?
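The copy-and-set_params logic described here can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `ToyQuantileRegressor`, `fit_per_alpha`, and the argument name `alpha_name` are hypothetical, and `copy.deepcopy` stands in for `sklearn.base.clone` so that the snippet runs with the standard library only.

```python
import copy


class ToyQuantileRegressor:
    """Hypothetical stand-in for an sklearn-style quantile regressor.

    Ignores X and predicts the empirical `quantile` of the training targets.
    """

    def __init__(self, quantile=0.5):
        self.quantile = quantile

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        ordered = sorted(y)
        idx = min(int(self.quantile * len(ordered)), len(ordered) - 1)
        self.value_ = ordered[idx]
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


def fit_per_alpha(regressor, alpha_name, alphas, X, y):
    """Fit one independent copy of `regressor` per probability level."""
    fitted = {}
    for a in alphas:
        est = copy.deepcopy(regressor)     # never mutate the user's instance
        est.set_params(**{alpha_name: a})  # overrides whatever level it carried
        fitted[a] = est.fit(X, y)
    return fitted


template = ToyQuantileRegressor(quantile=0.5)
X = [[i] for i in range(10)]
y = list(range(10))
fitted = fit_per_alpha(template, "quantile", [0.1, 0.5, 0.9], X, y)
```

The instance passed in keeps its original level (0.5 here) and stays unfitted; only the copies carry the levels from alpha, which is why the initial level has no effect on the result.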

I didn't know that the heterogenous ensemble forecaster base class exists and I've just had a look at it. It's nice that it has functionality for fitting and predicting with multiple estimators (similar to what I'm doing). Using a similar base class for a probabilistic regressor would resolve the aforementioned issue, however I can also think about a few cons:

  • For init, the user would have to create and list all the QuantileRegressor instances for the desired probabilities rather than just provide the desired probabilities and one QuantileRegressor instance.
  • For parameter optimisation, the user would need to set the hyperparameters for each quantile regressor separately if we'd use a heterogenous ensemble base (with that base class the hyperparameters of the nested estimators can be set separately, right?). Currently, hyperparameters can only be set for all QuantileRegressors at once. Since fitting multiple quantile regressors is a computationally intensive task, I don't think that many users would tune hyperparameters per quantile regressor, but rather for all quantile regressors at once. Therefore, if we'd use a heterogenous ensemble base it would complicate the hyperparameter tuning process for most users compared to the current implementation.
  • The current heterogenous ensemble base class is not available for probabilistic regressors, only for Sktime's forecasters.

Personally (with my current understanding), I tend towards not using the heterogenous ensemble base class as it seems like it would complicate things without significant benefit. What are your thoughts on this?

Algorithm:

  • can we implement something for predict_proba? I would simply put masses on the quantiles we predict, such that any quantiles closest to that will be supported at the same point.

    • let's say, we predict three quantiles, 0.1, 0.5, 0.9. Then, we put masses 0.3, 0.4, 0.3 on the respective predictions, and get a weighted empirical distribution with support at three points.
    • this will allow us to sidestep the "no regressors for alpha" problem, and give the correct predictions for the correct alpha.
    • if this predict_proba is implemented, the remaining methods are filled in automatically, although one might think about efficiency.

Something for predict_proba would indeed be a nice addition. Can you expand a bit on why we'd put masses on the quantiles rather than use similar weights?

I'm not really familiar with the implementation of distributions in Sktime/Skpro, but would it also be possible to generate and return empirical distributions from the predicted quantiles in predict_proba? For the cdf we could interpolate between (sorted) quantiles and use step functions for the tails since we don't have any information about it. This would reflect the main logic of the MultipleQuantileRegressor best I think. Moreover, we could use the aforementioned cdf function to sidestep the "no regressors for alpha" problem in predict_quantiles. What are your thoughts on this?
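For illustration, the interpolation idea can be sketched on the quantile function (the inverse of the cdf): linear between the predicted levels, constant beyond the lowest and highest level. This is only a sketch of the proposal under those assumptions; the name `interpolated_quantile` and the numbers are made up.

```python
from bisect import bisect_right


def interpolated_quantile(alphas, values, alpha):
    """Piecewise-linear quantile lookup with flat tails.

    `alphas` must be sorted ascending; `values` are the matching predictions.
    """
    if alpha <= alphas[0]:
        return values[0]   # left tail: no information, hold constant
    if alpha >= alphas[-1]:
        return values[-1]  # right tail: same
    hi = bisect_right(alphas, alpha)
    lo = hi - 1
    t = (alpha - alphas[lo]) / (alphas[hi] - alphas[lo])
    return values[lo] + t * (values[hi] - values[lo])


alphas = [0.1, 0.5, 0.9]
values = [1.0, 3.0, 7.0]  # made-up quantile predictions for one row
mid = interpolated_quantile(alphas, values, 0.3)  # halfway between 1.0 and 3.0
```

Both the interpolation rule and the tail behaviour are choices made for this sketch; other choices would be equally defensible.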

@fkiraly
Collaborator

fkiraly commented Oct 6, 2023

Therefore, I thought it would be best if the quantile probability of the QuantileRegressor instance passed in init wouldn't be exposed to the user with get/set params methods. Can you think of a better way to handle this?

I see. The important point here is, get/set params methods must return and access the same parameters as passed to __init__. This is one of the "great laws of scikit-learn" that should not be deviated from. One can of course discuss sense and sensibilities, but deviating means becoming incompatible with a large ecosystem, so we (and the tests) have to enforce it.
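A small sketch of that contract, assuming a simplified `get_params` without scikit-learn's `deep` argument. `ParamsMixin` and `MultiQuantileWrapper` are hypothetical names used only for illustration.

```python
import inspect


class ParamsMixin:
    """Minimal sklearn-style parameter handling (illustrative only)."""

    def get_params(self):
        names = inspect.signature(type(self).__init__).parameters
        return {n: getattr(self, n) for n in names if n != "self"}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"unknown parameter {name!r}")
            setattr(self, name, value)
        return self


class MultiQuantileWrapper(ParamsMixin):
    """Hypothetical wrapper: stores its __init__ arguments unchanged."""

    def __init__(self, regressor, alpha_name, alpha):
        self.regressor = regressor
        self.alpha_name = alpha_name
        self.alpha = alpha


est = MultiQuantileWrapper(
    regressor="some-regressor", alpha_name="quantile", alpha=[0.1, 0.9]
)
params = est.get_params()
rebuilt = MultiQuantileWrapper(**params)  # round trip must reproduce the object
```

The round trip `type(est)(**est.get_params())` is the property that tools relying on cloning depend on, so hiding or renaming a constructor argument in `get_params` would break it.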

But of course that still leaves up to discussion and choice the exact parameterization. So, regarding "better ways", I think your parameterization is probably the best and most user friendly. Overall I can think of:

  • estimator, alpha-name, alpha-vals, that's basically yours and would be also my a-priori first preference (up to, perhaps shortening the parameter names a bit)
  • [(est, alpha1), (est2, alpha2), ...] syntax with list of tuples like in sklearn pipelines. Worse imo since a user using the same estimator has to do a lot of manual cloning
  • that, but with an option to pass sth like [(est, alpha1, ..., alphan), (est2, alphaprime)]. I think this is only slightly but still worse than your choice, even if it allows more freedom in the ensembling.

@fkiraly
Collaborator

fkiraly commented Oct 6, 2023

however I can also think about a few cons

Regarding the heterogenous ensemble, I think you list its main drawbacks correctly.

  • the drawbacks in the parameterization are as discussed, I agree with your assessment, and that's why I think your choice in parameterization is better. I think it's worth sacrificing flexibility for usability. Flexibility can be achieved by chaining with a multiplexer.
  • though, I mean the heterogenous base class in scikit-base, which can be used for arbitrary objects. True though, it has not been used in skpro extensively yet (but it exists, e.g., look at the pipeline), so there would likely be some fiddling around. Probably not worth the effort in this special case.

I tend towards not using the heterogenous ensemble base class as it seems like it would complicate things without significant benefit. What are your thoughts on this?

Agreed from a general perspective - I would only suggest to use it if you want different regressor per alpha, but I think you have explained that that's not what you actually want to do, so we agree on the "what" and on the "how" both.

@fkiraly
Collaborator

fkiraly commented Oct 6, 2023

Something for predict_proba would indeed be a nice addition. Can you expand a bit on why we'd put masses on the quantiles rather than use similar weights?

If we put the same weights, then the quantiles will in general not be the quantiles anymore. If you go with an empirical distribution with the same support, you need to choose weights such that the cumulative weights match at the chosen quantiles. There is in general an infinity of such choices, and the "Voronoi/closest" by measure seems most canonical.
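A small numerical illustration of this point, assuming the quantile of a weighted empirical distribution is the smallest support point whose cumulative weight reaches alpha. The levels, predictions, and the helper name `empirical_quantile` are made up for the example.

```python
def empirical_quantile(points, weights, alpha):
    """Smallest support point whose cumulative weight reaches alpha."""
    cumulative = 0.0
    for point, weight in sorted(zip(points, weights)):
        cumulative += weight
        if cumulative >= alpha - 1e-12:  # tolerance for float round-off
            return point
    return max(points)


alphas = [0.1, 0.2, 0.9]
preds = [1.0, 2.0, 5.0]  # made-up quantile predictions for one row

equal = [1 / 3] * 3
# masses of the "closest level" cells [0, 0.15), [0.15, 0.55), [0.55, 1]
matched = [0.15, 0.40, 0.45]

from_equal = [empirical_quantile(preds, equal, a) for a in alphas]
from_matched = [empirical_quantile(preds, matched, a) for a in alphas]
print(from_equal)    # [1.0, 1.0, 5.0]
print(from_matched)  # [1.0, 2.0, 5.0]
```

With equal weights the 0.2-quantile collapses onto the prediction made for 0.1, whereas the matched weights return each prediction at its own level.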

For the cdf we could interpolate between (sorted) quantiles and use step functions for the tails since we don't have any information about it. This would reflect the main logic of the MultipleQuantileRegressor best I think. Moreover, we could use the aforementioned cdf function to sidestep the "no regressors for alpha" problem in predict_quantiles. What are your thoughts on this?

I have thought about the above as well, it was my first thought actually and I rejected it because:

  • both proposed solutions sidestep the "no regressors for alpha" issue
  • the choice of cdf above would require implementing new "interpolative" distribution types and making arbitrary choices for the tails, as well as on how to interpolate. The cdf (which needs to be a full skpro distr!) would hence carry considerably more implementation burden, as well as more information than the predicted quantiles.
  • whereas, the proposed empirical distribution carries exactly as much information as the predicted quantiles - in fact in the parameterization of Empirical you just vertical-stack the matrix of predictive quantiles, and there is no further fiddling, it's just "plugin" into an existing and well-known distribution type. (well, there are also the weights, but they are easy to get).

@fkiraly
Collaborator

fkiraly commented Oct 6, 2023

PS: I'm happy to implement the predict_proba if you produce for me a private utility function which, given the predict-X, produces a pd.DataFrame with the following specifications:

  • row multiindex (alpha, X.index)
  • column index as y in fit
  • entry is the quantile prediction at quantile alpha and the respective X-index point

alternatively, list of row=X.index, col=y.index is also fine (len(alpha)-many), perhaps even easier. Don't worry about the pd.MultiIndex in this case.
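A sketch of the first layout, built from the second one: per-alpha prediction frames are stacked with `pd.concat` so that the row index becomes (alpha, X.index). The helper name `stack_quantile_predictions`, the level names, and the numbers are assumptions made for illustration; this is not the `_predict_proba_util` from the PR.

```python
import pandas as pd


def stack_quantile_predictions(preds_by_alpha):
    """Stack per-alpha prediction frames into one frame indexed by (alpha, X.index)."""
    alphas = list(preds_by_alpha)
    frames = [preds_by_alpha[a] for a in alphas]
    # `keys` becomes the outer level of the resulting row MultiIndex
    return pd.concat(frames, keys=alphas, names=["alpha", "instance"])


x_index = pd.Index([10, 11], name="instance")  # stands in for X.index
preds_by_alpha = {
    0.1: pd.DataFrame({"y": [1.0, 2.0]}, index=x_index),
    0.5: pd.DataFrame({"y": [3.0, 4.0]}, index=x_index),
    0.9: pd.DataFrame({"y": [5.0, 6.0]}, index=x_index),
}
stacked = stack_quantile_predictions(preds_by_alpha)
print(stacked)
```

Each entry of `stacked` is the prediction at one quantile level for one row of X, with the columns carried over from y.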

@fkiraly
Collaborator

fkiraly commented Oct 6, 2023

here's another argument for implementing the variant with "empirical":

Imagine a probability calibrator (a transformer distr to distr) acting on empiricals, applying a smoother or linear interpolator like you propose, @Ram0nB.

Now your version can be easily obtained from my version, by appending such a calibrator. The other way round this is not true ("easy" or perhaps even "possible"), as the transformation is not invertible, or at least not easily so in the generic case.

Therefore, the "@Ram0nB version" is actually a pipeline (transformed target regressor for proba) of the "@fkiraly version" with a probability calibrator that does the smoothing or interpolation of the empirical.

From a general design perspective in sklearn-like libraries, it is a good idea (and one that has helped a lot in sktime) to implement "minimal components" rather than "all-in-one estimators". I.e., if it looks like X = SomePipeline(Y, Z), there should be a strong preference to implement three estimators - Y, Z, SomePipeline - rather than X. An even stronger preference if some of the three are already existing.
(of course, for runtime efficiency purposes, X alone can make sense, too - in this case, Y, Z, SomePipeline should still get implemented in addition to X)

@Ram0nB
Contributor Author

Ram0nB commented Oct 10, 2023

Thanks for the clarification @fkiraly, I agree with all your points. I processed all the requested changes, looking forward to your feedback 😃 .

PS: I'm happy to implement the predict_proba if you produce for me a private utility function which, given the predict-X, produces a pd.DataFrame with the following specifications:

  • row multiindex (alpha, X.index)
  • column index as y in fit
  • entry is the quantile prediction at quantile alpha and the respective X-index point

This would be great, thanks! The utility function is called _predict_proba_util.

@Ram0nB Ram0nB requested a review from fkiraly October 10, 2023 10:16
@fkiraly
Collaborator

fkiraly commented Oct 10, 2023

ok, will look at it after work today - 3.12 compatible releases of skpro and sktime having priority first, but that should be out of the way quickly (I hope...)

@fkiraly fkiraly added the labels enhancement, module:regression (probabilistic regression module), implementing algorithms (Implementing algorithms, estimators, objects native to skpro) Oct 10, 2023
@fkiraly
Collaborator

fkiraly commented Oct 10, 2023

I added a draft for _predict_proba - unfortunately, I keep experiencing kernel crashes at fit, not sure whether that's sklearn or my own python setup, so I was not able to properly test. Hope that it is still clear what I'm trying to do.

@codecov-commenter

codecov-commenter commented Oct 10, 2023

Codecov Report

Attention: Patch coverage is 91.58879% with 9 lines in your changes missing coverage. Please review.

Project coverage is 71.73%. Comparing base (9daace6) to head (00960fd).
Report is 227 commits behind head on main.

Files Patch % Lines
skpro/regression/multiquantile.py 91.58% 4 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #108      +/-   ##
==========================================
+ Coverage   70.00%   71.73%   +1.73%     
==========================================
  Files          97       98       +1     
  Lines        5157     5264     +107     
  Branches      952      971      +19     
==========================================
+ Hits         3610     3776     +166     
+ Misses       1317     1220      -97     
- Partials      230      268      +38     

- prob pred params mandatory
- docstring updated
- predict_proba finalised

@Ram0nB
Contributor Author

Ram0nB commented Oct 11, 2023

Many thanks for the quick _predict_proba draft, worked right away!

what happens if alpha is None? It looks to me like it will just fail then. So, should alpha not be compulsory?

Only the prob. prediction methods wouldn't work as I did an input check in init, but it indeed makes more sense to make all parameters for the prob prediction methods non-optional.

The newest version has the following changes:

  • prob prediction parameters non-optional
  • finalised _predict_proba
  • updated docstring

Do you think that the docstring explains the "algorithm" for all prob prediction methods extensively enough?

@fkiraly
Collaborator

fkiraly commented Oct 11, 2023

ok, let's see if it runs! 😁

@fkiraly
Collaborator

fkiraly commented Oct 11, 2023

Do you think that the docstring explains the "algorithm" for all prob prediction methods extensively enough?

I suppose as good as it can be without math (perhaps instead of "nearest probability", "nearest quantile probability"). Although I think it can be further improved by adding a little math.

@fkiraly
Collaborator

fkiraly commented Oct 11, 2023

linting fails because you have an empty notebook in the branch, kindly remove

Collaborator

@fkiraly fkiraly left a comment

Nice, I see you found an efficient algorithm for _predict_quantiles too.

One thing that might be worth checking before we merge: if the user passes non-sorted alpha either to predict_quantiles or to __init__, is all the logic still correct? I have a nagging feeling that it might not be, and I'm not sure whether we check in the tests either.

- updated docstring with little math
- handle unsorted alpha
@Ram0nB
Contributor Author

Ram0nB commented Oct 12, 2023

I see that the tests are failing, I'll have a look at it tomorrow :)

@fkiraly
Collaborator

fkiraly commented Oct 12, 2023

I see that the tests are failing, I'll have a look at it tomorrow :)

Is this readthedocs again - might be unrelated to you, I thought I fixed it: #122

@fkiraly
Collaborator

fkiraly commented Oct 13, 2023

ah, I think it's the docstring. If you use TeX, and hence the backslash character, the string needs to be preceded by an "r" character, i.e.,

r"""This is a docstring.

This is a docstring description with :math:`\pi`
"""
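To illustrate why the prefix matters: in a non-raw string, sequences such as `\t` or `\b` are interpreted as escape characters, and unrecognised ones such as `\p` trigger a warning on recent Python versions. The function name `documented` below is just a placeholder.

```python
def documented():
    r"""Return nothing; the docstring keeps its TeX intact.

    The level is written :math:`\alpha`, the threshold :math:`\theta`.
    """


plain = "\theta"  # "\t" is parsed as a TAB character, so this is TAB + "heta"
raw = r"\theta"   # raw string: the backslash is kept literally

print(repr(plain))  # '\theta'  (repr shows the TAB as the escape \t)
print(repr(raw))    # '\\theta' (a real backslash followed by "theta")
```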

@fkiraly
Collaborator

fkiraly commented Oct 14, 2023

updated the docstring with the r char, and also made some small changes to the math paragraph (hope that's ok) - now it should reach the tests and not fail readthedocs.

@fkiraly
Collaborator

fkiraly commented Oct 14, 2023

FYI, the regressor was failing a test since the alpha in the columns of predict_quantiles are expected in the same order as the alpha argument, which need not be sorted.

I made two small changes which should fix that - feel free to revert or change.
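One generic way to meet that expectation, sketched here without reference to the actual changes in the PR: run the internal logic on sorted levels, then place each result back at the position the caller used. Both function names are hypothetical.

```python
def predict_in_requested_order(alpha, predict_sorted):
    """Run a sorted-only routine, then return results in the caller's order."""
    order = sorted(range(len(alpha)), key=lambda i: alpha[i])
    sorted_alpha = [alpha[i] for i in order]
    sorted_result = predict_sorted(sorted_alpha)
    result = [None] * len(alpha)
    for pos, i in enumerate(order):
        result[i] = sorted_result[pos]  # put each value back at its original slot
    return result


def toy_predict_sorted(sorted_alpha):
    # hypothetical stand-in: requires ascending input, returns one value per level
    assert sorted_alpha == sorted(sorted_alpha)
    return [round(100 * a) for a in sorted_alpha]


requested = [0.9, 0.1, 0.5]
out = predict_in_requested_order(requested, toy_predict_sorted)
print(out)  # [90, 10, 50]
```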

Collaborator

@fkiraly fkiraly left a comment

Works now, with my changes, see above.
Happy to merge if you are happy with my changes.

@Ram0nB
Contributor Author

Ram0nB commented Oct 16, 2023

Many thanks for your changes, all looks great to me! 👍

@fkiraly
Collaborator

fkiraly commented Oct 20, 2023

Thanks for your contribution!

Once released in 2.1.1, you should be able to use this with sktime.

Labels
enhancement; implementing algorithms (Implementing algorithms, estimators, objects native to skpro); module:regression (probabilistic regression module)
Development

Successfully merging this pull request may close these issues.

[ENH] multiple quantile regression
3 participants