
Conversation

cakedev0 (Owner) commented Sep 14, 2025

WIP

Follow-up to PR scikit-learn#32100, and especially the discussion here: scikit-learn#32100 (comment)

TODO (bottom-up order):

  • do the core changes in the algorithm
  • validate correctness with test_absolute_errors_precomputation_function (now renamed test_pinball_loss_precomputation_function)
  • change the MAE criterion class to a QuantileRegression class
  • modify the public API of the class and pass down the "q" parameter to the QuantileRegression class:
    • validate parameter naming
  • write some high-level tests (even though the current modified test_absolute_errors_precomputation_function already makes me fairly confident)
  • update the public API doc: I need …
  • update the user guide: I should …
  • doc: add examples? Maybe one with a …

Reference Issues/PRs

scikit-learn#32100

What does this implement/fix? Explain your changes.

Any other comments?

Maths:

We consider a weighted dataset $\{(y_i, w_i)\}_i$ with non-negative weights $w_i$.

For a scalar prediction $q$, the weighted pinball loss is

$$ L_\alpha(q) = \sum_{i} w_i \big( \alpha \max(y_i - q, 0) + (1 - \alpha)\max(q - y_i, 0) \big) $$

Equivalently, splitting by whether $y_i \ge q$ or $y_i < q$:

$$ L_\alpha(q) = \alpha \sum_{i: y_i \ge q} w_i (y_i - q) + (1 - \alpha) \sum_{i: y_i < q} w_i (q - y_i) $$

To evaluate this efficiently, introduce the aggregates

$$ W^+(q) = \sum_{i: y_i \ge q} w_i, \qquad Y^+(q) = \sum_{i: y_i \ge q} w_i y_i, $$

$$ W^-(q) = \sum_{i: y_i < q} w_i, \qquad Y^-(q) = \sum_{i: y_i < q} w_i y_i. $$
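For concreteness, here is a minimal NumPy sketch of these aggregates (illustrative only, not the PR's actual implementation):

```python
import numpy as np

def pinball_aggregates(y, w, q):
    """Return W^+(q), Y^+(q), W^-(q), Y^-(q) for a scalar prediction q."""
    above = y >= q  # samples with y_i >= q
    W_plus = w[above].sum()                  # W^+(q)
    Y_plus = (w[above] * y[above]).sum()     # Y^+(q)
    W_minus = w[~above].sum()                # W^-(q)
    Y_minus = (w[~above] * y[~above]).sum()  # Y^-(q)
    return W_plus, Y_plus, W_minus, Y_minus
```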

Using these, the loss admits a form that can be evaluated in O(1) once the aggregates are known:

$$ L_\alpha(q) = \alpha \big( Y^+(q) - q W^+(q) \big) + (1 - \alpha) \big( q W^-(q) - Y^-(q) \big). $$

Or, in the code (note the naming: `q` holds the quantile level $\alpha$, while `quantile` holds the prediction $q$):

q * (above.weighted_sum - quantile * above.total_weight)
+ (1 - q) * (quantile * below.total_weight - below.weighted_sum)
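A quick sanity check (again an illustrative NumPy sketch, not the PR's code) that the aggregate form agrees with the direct definition:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)
w = rng.uniform(0.1, 1.0, size=100)
alpha, pred = 0.7, 0.3  # quantile level and an arbitrary scalar prediction

# Direct definition of the weighted pinball loss.
direct = np.sum(
    w * (alpha * np.maximum(y - pred, 0.0) + (1 - alpha) * np.maximum(pred - y, 0.0))
)

# Aggregate form: W^+, Y^+ over y_i >= pred; W^-, Y^- over y_i < pred.
above = y >= pred
W_plus, Y_plus = w[above].sum(), (w[above] * y[above]).sum()
W_minus, Y_minus = w[~above].sum(), (w[~above] * y[~above]).sum()
agg = alpha * (Y_plus - pred * W_plus) + (1 - alpha) * (pred * W_minus - Y_minus)

assert np.isclose(direct, agg)
```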


github-actions bot commented Sep 14, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0cdeaaf. Link to the linter CI: here

cakedev0 (Owner, Author) commented:

@adam2392 and @ogrisel: I moved forward here, and before polishing it for review (which will take some time), I'd like your input:

  • can we confirm that we want to include quantile decision tree regressors in sklearn? Or should I open an issue to discuss this inclusion first?
  • naming:
    • name of the new criterion? For now: "pinball", standing for the pinball loss (by analogy with "poisson", which stands for the Poisson loss)
    • public name of the parameter that controls the quantile/pinball-loss level $\alpha$? For now, pinball_alpha, but that doesn't make a clear link with quantiles... (though we can make the link in the docstring & user guide)

The rest of the naming is more internal and can be discussed during the review process.

Thanks 🙏


adam2392 commented Sep 17, 2025

I want to make sure we separate the notion of a quantile loss criterion used during training from the quantile used during inference, since the two can be independent. To your points:

  • I am in favor, but perhaps there should be a broader discussion? I'll let @ogrisel decide. I believe quantiles/robustness are seen as a key focus area of scikit-learn, so given that we can get this feature without a significant runtime penalty, I don't see a reason to reject it.
  • perhaps quantile_loss? The optional quantile parameter would then default to 0.5. We could even consider discussing deprecating mean_absolute_error, since I doubt people use it given its runtime complexity.
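If that naming were adopted, usage might look like the following (a purely hypothetical sketch; neither the criterion name nor the parameter is part of scikit-learn yet):

```python
from sklearn.tree import DecisionTreeRegressor

# Hypothetical API: a "quantile_loss" criterion with a `quantile`
# parameter defaulting to 0.5 (the median), as proposed above.
reg = DecisionTreeRegressor(criterion="quantile_loss", quantile=0.9)
```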

