Fitting Linear Functions inside Tree leaves (Feature Request) #5725
Comments
Feature requests for multi-output regression have come up before (#2087, #3439). Supporting it would bring substantial maintenance costs, since what is essentially needed is a different base learner. To simulate "linear functions in leaf nodes", XGBoost regression combined with MultiOutputRegressor in sklearn works reasonably well, even though many papers claim that this solution ignores correlations between different target variables.
It's something I have wanted for a long time. I also have a proof-of-concept implementation in #5460; I just need to allocate time to focus on it.
@AaronX121 Do you have any citations for the papers?
Actually, correlation doesn't help much in my experiments. Your result is likely to be worse due to model capacity. That's one of the reasons that I'm not rushing the implementation. It's mostly for faster inference time.
@Murgio Hi, here is one work that directly tackles the multi-output regression problem for GBDT: https://arxiv.org/pdf/1909.04373.pdf, you may find it helpful :). Its code is available on GitHub. For sparse multi-output, here is another work: http://proceedings.mlr.press/v70/si17a.html. Also, many variants of CART are equipped with linear models in internal or leaf nodes, such as piece-wise linear tree (https://arxiv.org/pdf/1802.05640.pdf), soft decision tree (https://arxiv.org/pdf/1711.09784.pdf), and many more. They can be easily combined with one gradient boosting wrapper for multi-output regression / multi-label classification. I also ran experiments on some benchmark datasets (http://mulan.sourceforge.net/datasets-mtr.html). It is hard to say that these methods are superior to XGBoost+MultiOutputRegressor.
@AaronX121 I may have explained my request poorly or misunderstood your response, but I don't see how what I asked for is equivalent to XGBoost plus MultiOutputRegressor. What I was requesting is described in the paper we both linked. PS: I fixed the links in my initial request.
I second @Fish-Soup's last question: how is XGBoost plus MultiOutputRegressor similar or equivalent to the initial feature request? Are there papers or examples contrasting the two? I read the MultiOutputRegressor docs, and it doesn't seem like the right solution. I have only one target, but I'd like the learners to be piecewise linear instead of step functions.
Hello all,
Do we have any updates on this?
I was wondering if it would be possible to develop a new booster that, instead of taking the mean of the values inside a leaf, fits a linear function. When the number of features is small, a piecewise linear model may perform better than a standard tree-based one, requiring fewer leaves and trees to model smoothly changing functions. In certain cases this could produce more accurate predictions. An additional benefit is that it would allow extrapolation, which may be important in certain use cases.
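To make the idea concrete, here is a minimal NumPy sketch comparing a single depth-1 tree with constant leaves against the same tree with linear leaves. It is only an illustration under simplifying assumptions (one feature, one split, no boosting, no regularization), and it is not taken from XGBoost, LinXGBoost, or GBDT-PL; the helper names `fit_leaf`, `leaf_predict`, `fit_stump`, and `predict_stump` are made up for this example.

```python
import numpy as np

def fit_leaf(x, y, linear):
    """Return (slope, intercept) for one leaf; constant leaves use slope 0."""
    if linear and len(x) >= 2:
        slope, intercept = np.polyfit(x, y, 1)  # least-squares line
        return float(slope), float(intercept)
    return 0.0, float(np.mean(y))  # classic leaf: mean of the targets

def leaf_predict(leaf, x):
    slope, intercept = leaf
    return slope * x + intercept

def fit_stump(x, y, linear):
    """Exhaustively pick the split threshold that minimizes squared error."""
    xs = np.unique(x)
    best = None
    for t in (xs[:-1] + xs[1:]) / 2.0:  # midpoints between sorted values
        mask = x <= t
        left = fit_leaf(x[mask], y[mask], linear)
        right = fit_leaf(x[~mask], y[~mask], linear)
        pred = np.where(mask, leaf_predict(left, x), leaf_predict(right, x))
        sse = float(np.sum((y - pred) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    return best[1:]  # (threshold, left_leaf, right_leaf)

def predict_stump(model, x):
    t, left, right = model
    x = np.asarray(x, dtype=float)
    return np.where(x <= t, leaf_predict(left, x), leaf_predict(right, x))

# Toy target with a kink: y = |x| on [-1, 1].
x = np.linspace(-1.0, 1.0, 41)
y = np.abs(x)

const_model = fit_stump(x, y, linear=False)
lin_model = fit_stump(x, y, linear=True)

mse_const = float(np.mean((y - predict_stump(const_model, x)) ** 2))
mse_lin = float(np.mean((y - predict_stump(lin_model, x)) ** 2))

# Outside the training range the constant leaf stays flat,
# while the linear leaf continues along its fitted slope.
extrap_const = float(predict_stump(const_model, [2.0])[0])
extrap_lin = float(predict_stump(lin_model, [2.0])[0])

print("in-sample MSE  constant:", mse_const, " linear:", mse_lin)
print("prediction at x=2  constant:", extrap_const, " linear:", extrap_lin)
```

On this toy target two linear leaves are enough to represent the function, whereas constant leaves would need many more splits to approximate it. A real booster would fit the leaf models to gradient statistics rather than to the raw targets.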
I have found two implementations of this:
LinXGBoost: Written purely in Python and describes itself as an extension to XGBoost. However, the paper mentions that it has not been written with performance in mind.
https://github.com/ldv1/LinXGBoost
https://arxiv.org/pdf/1710.03634.pdf
GBDT-PL: Has a Python API; I think the back end is in C. It performs very well compared to other gradient boosted decision trees (at least on the tests and hyperparameters they chose). The paper details many optimisations to make the code run quickly.
https://github.com/GBDT-PL/GBDT-PL
https://arxiv.org/pdf/1802.05640.pdf
An additional optimization I had thought of: you could specify only a subset of the features to use in the linear fit.
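A rough sketch of what restricting the linear fit to a feature subset could look like for a single leaf, again in plain NumPy. The `linear_features` argument, the small `l2` ridge term (added here only for numerical stability), and the function names are assumptions for illustration, not an existing XGBoost option.

```python
import numpy as np

def fit_leaf_linear(X_leaf, y_leaf, linear_features, l2=1e-3):
    """Ridge-regularized least squares on the chosen columns of one leaf.

    Returns (weights, bias). Only the columns listed in `linear_features`
    (a hypothetical parameter) take part in the linear fit.
    """
    A = np.column_stack([X_leaf[:, linear_features], np.ones(len(X_leaf))])
    reg = l2 * np.eye(A.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    coef = np.linalg.solve(A.T @ A + reg, A.T @ y_leaf)
    return coef[:-1], float(coef[-1])

def predict_leaf(X, linear_features, weights, bias):
    return X[:, linear_features] @ weights + bias

# Toy leaf: 5 features, but the target depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 1.0

cols = [0, 2]
weights, bias = fit_leaf_linear(X, y, cols)
pred = predict_leaf(X, cols, weights, bias)

print("weights:", weights, "bias:", bias)
```

The tree splits could still use all features; only the per-leaf regression is limited to the chosen columns, which keeps each least-squares problem small.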
Many thanks
EDIT: I fixed the broken links.