Initial support for multi-target tree. #8616
Conversation
```diff
@@ -310,14 +300,8 @@ void PredictBatchByBlockOfRowsKernel(DataView batch, gbm::GBTreeModel const &mod
     FVecFill(block_size, batch_offset, num_feature, &batch, fvec_offset, p_thread_temp);
     // process block of rows through all trees to keep cache locality
     if (model.learner_model_param->IsVectorLeaf()) {
```
This is to avoid relying on the model parameter, which is not serialized into the JSON model.
```diff
@@ -530,17 +530,17 @@ class TensorView {
   /**
    * \brief Number of items in the tensor.
    */
-  LINALG_HD [[nodiscard]] std::size_t Size() const { return size_; }
+  [[nodiscard]] LINALG_HD std::size_t Size() const { return size_; }
```
clangd is not quite happy about the placement of the C++ attribute when running in CUDA mode.
```diff
@@ -352,19 +352,6 @@ struct WQSummary {
       prev_rmax = data[i].rmax;
     }
   }
-  // check consistency of the summary
```
Unused function.
#include "xgboost/objective.h" | ||
#include "xgboost/predictor.h" | ||
#include "xgboost/string_view.h" | ||
#include "xgboost/string_view.h" // for StringView |
I kept the custom string view for now. Some of the C++20 string_view changes might be useful; we can back-port them to xgboost when needed.
```diff
   monitor_->Stop(__func__);
 }

 void LeafPartition(RegTree const &tree, linalg::MatrixView<GradientPair const> gpair,
```
This is not used yet. We need some work on L1 and quantile regression for estimating the vector leaf.
```diff
@@ -230,6 +236,11 @@ def main(args: argparse.Namespace) -> None:
     parser.add_argument("--format", type=int, choices=[0, 1], default=1)
     parser.add_argument("--type-check", type=int, choices=[0, 1], default=1)
     parser.add_argument("--pylint", type=int, choices=[0, 1], default=1)
+    parser.add_argument(
+        "--fix",
```
A new argument for convenience.
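For context, a minimal sketch of how such a flag might be wired up; the diff is truncated after the flag name, so the `type`, `default`, and `help` values here are assumptions, not the actual code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--format", type=int, choices=[0, 1], default=1)
parser.add_argument(
    "--fix",
    type=int,
    choices=[0, 1],
    default=0,  # assumed: check-only by default, opt in to rewriting files
    help="Let the linters rewrite files in place instead of only reporting.",
)
args = parser.parse_args()
if args.fix:
    print("running formatters in fix mode")  # stand-in for the real dispatch
```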
```diff
@@ -32,6 +32,19 @@ def train_result(param, dmat: xgb.DMatrix, num_rounds: int) -> dict:
     return result


+class TestGPUUpdatersMulti:
```
I have extracted all the multi-target/multi-class datasets into an independent hypothesis search strategy. Other than the test for CPU hist, no testing logic has changed.
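A rough sketch of what such a shared hypothesis strategy could look like; the class and strategy names here are illustrative assumptions, not the actual code in the PR:

```python
import numpy as np
from hypothesis import given, strategies as st


class TestDataset:
    """Tiny stand-in for the dataset wrapper the tests pass around."""

    def __init__(self, name: str, n_targets: int) -> None:
        self.name = name
        rng = np.random.default_rng(0)
        self.X = rng.normal(size=(64, 4))
        self.y = rng.normal(size=(64, n_targets))


# One shared strategy that both the CPU and GPU updater tests can draw from.
multi_datasets = st.sampled_from(
    [TestDataset("mtreg", n_targets=3), TestDataset("mtreg-larger", n_targets=5)]
)


@given(dataset=multi_datasets)
def test_multi_updaters(dataset: TestDataset) -> None:
    # A real test would train with each updater and compare the results;
    # here we only assert that every sampled dataset is multi-target.
    assert dataset.y.shape[1] > 1
```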
```diff
@@ -352,137 +352,6 @@ def __repr__(self) -> str:
         return self.name


-@memory.cache
```
pylint complains about the file being too large (>1000 LOC), so I moved some of the data fetchers into testing/data.py.
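The moved fetchers presumably keep the same `joblib.Memory` disk-caching pattern visible in the diff above; a hedged sketch of the shape, where the function body and cache directory are assumptions:

```python
import numpy as np
from joblib import Memory

# Cache fetched/generated datasets on disk so repeated test runs are cheap.
memory = Memory("./cachedir", verbose=0)


@memory.cache
def get_mtreg(n_samples: int = 128, n_targets: int = 3):
    """Synthetic multi-target regression data."""
    rng = np.random.default_rng(1994)
    X = rng.normal(size=(n_samples, 8))
    y = X @ rng.normal(size=(8, n_targets))
    return X, y
```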
Not sure if this is useful, but you can do it just for fun:

```python
import xgboost as xgb
from xgboost.callback import TrainingCallback


def alternate(plot_result: bool) -> None:
    """Draw a circle with 2-dim coordinates as target variables."""

    class ResetStrategy(TrainingCallback):
        # Flip the tree-growing strategy on every other boosting round.
        def before_iteration(self, model, epoch: int, evals_log) -> bool:
            strategy = "multi_output_tree" if epoch % 2 == 0 else "one_output_per_tree"
            model.set_param({"multi_strategy": strategy})
            return False  # returning False keeps training going

    X, y = gen_circle()  # data helper defined elsewhere in this thread
    # Train a regressor on it.
    reg = xgb.XGBRegressor(
        tree_method="hist",
        n_estimators=4,
        n_jobs=1,
        max_depth=8,
        subsample=0.6,
        callbacks=[ResetStrategy()],
    )
    reg.fit(X, y, eval_set=[(X, y)])
```
- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm based on the tree type.
(Just seeing this for the first time, haven't put much brain power into it yet.)

Essentially, multi-target trees can be used whenever there is more than one parameter to predict. This can be useful if you want to model all the parameters of a univariate or multivariate parametric distribution; see Multi-Target XGBoostLSS Regression.
This is a rough PR for early reviews and discussion; it contains bugs and unfinished code.

I try to reuse as much existing code as possible. For instance, there is no structural change to the histogram builder: the implementation simply iterates over a list of builders, one per target. This might change in the future as we want a more integrated implementation. The evaluation code has to be rewritten for performance. Lastly, there are some optimization techniques for the multi-target tree when the number of targets is huge; known methods include summarizing the gradient, selecting the gradient, projecting the gradient, optimizing for sparse gradients, etc. I haven't implemented any of those yet; this PR is for the core multi-target tree structure.

There are other cases where we have a vector leaf but no multi-target tree grower. For instance, we might want the leaf to be a linear model, or it might contain extra parameters for a probability distribution. These will require new tree-training algorithms, but the tree structure is largely the same. The PR is a proof of concept.

The implementation is not yet as efficient as the single-target one, so it does not represent the theoretical performance of the strategy.

For small testing datasets, using a vector leaf might lead to significant overfitting.
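To make the scope concrete, here is a minimal sketch of what training with the multi-output strategy looks like from the Python side, using the `multi_strategy="multi_output_tree"` value that appears elsewhere in this thread; the data and parameter values are illustrative, and the API was still in flux at the time of this PR:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = np.stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]], axis=1)  # two targets

# With "multi_output_tree", each boosting round grows a single tree whose
# leaves hold a vector with one value per target.
reg = xgb.XGBRegressor(
    tree_method="hist",
    multi_strategy="multi_output_tree",
    n_estimators=8,
)
reg.fit(X, y)
assert reg.predict(X).shape == (256, 2)
```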
What's working: hist and gbtree with most of the tree parameters, except for the monotonic constraint. Numeric features only.

What's not working: everything else.
Related