[Roadmap] 1.4.0 Roadmap #6500
I want to propose replacing 'approx' with 'hist' when tree_method is set to 'auto' on CPU. I noticed that #5178 was about this, but it's closed.
Do we have any concerns about doing this? CC: @trivialfis, @hcho3, @ShvetsKS
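For readers comparing the two methods: 'hist' speeds up split finding by bucketing each feature into quantile bins and scanning per-bin gradient sums instead of every sorted value. Below is a minimal numpy sketch of that idea for a single feature — illustrative only, not XGBoost's actual implementation, and the gain formula omits the usual 1/2 factor and the gamma complexity penalty:

```python
import numpy as np

def best_split_hist(x, grad, hess, n_bins=16, lam=1.0):
    """Histogram-based split finding for one feature (illustrative sketch).

    Instead of scanning every sorted value (the 'exact' method), 'hist'
    buckets the feature into quantile bins and accumulates gradient
    statistics per bin, so each split scan costs O(n_bins), not O(n)."""
    # Interior quantile edges define the bin boundaries.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # bin index per row, in [0, n_bins)
    # Accumulate per-bin gradient/hessian sums -- the "histogram".
    G = np.bincount(bins, weights=grad, minlength=n_bins)
    H = np.bincount(bins, weights=hess, minlength=n_bins)
    G_tot, H_tot = G.sum(), H.sum()
    best_gain, best_bin = 0.0, None
    GL = HL = 0.0
    for b in range(n_bins - 1):  # candidate split after each bin boundary
        GL += G[b]
        HL += H[b]
        GR, HR = G_tot - GL, H_tot - HL
        gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G_tot**2 / (H_tot + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin
```

In XGBoost itself you would simply pass `tree_method='hist'` (or `'approx'`) in the training parameters rather than implementing anything like this by hand.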
I will put up some documents on the theoretical aspects of the various tree methods, then we can decide together.
In #6564.
@trivialfis, do we perhaps need to run experiments to decide?
@SmirnovEgorRu I have no objection to changing the default in this or the next release. I mentioned to @ShvetsKS that there's a huge refactor of the CPU implementation underway. I would like to see some parts of it merged before making the change so we can make a fair comparison. Will come back to it after sorting out issues in the dask interface (which should be quite fast, as most of the features are now supported).
I closed the PR you referenced because I couldn't get all the tests passing. I think even if we decided to change the default now, we would still have some blockers to track down, so refactoring first might help make the change clearer and easier.
Hi @ShvetsKS @SmirnovEgorRu, I have been trying to refactor the CPU code for categorical data support based on the efficient CPU hist code. I found that on the URL dataset CPU hist is slower than approx. It's not a conventional dataset, as it's unusually wide and sparse. Just curious whether you have plans to optimize for it.
The approx implementation parallelizes over features with dynamic scheduling, so it has an advantage on these kinds of datasets.
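To illustrate the scheduling point: when each feature is an independent task handed out dynamically, a few expensive columns don't stall workers the way a static equal split of columns across threads can on wide, skewed data. A toy Python sketch of feature-parallel split search (the `scan` gain here is a deliberately simplified placeholder, not XGBoost's actual objective):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def per_feature_best(X, grad, n_workers=4):
    """Feature-parallel split search with dynamic scheduling (sketch).

    Each column is one task; the thread pool hands columns to idle workers
    as they finish, so uneven per-column cost (common in wide, sparse
    datasets) does not leave threads idle."""
    def scan(j):
        x = X[:, j]
        order = np.argsort(x)
        # Left-side gradient prefix sums over all candidate split points.
        g = np.cumsum(grad[order])[:-1]
        gain = g ** 2  # toy gain: squared left-side gradient sum
        k = int(np.argmax(gain))
        return j, float(gain[k])

    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(scan, range(X.shape[1])))
    return max(results, key=lambda r: r[1])  # (best_feature, best_gain)
```

Whether threads actually help in pure Python depends on how much of `scan` releases the GIL (numpy does for large arrays); the real implementations do this with OpenMP-style dynamic scheduling in C++.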
@trivialfis, yep, we are thinking about how to tune for wide datasets as well. I suppose we can outperform
"Support training multiple models in parallel using dask". Does this include cross-validation with early stopping? |
@Denisevi4 No, it's for running multiple training sessions on a single cluster simultaneously. But it's a basic requirement for cv. |
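A sketch of the coordination problem this feature addresses: if each training session must hold a set of cluster workers exclusively, naive lock-by-lock acquisition can deadlock when two sessions each grab part of the other's set. dask's `distributed.MultiLock` acquires the whole set safely; below is a toy thread-based stand-in using the classic ordered-acquisition discipline (hypothetical names, not dask code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class MultiLockSketch:
    """Toy stand-in for distributed.MultiLock: take a whole set of named
    locks without deadlock by always acquiring them in a fixed global
    order, so two sessions can't each end up holding half of the other's
    set while waiting forever for the rest."""
    _registry_guard = threading.Lock()
    _locks = {}

    def __init__(self, names):
        self.names = sorted(names)  # fixed global acquisition order

    def __enter__(self):
        with MultiLockSketch._registry_guard:
            for n in self.names:
                MultiLockSketch._locks.setdefault(n, threading.Lock())
        for n in self.names:
            MultiLockSketch._locks[n].acquire()
        return self

    def __exit__(self, *exc):
        for n in reversed(self.names):
            MultiLockSketch._locks[n].release()

def train_fold(fold, workers):
    # Hypothetical training session that needs exclusive use of `workers`.
    with MultiLockSketch(workers):
        return fold, sum(len(w) for w in workers)  # placeholder "model"

# Three concurrent sessions contending for the same two workers.
with ThreadPoolExecutor(max_workers=3) as ex:
    futs = [ex.submit(train_fold, f, ["worker-a", "worker-b"]) for f in range(3)]
    results = sorted(f.result() for f in futs)
```

Because every session here needs the same worker set, the sessions serialize; the point is that they do so safely rather than deadlocking.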
@hcho3 I would like to get 1.4 out once we get AUC re-implemented. I can try fixing the gamma metric if the AUC re-implementation goes well.
I will branch out next week. |
Will you fix other metrics (gamma-nloglik, logloss)? |
Yeah, I will take a deeper look into them this weekend. |
1.4 is out; status will be tracked in #6793.
@dmlc/xgboost-committer Please add your items here by editing this post. Let's ensure that
For other contributors who don't have permission to edit the post, please comment here with what you think should be in 1.4.0.
Main
- `use_rmm` flag to global configuration (Add use_rmm flag to global configuration #6656)
- `ntree_limit` in Python.
- `inplace_predict` for sklearn. (Use inplace predict for sklearn. #6718)
- `gbtree` thread safe. (Make prediction thread safe. #6648)
- Dask `distributed.MultiLock` (#6743)

For brief notes: at 1.4, the dask interface should be feature complete, categorical data support for GPU is ready for public testing, and inplace prediction will be more mature.