Rewrite Dask interface. #4819
@RAMitchell @hcho3 @CodingCat I can further abstract
I would like to see how much of this work overlaps with #4656. @thesuperzapper, could you also join the review once this PR is no longer WIP?
No rush on this, I think. Let's stabilise the Dask interface first. What's your plan for assigning gpu_id in this interface?
@RAMitchell I have another branch that uses a utility I implemented. I need to somehow merge it here.
Separate the client from the current Python process by enabling a client override. Remove experimental new Python syntax. Use Dask's method for frame concatenation instead of dispatching the type yourself.
@RAMitchell asked me for my thoughts on the use of the Strictly from a Dask user's perspective I think that it's nice to have first class support for Dask within XGBoost. However, I can see how from an XGBoost maintainer's point of view, this might be concerning. In general I think that API extensibility is good. For example it's nice that I can use That's all maybe a bit philosophical. I'm happy to get more into details here if folks want.
I'll also say, it would be nice to find protocols that would allow us to pass in Dask dataframes or Dask arrays directly, without creating the
The actual implementation looks like what we've all done historically. It's not super clean but there doesn't seem to be a better solution at the moment. I think it would be worth thinking about how to improve these sorts of workflows within Dask. My guess is that that's out of scope for folks here (some core Dask devs probably need to spend some serious time thinking about that). Mostly I mention this so that you don't grow too attached to this implementation, and remain open to some day replacing it with something else.
Yeah. I learned a lot from it. Especially the part of obtaining worker local data.
I'm interested in getting to know more about dask. So please keep me in the loop.
That won't be a problem. If there's an improvement we can change it anytime. I don't expect one commit to get everything right. ;-) As for the sklearn interface, I'm working on it.
* Consider data locality in distributing data.
* New interface that looks similar to the one with single node.
* Pass explicit client object.
* Support evaluation history.
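To make the data-locality point concrete, here is a minimal sketch of the idea, not the code from this PR. It assumes a mapping from partition key to the addresses of the workers that hold it, which is the shape I understand `Client.who_has()` in dask.distributed to return. The function name `group_parts_by_worker`, the partition keys, and the worker addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_parts_by_worker(
    who_has: Dict[Hashable, Sequence[str]],
) -> Dict[str, List[Hashable]]:
    """Assign each partition key to one worker that already holds it."""
    assignment: Dict[str, List[Hashable]] = defaultdict(list)
    for key, workers in who_has.items():
        if not workers:
            raise ValueError(f"partition {key!r} is not held by any worker")
        # Use the first holder so the partition is consumed where it
        # already lives instead of being moved across the network.
        assignment[workers[0]].append(key)
    return dict(assignment)


# Hypothetical layout: three partitions spread over two workers.
layout = {
    "part-0": ["tcp://10.0.0.1:4000"],
    "part-1": ["tcp://10.0.0.2:4000"],
    "part-2": ["tcp://10.0.0.1:4000"],
}
local_parts = group_parts_by_worker(layout)
```

Each worker would then build its local training data only from the partitions listed under its own address.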
Codecov Report
@@ Coverage Diff @@
## master #4819 +/- ##
==========================================
- Coverage 77.63% 71.72% -5.92%
==========================================
Files 11 11
Lines 2039 2281 +242
==========================================
+ Hits 1583 1636 +53
- Misses 456 645 +189
@mrocklin @RAMitchell @mt-jones Please take a look again.
LGTM. You will have to make sure all of these functions appear in the Python API documentation.
@mrocklin @TomAugspurger The original implementation of dask-xgboost is acknowledged in both the documentation and the code header. Thank you so much!
@CodingCat @hcho3 Would you like to take a look?
Looks great to me.
Merging as we will have more time to test it. @hcho3 @CodingCat Try it out, I have grown to like dask now. ;-)
@mt-jones @mrocklin @RAMitchell Thanks!
This PR rewrites the dask interface based on dask-xgboost, with added support for evaluation metrics and a slightly different interface. Closes #4814.