Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] [dask] Add DaskLGBMRanker #3708
[python-package] [dask] Add DaskLGBMRanker #3708
Changes from 4 commits
b39a161
19dbce6
42149a0
607c857
a33a4de
78358e2
d6e5209
1d6054b
2321692
bb70d11
d472c82
cad5403
d9efb5b
d1d4abf
02aabda
56bb2c9
fc985bf
8c33f56
872c578
53988ec
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb What about
init_score
? Is it supported or we should add feature request for it?LightGBM/python-package/lightgbm/sklearn.py
Lines 396 to 397 in 1c60cfa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a feature request. I'll write it up and add a link here.
@ffineis, could you add
init_score=None
here betweensample_weight
andgroup
, so the order matches the sklearn interface forLGBMRanker
? (LightGBM/python-package/lightgbm/sklearn.py
Line 962 in 1c60cfa
fit()
, they won't accidentally have theirinit_score
interpreted asgroup
.And can you just then add a check like this?
@StrikerRUS speaking of positional arguments, I'll open another issue where we can discuss how to handle the
client
argument. But let's leave that out of this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb
Yes, sure! Agree with all your intents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init_score: #3807
client placement: #3808
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, no prob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was it necessary to replace the dask-ml one with this function?
If that's just to cut out our testing dependency on
dask-ml
(which would be much appreciated!), could you make that a separate PR with this change plus removingdask-ml
from here:LightGBM/.ci/test.sh
Line 71 in f2695da
I think that PR would be small and non-controversial, so we could review and merge it quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, yeah I'll follow up 3708 with a PR to remove of dask-ml as a test dependency. See reply to comment re: accuracy_score (above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in d6e5209
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @ffineis is it cool if I pick up the "remove testing dependency on
dask-ml
" thing?If you want to make the contribution I'll totally respect that, but if it's just extra work you'd rather not do, I'm happy to make a PR right now for it. Wanna be respectful of your time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made some changes in #3786 to speed up the tests in this module. If it'll work for the tests you're adding in this PR, can you please make every
DaskLGBMRanker
andLGBMRanker
usen_iterations=10, num_leaves=10
? I found that training smaller models was able to make the tests faster without sacrificing any coverage.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb I like this test very much! Can we add the same for other estimators? Seems that Classifier tests do not asserts real Dask
predict()
equals to.to_local().predict()
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the classifier tests do check that
predict()
and non-Daskpredict()
:LightGBM/tests/python_package_test/test_dask.py
Line 94 in 1c60cfa
predict()
and.to_local().predict()
:LightGBM/tests/python_package_test/test_dask.py
Line 138 in 1c60cfa
I'd be open to a PR that makes it more explicit and cuts out the total number of tests by combining
test_classifier_local_predict
(LightGBM/tests/python_package_test/test_dask.py
Line 122 in 1c60cfa
test_classifier
(LightGBM/tests/python_package_test/test_dask.py
Line 72 in 1c60cfa
Not an action we need on this PR, I'll write it up as a "good first issue".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb
Actually this checks
to_local().predict(dX)
and localpredict()
. It is not the same asto_local().predict(dX)
and Daskpredict()
, given that asserts you pointing are placed in two different (in theory unrelated) tests, I think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's confusing because it's two tests, so I think we should merge the tests in the future like I said above (not anything you need to do on here, @ffineis).
I'm saying that taken together, they show transitively that
dask_model.predict() == dask_model.to_local().predict()
, even though neither one does that direct comparison.dask_model.predict() == sklearn_model.predict()
dask_model.to_local().predict() == sklearn_model.predict()
But that's only partially true, since
test_local_predict
only does binary classification, and only on array data. So combining them would give more coverage and would show whether that result holds up for multi-class classification or for other input data types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree!
Also agree. But two different tests can take different inputs (params, data), can be modified in future separately, etc. So it can be not fully fair comparison. Merging them into one test will solve all issues. 🙂