added code for instance based weighing for rank objectives #3379
Conversation
Codecov Report
@@            Coverage Diff            @@
##            master    #3379   +/-   ##
=========================================
  Coverage     44.99%   44.99%
  Complexity      228      228
=========================================
  Files           166      166
  Lines         12787    12787
  Branches        466      466
=========================================
  Hits           5754     5754
  Misses         6841     6841
  Partials        192      192
Continue to review full report at Codecov.
@ngoyal2707 Thank you for submitting this PR. I agree that the ranking objective should take instance weights into account.
A few things:
- I think you should include weight normalization in the ranking objective code, since the user-defined instance weights are usually not expected to sum to N.
- Can you add some tests? You said, "Used python wrapper to train on weighted loss and evaluate on weighted loss. Performs much better than just training on unweighted loss," and I think you are right. Can you craft example data to demonstrate it? For instance, you can set zero weights for a few instances and see whether those instances affect the loss.
Ideally, you should add two tests: 1) one showing that the weighted ranking objective is correct; and 2) another containing an edge case where the old implementation leads to nonsensical results.
Let me know if you need help writing a test. I'd be more than happy to assist.
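A minimal sketch of such a test, using a stand-alone NumPy implementation of a weighted pairwise logistic ranking loss (the function and variable names here are illustrative, not xgboost's API): a group with zero weight should contribute nothing to the total loss.

```python
import numpy as np

def weighted_pairwise_loss(scores, labels, group_sizes, group_weights):
    """Pairwise logistic ranking loss; every pair loss within a query group
    is scaled by that group's instance weight. Illustrative only."""
    total = 0.0
    start = 0
    for size, w in zip(group_sizes, group_weights):
        s = scores[start:start + size]
        y = labels[start:start + size]
        for i in range(size):
            for j in range(size):
                if y[i] > y[j]:  # document i should rank above document j
                    total += w * np.log1p(np.exp(-(s[i] - s[j])))
        start += size
    return total

# two query groups of 3 documents each
scores = np.array([2.0, 1.0, 0.5, 0.3, 1.2, 0.7])
labels = np.array([1, 0, 0, 1, 0, 0])
groups = [3, 3]

full  = weighted_pairwise_loss(scores, labels, groups, [1.0, 1.0])
part  = weighted_pairwise_loss(scores, labels, groups, [1.0, 0.0])
first = weighted_pairwise_loss(scores[:3], labels[:3], [3], [1.0])

assert np.isclose(part, first)  # the zero-weight group drops out entirely
assert part < full              # the second group otherwise adds loss
```

The same idea would carry over to a real test against the xgboost objective: train with a zero weight on some groups and confirm their documents do not move the evaluation metric.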
(force-pushed from e00ddec to 95c6292)
@hcho3 Thanks for looking into the PR. As per your comments, I have added changes with:
Questions:
Side Note:
(force-pushed from 95c6292 to b3ee94f)
@hcho3 your change adding weight normalization looks good to me. Can you please merge it to master now?
(force-pushed from 032912d to 93a4937)
@ngoyal2707 I ended up removing my latest change. The normalization factor would need to be re-computed whenever the objective function object is re-used for a different dataset, and I could not find a good way to detect the change of dataset. For now, I think we should leave the normalization to the user, to be on the safe side.
@hcho3 I see, good point; sorry I didn't catch that.
@ngoyal2707 Yes, I will merge it once all tests pass.
One dumb question @ngoyal2707: can you clarify the structure of the
Hey @ejalonas, in my implementation, if you want individual instance-based weights, you can use
Thanks @ngoyal2707! I really appreciate your clarification and contribution to this repo.
* added code for instance based weighing for rank objectives
* Fix lint
@ngoyal2707 which release version will begin to support the group weights?
This PR uses sample_weights per list instance for all the ranking objectives. This is important for use cases where, in production systems, we want to give more weight to some sample instances than to others. (In this case, a sample instance is a query's entire document set.)
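To make the per-list-instance layout concrete, here is a hypothetical sketch of how group sizes and per-group weights line up for grouped ranking data (the names and numbers are illustrative, not taken from the PR):

```python
import numpy as np

# hypothetical ranking dataset: 2 queries, with 3 and 2 candidate documents
group_sizes   = [3, 2]                 # number of documents per query group
group_weights = np.array([2.0, 0.5])   # one weight per query (list instance)

# one way to visualize per-group weighting: each document in a group
# shares that group's weight
doc_weights = np.repeat(group_weights, group_sizes)
assert list(doc_weights) == [2.0, 2.0, 2.0, 0.5, 0.5]
```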
Testing Done:
usage:
train_data_xgboost.set_weight(train_pairs_weight)
With the current implementation, the weights need to be normalized externally, as follows:
which is essentially doing:
w_i = w_i * N / sum(w_i)
If you think I should do the normalization inside the source code, let me know; I can easily update the PR accordingly.
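The external normalization described above can be sketched as follows; `train_pairs_weight` and `train_data_xgboost` are the names used earlier in this description, and the rescaling itself is plain NumPy:

```python
import numpy as np

train_pairs_weight = np.array([0.5, 2.0, 1.0, 0.5])  # one weight per query group
N = len(train_pairs_weight)

# rescale so the weights sum to N, i.e. w_i = w_i * N / sum(w_i)
normalized = train_pairs_weight * N / train_pairs_weight.sum()

assert np.isclose(normalized.sum(), N)
# train_data_xgboost.set_weight(normalized)  # then attach to the DMatrix as before
```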
EDIT:
This PR should resolve both of the following issues:
#2460
#2561