[RFC] Revising ranking objectives. #6450

Closed
trivialfis opened this issue Nov 30, 2020 · 1 comment
Labels: LTR (Learning to rank), status: RFC

trivialfis commented Nov 30, 2020

XGBoost supports several ranking objectives based on LambdaMART: rank:pairwise, rank:ndcg, and rank:map. rank:pairwise is the unscaled version of RankNet's cost, which means the Δ term in LambdaMART is simply set to a constant. The other two are extensions of LambdaMART with different measures. To clarify the ranking model implementation and make room for other objectives, I propose the following changes:
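For reference, a sketch of the pairwise lambda gradient as described in the LambdaMART literature (notation roughly follows Burges' "From RankNet to LambdaRank to LambdaMART"; s_i and s_j are the model scores for a pair where document i is more relevant than document j):

```latex
% Pairwise lambda for one pair (i, j); sigma is the sigmoid scaling constant.
\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}} \left| \Delta_{ij} \right|
% rank:pairwise: |\Delta_{ij}| = 1  (constant, i.e. the unscaled RankNet cost)
% rank:ndcg:     |\Delta_{ij}| = |\Delta NDCG| from swapping documents i and j
% rank:map:      |\Delta_{ij}| = |\Delta MAP|  from swapping documents i and j
```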

Rename the objective functions

Use lambdamart:<extension> for the existing objectives. All three objectives follow the pairwise approach, since LambdaRank/LambdaMART is a pairwise LTR approach, so having names like rank:pairwise and rank:ndcg side by side is just confusing. Also, with recent advances in LTR there are revised LambdaMART algorithms, like the one mentioned in #6143, so we should make room for future algorithm additions. A usage sketch follows the list below.

  • rank:pairwise -> lambdamart:constant
  • rank:ndcg -> lambdamart:ndcg
  • rank:map -> lambdamart:map
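A rough sketch of what this would look like from the Python package, assuming the sklearn-style XGBRanker interface; the lambdamart:* value is only the name proposed here and does not exist yet:

```python
import xgboost as xgb

# Current objective name.
ranker = xgb.XGBRanker(objective="rank:ndcg")

# Proposed name for the same objective; hypothetical until the renaming lands.
# ranker = xgb.XGBRanker(objective="lambdamart:ndcg")
```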

Clarify and test the input and output space.

The input space and output space for the pairwise objectives should be feature values and relevance degrees. Internally XGBoost constructs pairs of model scores to compute the lambda gradients. For rank:map, the labels should be binary values, as MAP is not defined for multi-level relevance. (There is an extension of MAP called GAP that supports graded relevance, but it is out of the scope of this issue.) For rank:ndcg, the labels should be strictly integers, as the NDCG gain in XGBoost is calculated as 2^rel_i - 1. An alternative is to use user-supplied gain values in place of this exponential gain.
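For concreteness, the exponential-gain form of DCG/NDCG that this assumes (rel_i is the integer relevance label at rank position i, with truncation level k):

```latex
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}
```

where IDCG@k is the DCG@k of the ideal (label-sorted) ordering. With user-supplied gains, 2^{rel_i} - 1 would simply be replaced by the provided gain value.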

More features for existing objectives

We should add class balancing to rank:ndcg to combat imbalanced target relevance degrees. Also, the truncation level should be explicitly specifiable for rank:ndcg. Lastly, we may support the second NDCG gain type mentioned above.
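One possible shape of the class balancing, as a minimal numpy sketch only (the weighting scheme and the helper name are assumptions of this sketch, not something XGBoost exposes today): weight each document inversely to how frequent its relevance degree is within its query group.

```python
import numpy as np

def balance_weights(labels):
    """Per-document weights for one query group, inversely proportional to
    the frequency of each relevance degree in that group (illustrative only)."""
    degrees, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(degrees, counts / labels.size))
    return np.array([1.0 / freq[l] for l in labels])

# One query group with a heavily imbalanced relevance distribution:
y = np.array([0, 0, 0, 0, 0, 0, 1, 2])
print(balance_weights(y))  # the rare degrees (1 and 2) receive larger weights
```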

Change default evaluation metrics

The default evaluation metric for rank:ndcg should be changed to ndcg instead of the current map. With the truncation mentioned above, the default metric should be ndcg@truncation.
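Until the default changes, the truncated metric can already be requested explicitly; ndcg@k (e.g. ndcg@10) is an existing metric name, and this RFC only proposes making a form of it the default:

```python
params = {
    "objective": "rank:ndcg",
    # Explicitly ask for truncated NDCG instead of the current default (map).
    "eval_metric": "ndcg@10",
}
# params would then be passed to xgb.train() together with a DMatrix that
# carries query-group information.
```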

Change implementation details

Right now the implementation deviates from the standard algorithm described in the related literature. I want to revise the implementation to bring the code closer to the descriptions in well-known literature. I have a prototype of a revised rank:ndcg running on CPU, which has better accuracy than the existing one even when the truncation level is set to 1. Also, there are some optimizations we can perform, like caching the IDCG value for each query group.
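The IDCG caching idea as a minimal numpy sketch (the function and cache layout are illustrative, not the actual C++ implementation): IDCG depends only on a group's labels, so it can be computed once per query group and reused across boosting iterations.

```python
import numpy as np

def idcg(labels, k=None):
    """Ideal DCG for one query group using the 2^rel - 1 gain."""
    rel = np.sort(labels)[::-1]          # ideal ordering: descending relevance
    if k is not None:
        rel = rel[:k]
    gains = 2.0 ** rel - 1.0
    discounts = np.log2(np.arange(rel.size) + 2.0)
    return float(np.sum(gains / discounts))

# Computed once per query group and cached; the per-iteration NDCG lambda
# computation only needs to divide by this constant.
groups = {"q1": np.array([3, 2, 0, 1]), "q2": np.array([1, 0, 0])}
idcg_cache = {qid: idcg(y) for qid, y in groups.items()}
```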
