This is a Python implementation of the AdaRank algorithm (Xu and Li, 2007) with early stopping. The code closely follows the scikit-learn style, but the estimator/metrics API differs in places (e.g. the fit() method takes three arguments, X, y, and qid, rather than just two).
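To make the three-argument convention concrete, here is a minimal sketch of how the inputs line up. It assumes that qid holds one query identifier per row of X and y, so that documents can be grouped by query. The toy data and the helper group_by_query are hypothetical and not part of this package, and plain lists stand in for arrays.

```python
from collections import OrderedDict


def group_by_query(y, qid):
    """Group relevance labels by query id, keeping first-seen query order."""
    groups = OrderedDict()
    for label, q in zip(y, qid):
        groups.setdefault(q, []).append(label)
    return groups


# Hypothetical toy data: 5 documents with 2 features each, from 2 queries.
X = [[0.1, 1.0], [0.4, 0.2], [0.9, 0.3], [0.5, 0.5], [0.2, 0.8]]
y = [2, 0, 1, 0, 1]                    # graded relevance label per document
qid = ["q1", "q1", "q1", "q2", "q2"]   # query that each document belongs to

groups = group_by_query(y, qid)
for q, labels in groups.items():
    print(q, labels)
```

Ranking metrics are computed per query, which is why the query ids have to travel alongside X and y.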
Four ranking metrics are implemented: P@k, AP, DCG@k, and nDCG@k (in both trec_eval and Burges et al. versions).
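The difference between the two versions can be sketched as follows. This assumes the usual conventions: a linear gain (the label itself) for the trec_eval style and an exponential gain 2^rel - 1 for the Burges et al. style, both with a log2(rank + 1) discount. The function names and the exp_gain flag are illustrative and are not this package's API; the exact conventions used in metrics.py may differ.

```python
import math


def dcg_at_k(labels, k, exp_gain=False):
    """DCG@k for labels already ordered by predicted score, best first."""
    total = 0.0
    for rank, rel in enumerate(labels[:k], start=1):
        # Linear gain (assumed trec_eval style) or 2^rel - 1 (assumed Burges style).
        gain = (2 ** rel - 1) if exp_gain else rel
        total += gain / math.log2(rank + 1)
    return total


def ndcg_at_k(labels, k, exp_gain=False):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k, exp_gain)
    return dcg_at_k(labels, k, exp_gain) / ideal if ideal > 0 else 0.0


ranked = [2, 0, 1]  # hypothetical labels in predicted order
print(dcg_at_k(ranked, 3), dcg_at_k(ranked, 3, exp_gain=True))
print(ndcg_at_k(ranked, 3), ndcg_at_k(ranked, 3, exp_gain=True))
```

With graded labels the exponential gain rewards highly relevant documents more strongly, so the two versions can rank models differently.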
Requirements: numpy and scikit-learn.
The following code runs AdaRank for up to 100 iterations, optimizing for nDCG@10. The algorithm stops early when no improvement has been made within the previous 10 iterations.
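from adarank import AdaRank
from metrics import NDCGScorer

# X, y, qid: training features, relevance labels, and query ids
scorer = NDCGScorer(k=10)
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)

# Score the held-out documents and report the mean nDCG@10
pred = model.predict(X_test, qid_test)
print(scorer(y_test, pred, qid_test).mean())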
See test.py for more advanced examples.
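The early-stopping rule used in the example can be sketched independently of the boosting itself. The function below is a hypothetical illustration, not this package's code: it takes a precomputed list of per-iteration metric values in place of real boosting rounds, and it assumes that "no improvement" means the best score so far has not been strictly exceeded during the last estop iterations.

```python
def run_with_early_stopping(scores, max_iter=100, estop=10):
    """Return (best_iter, best_score, last_iter) for a sequence of scores.

    `scores` stands in for the metric value measured after each round.
    """
    best_score, best_iter, last_iter = float("-inf"), -1, -1
    for it, score in enumerate(scores[:max_iter]):
        last_iter = it
        if score > best_score:
            # Strict improvement: remember this round as the best so far.
            best_score, best_iter = score, it
        elif it - best_iter >= estop:
            # No improvement over the last `estop` rounds: stop early.
            break
    return best_iter, best_score, last_iter


# Hypothetical score curve that plateaus after the third round.
curve = [0.5, 0.6, 0.7] + [0.7] * 20
print(run_with_early_stopping(curve, max_iter=100, estop=10))
```

Keeping track of the best round separately from the last round makes it possible to roll back to the best model after stopping; whether AdaRank here does so is not stated in this README.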
Burges et al. Learning to rank using gradient descent. In Proceedings of ICML '05, pages 89–96. ACM, 2005.
Xu and Li. AdaRank: a boosting algorithm for information retrieval. In Proceedings of SIGIR '07, pages 391–398. ACM, 2007.