LGBMRanker query group setting when using gridsearchcv #3018
Comments
ping @StrikerRUS for the #1137 (comment)
Ah, it's a pity that the workaround doesn't work anymore. Maybe … Generally speaking, scikit-learn doesn't have any (ranking) estimators that allow passing an additional fit parameter like group. According to the scikit-learn team's plans, they are moving some old and new parameters into …
That is a pity. Thanks for all the guidance.
Can we reopen this issue? I'm also encountering the problem where there's no way to pass a different group parameter for each round of CV.

For example, the train set of the 1st round of CV will generally have different group sizes from the train set of the 2nd round, so a single group argument cannot be valid for every fold and LightGBM raises the "Sum of query counts is not same with #data" error.

There is also the possibility of silent errors, where the sum of the group sizes might be the same while their order differs. E.g. if the group sizes in the second round of CV were hypothetically a reordering of those in the first, no error would be raised even though the rows would be grouped incorrectly.

There are similar unanswered questions on Stack Overflow too: https://stackoverflow.com/questions/64905119/hyperparameter-optimization-with-lgbmranker
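To make the mismatch concrete, here is a minimal sketch (the data and variable names below are illustrative, not taken from the original report) of how the group sizes LightGBM needs change from fold to fold under LeaveOneGroupOut, so no single group argument can satisfy every split:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical data: 3 queries with 4, 3 and 5 documents respectively.
group_sizes = np.array([4, 3, 5])
flatted_group = np.repeat(range(len(group_sizes)), repeats=group_sizes)  # query id per row
X = np.random.rand(len(flatted_group), 2)
y = np.random.randint(0, 3, size=len(flatted_group))

for train_idx, _ in LeaveOneGroupOut().split(X, y, groups=flatted_group):
    # Each fold holds out a different query, so the group sizes of the remaining
    # training rows differ: [3 5], [4 5], [4 3] -- one fixed `group` cannot fit all folds.
    print(np.unique(flatted_group[train_idx], return_counts=True)[1])
```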
Please refer to the comments linked above.

Deeper integration of the ranking task with scikit-learn is not possible, since scikit-learn has no ranking estimators of its own.

Right now the group argument is passed to fit only once, so it cannot differ between GridSearchCV folds.
@StrikerRUS would your recommendation be to not use the scikit-learn wrapper for LGBMRanker then? At the very least, it would be good for such limitations to be made known in the user docs, rather than have folks use the scikit-learn integration thinking that it works.
No.
Good idea! I'll add a note that scikit-learn doesn't support ranking, and that there is therefore no integration of this class with the various scikit-learn tools.
Thanks @StrikerRUS! I'm sure the docs change will be much appreciated by newcomers like me 🙂
@lowjiajin Done in #4243. Thanks for the proposal!
#1137
Environment info
Operating System: macOS 10.15
CPU/GPU model: no
C++/Python/R version: Python 3.7
LightGBM version or commit hash:
Error message
lightgbm.basic.LightGBMError: Sum of query counts is not same with #data
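For context, the check behind this message is that, for ranking, the group sizes passed to fit must sum exactly to the number of rows being fitted. The snippet below only illustrates that invariant (`group` and `X_fold` are hypothetical names, not LightGBM's actual source); inside GridSearchCV each fold trains on a different subset of rows, so a group array computed for the full training set, or for another fold, fails this check.

```python
# Illustration of the invariant only (not LightGBM's actual implementation):
# the query/group sizes must partition the rows passed to fit.
assert sum(group) == len(X_fold), "Sum of query counts is not same with #data"
```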
Reproducible examples
```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import make_scorer, ndcg_score
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

# X_train, y_train, X_test, y_test, query_train and query_test are the user's own
# ranking data; query_train / query_test hold the per-query document counts.

estimator_params = {'boosting_type': 'gbdt',
                    'objective': 'lambdarank',
                    'min_child_samples': 5,
                    'importance_type': 'gain',
                    }
gbm = lgb.LGBMRanker(**estimator_params)

params_grid = {'n_estimators': [10, 20],
               'num_leaves': [10, 20],
               'max_depth': [10],
               'learning_rate': [0.1],
               }

# Expand the per-query counts into one group id per row, then split by query.
cv_group_info = query_train.astype(int)
flatted_group = np.repeat(range(len(cv_group_info)), repeats=cv_group_info)
logo = LeaveOneGroupOut()
cv = logo.split(X_train, y_train, groups=flatted_group)
cv_group = logo.split(X_train, groups=flatted_group)

grid = GridSearchCV(gbm, params_grid, cv=cv, verbose=2,
                    scoring=make_scorer(ndcg_score, greater_is_better=True), refit=False)

# Generator yielding the group sizes of each fold's training rows.
def group_gen(flatted_group, cv):
    for train, test in cv:
        yield np.unique(flatted_group[train], return_counts=True)[1]

gen = group_gen(flatted_group, cv_group)

params_fit = {
    'eval_set': [(X_test, y_test)],
    'eval_group': [query_test],
    'eval_metric': 'ndcg',
    'early_stopping_rounds': 100,
    'eval_at': [1, 2, 3],
}

# Fails: `group` is evaluated once and the same array is reused for every CV fold,
# so LightGBM raises "Sum of query counts is not same with #data" on any fold whose
# training rows do not match the first fold's group sizes.
grid.fit(X_train, y_train, group=next(gen), **params_fit)
```
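Since GridSearchCV evaluates its fit parameters once and reuses them for every split, one possible workaround is to drive the search loop manually and recompute group for each fold. The sketch below reuses X_train, y_train, flatted_group, params_grid, estimator_params and lgb from the example above, assumes X_train and y_train are NumPy arrays, and omits the eval_set / early-stopping options for brevity; it is only an illustration, not an officially supported pattern:

```python
import numpy as np
from sklearn.metrics import ndcg_score
from sklearn.model_selection import LeaveOneGroupOut, ParameterGrid

results = []
for params in ParameterGrid(params_grid):
    fold_scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X_train, y_train, groups=flatted_group):
        # Recompute the group sizes from this fold's training rows only,
        # so that sum(fold_group) == len(train_idx).
        fold_group = np.unique(flatted_group[train_idx], return_counts=True)[1]
        ranker = lgb.LGBMRanker(**estimator_params, **params)
        ranker.fit(X_train[train_idx], y_train[train_idx], group=fold_group)
        preds = ranker.predict(X_train[test_idx])
        # With LeaveOneGroupOut all held-out rows belong to a single query,
        # so the fold's NDCG is computed from one (labels, scores) pair.
        fold_scores.append(ndcg_score([y_train[test_idx]], [preds]))
    results.append((params, np.mean(fold_scores)))

best_params, best_score = max(results, key=lambda r: r[1])
```

Because the per-fold group is recomputed inside the loop, every fit call satisfies the sum-of-query-counts check that a single group passed through GridSearchCV cannot.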
Steps to reproduce