[python] Crash in training with early stopping inside sklearn Pipeline with dimensionality reduction #2012
Comments
Hi @mlisovyi! This is expected, because scikit-learn's Pipeline is not aware of any validation data you are providing to LightGBM. So you train on 20 features and validate on 100. It's quite strange that everything works in the numpy case... I think you can fix this with something like the following:
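A minimal sketch of the mismatch described in this comment, using PCA as the dimensionality-reduction step. The 20 components and 100 raw features mirror the numbers in the comment; the random data and sample sizes are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
train_X = rng.normal(size=(200, 100))  # 100 raw features (illustrative data)
test_X = rng.normal(size=(50, 100))

# The dimensionality-reduction step of the pipeline, fitted during pipe.fit().
pca = PCA(n_components=20).fit(train_X)

# Pipeline.fit() transforms X before it reaches the final estimator...
train_seen = pca.transform(train_X)
# ...but step fit parameters such as lgbm__eval_set are forwarded untouched,
# so a validation set built from raw data keeps all 100 columns.
valid_raw = test_X
# Passing the validation data through the same fitted step restores agreement.
valid_seen = pca.transform(test_X)

print(train_seen.shape, valid_raw.shape, valid_seen.shape)
# (200, 20) (50, 100) (50, 20)
```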
I will only have time to look into it on the upcoming weekend :(
Sure, no problem!
Your proposal works, but it has the downside that the validation and training samples go through different transformations (and it also does not extend to a pipeline of arbitrary length). I think the best way to overcome this is to use:

```python
import pandas as pd
from sklearn.pipeline import Pipeline

# Rebuild the pipeline without its final (LightGBM) step and fit the transformers.
pipe_trf = Pipeline(pipe.steps[:-1])
pipe_trf = pipe_trf.fit(pd.DataFrame(train_X))
# Transform the eval sets with the same fitted steps so feature counts match.
params_fit['lgbm__eval_set'] = [(pipe_trf.transform(pd.DataFrame(train_X)),
                                 pd.DataFrame(train_y)),
                                (pipe_trf.transform(pd.DataFrame(test_X)),
                                 pd.DataFrame(test_y))]
pipe = pipe.fit(pd.DataFrame(train_X), pd.DataFrame(train_y), **params_fit)
```

This yields the same results if I use … The downside is that the transformers have to be fitted twice, which can be inefficient for CPU-intensive methods like PCA. I think there is a way to freeze transformers in … Note: to get reproducible results, one should fix the …
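The double fitting mentioned in the comment above can be avoided under one assumption. The sketch below is an alternative, not the approach from this thread: it fits the transformer steps once, reuses them to transform the eval set, and then fits only the final step directly. It relies on scikit-learn not cloning steps when `memory=None`, so `Pipeline(pipe.steps[:-1])` fits the very same transformer objects that `pipe` holds. `EvalSetRecorder` and `fit_with_transformed_eval_set` are hypothetical names; the recorder stands in for `LGBMClassifier` so the sketch runs without LightGBM installed.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class EvalSetRecorder(BaseEstimator):
    """Hypothetical stand-in for LGBMClassifier: records what fit() receives."""

    def fit(self, X, y, eval_set=None, **fit_params):
        self.train_shape_ = X.shape
        self.eval_shapes_ = [X_val.shape for X_val, _ in (eval_set or [])]
        return self

    def predict(self, X):
        return np.zeros(X.shape[0])


def fit_with_transformed_eval_set(pipe, X, y, eval_set, **final_fit_params):
    """Fit the transformers once, then fit only the final step (hypothetical helper).

    Assumes pipe.memory is None: scikit-learn then fits the step objects in
    place instead of cloning them, so `pipe` ends up fully fitted.
    """
    pipe_trf = Pipeline(pipe.steps[:-1])
    Xt = pipe_trf.fit_transform(X, y)  # the only fit of the transformers
    eval_set_t = [(pipe_trf.transform(X_val), y_val) for X_val, y_val in eval_set]
    final_estimator = pipe.steps[-1][1]
    final_estimator.fit(Xt, y, eval_set=eval_set_t, **final_fit_params)
    return pipe


rng = np.random.RandomState(0)
train_X, test_X = rng.normal(size=(200, 100)), rng.normal(size=(50, 100))
train_y, test_y = rng.randint(0, 2, size=200), rng.randint(0, 2, size=50)

pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=20)),
                 ("lgbm", EvalSetRecorder())])
pipe = fit_with_transformed_eval_set(pipe, train_X, train_y, [(test_X, test_y)])
print(pipe.named_steps["lgbm"].eval_shapes_)  # [(50, 20)]
```

With LightGBM installed, the last step would be an `LGBMClassifier`, and early-stopping options would go through `**final_fit_params` (`early_stopping_rounds` in the 2.2.0 API listed in this issue; newer releases configure early stopping through callbacks instead). The trade-off is that `pipe.fit` is bypassed, so any behavior that only runs inside `Pipeline.fit` is skipped.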
I've run into an error when I use a LightGBM model (sklearn API) in a sklearn Pipeline. This happens only when:
The last two restrictions are illustrated in the example below.
Environment info
Operating System: Ubuntu 18.04
C++/Python/R version: 3.5.5
LightGBM version or commit hash: 2.2.0
Sklearn version: 0.19.1
Error message
Reproducible examples