Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

Closed
jameslamb opened this issue Jan 18, 2022 · 7 comments
Closed

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

jameslamb opened this issue Jan 18, 2022 · 7 comments

Comments

@jameslamb
Copy link
Collaborator

Description

I've recently noticed the following warning in logs for LightGBM's Python tests.

FutureWarning: As of scikit-learn 0.23, estimators should have a 'requires_y' tag set to the appropriate value. The default value of the tag is False. An error will be raised from version 1.0 when calling check_estimator() if the tag isn't properly set.

This should be addressed in the next release of LightGBM, to ensure that lightgbm remains compatible with newer versions of scikit-learn.

See https://scikit-learn.org/stable/developers/develop.html#estimator-tags for details on that and other estimator tags.

Note that estimator tags were added in scikit-learn 0.21.0 (May 2019). A fix for this issue should attempt to keep lightgbm compatible with that and earlier scikit-learn versions.

Reproducible example / Environment Info

I've observed this on many of LightGBM's Python CI jobs. For example, regular (macOS-latest, Python 3.8) https://github.com/microsoft/LightGBM/runs/4845750605?check_suite_focus=true.

Additional Comments

See #3533 for some related development and conversations.

@StrikerRUS
Copy link
Collaborator

All essential estimator tags should be inherited from BaseEstimator and [Regressor|Classifier]Mixin scikit-learn classes.

from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

_LGBMModelBase = BaseEstimator
_LGBMRegressorBase = RegressorMixin
_LGBMClassifierBase = ClassifierMixin

class LGBMRegressor(_LGBMRegressorBase, LGBMModel):

https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L345-L346
https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/utils/_tags.py#L3

https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L708-L709
https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L653-L654

But for some reason in LightGBM they don't. Possibly related: #3192.

@ClaudioSalvatoreArcidiacono
Copy link
Contributor

ClaudioSalvatoreArcidiacono commented Feb 22, 2022

Hi all,

I started to deep dive into this issue today and I have some interesting findings.

First of all, the warning is raised only in this test case:

tests/python_package_test/test_sklearn.py::test_sklearn_integration[LGBMRegressor()-check_requires_y_none]

It is interesting to notice that the same warning is not raised for LGBMClassifier.

I have checked the source code of check_requires_y_none from sklearn and I noticed this part here (source):

    warning_msg = (
        "As of scikit-learn 0.23, estimators should have a "
        "'requires_y' tag set to the appropriate value. "
        "The default value of the tag is False. "
        "An error will be raised from version 1.0 when calling "
        "check_estimator() if the tag isn't properly set."
    )

    expected_err_msgs = (
        "requires y to be passed, but the target y is None",
        "Expected array-like (array or non-string sequence), got None",
        "y should be a 1d array",
    )

    try:
        estimator.fit(X, None)
    except ValueError as ve:
        if not any(msg in str(ve) for msg in expected_err_msgs):
            warnings.warn(warning_msg, FutureWarning)

So, the warning message is raised when the estimator does not fail with the proper error message when y=None is passed as input.

In LGBMRegressor the input is validated with check_X_y, which raises a ValueError with message y cannot be None when y is None.

This does not happen in LGBMClassifier because there the input is checked using the check_classification_targets (see here) function from sklearn.

The behaviour of LGBMRegressor seems completely legit to me. I think that the issue comes from an inconsistency between check_X_y and check_requires_y_none in sklearn.

@jameslamb
Copy link
Collaborator Author

@ClaudioSalvatoreArcidiacono thanks so much for your thorough investigation and help with this!

We'll subscribe to the scikit-learn issue you opened and watch it for updates.

@StrikerRUS
Copy link
Collaborator

@ClaudioSalvatoreArcidiacono Thank you very much for going such lengths in your investigation!
Really appreciate that you're already proposed a fix: scikit-learn/scikit-learn#22578.

@ClaudioSalvatoreArcidiacono
Copy link
Contributor

Thank you @jameslamb and @StrikerRUS for checking out my investigation. Thank you for your contributions to this brilliant package! I use it very a lot and I finally got a chance to give back a little. Also happy that by doing so I can improve another amazing package like scikit-learn, that's two birds with one stone ;).

@StrikerRUS
Copy link
Collaborator

scikit-learn/scikit-learn#22578 was merged.

I just installed the latest available versions of LightGBM and scikit-learn and can confirm that the warning about requires_y tag has gone.

@ClaudioSalvatoreArcidiacono Thanks a lot!

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants