[python] scikit-learn FutureWarning: 'requires_y' tag #4958

jameslamb · 2022-01-18T04:02:07Z

Description

I've recently noticed the following warning in logs for LightGBM's Python tests.

FutureWarning: As of scikit-learn 0.23, estimators should have a 'requires_y' tag set to the appropriate value. The default value of the tag is False. An error will be raised from version 1.0 when calling check_estimator() if the tag isn't properly set.

This should be addressed in the next release of LightGBM, to ensure that lightgbm remains compatible with newer versions of scikit-learn.

See https://scikit-learn.org/stable/developers/develop.html#estimator-tags for details on that and other estimator tags.

Note that estimator tags were added in scikit-learn 0.21.0 (May 2019). A fix for this issue should attempt to keep lightgbm compatible with that and earlier scikit-learn versions.

Reproducible example / Environment Info

I've observed this on many of LightGBM's Python CI jobs. For example, regular (macOS-latest, Python 3.8) https://github.com/microsoft/LightGBM/runs/4845750605?check_suite_focus=true.

Additional Comments

See #3533 for some related development and conversations.

The text was updated successfully, but these errors were encountered:

StrikerRUS · 2022-01-18T21:41:36Z

All essential estimator tags should be inherited from BaseEstimator and [Regressor|Classifier]Mixin scikit-learn classes.

LightGBM/python-package/lightgbm/compat.py

Line 73 in a06fadf

from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

LightGBM/python-package/lightgbm/compat.py

Lines 95 to 97 in a06fadf

    
           _LGBMModelBase = BaseEstimator 
        
           _LGBMRegressorBase = RegressorMixin 
        
           _LGBMClassifierBase = ClassifierMixin

LightGBM/python-package/lightgbm/sklearn.py

Line 935 in a06fadf

class LGBMRegressor(_LGBMRegressorBase, LGBMModel):

https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L345-L346
https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/utils/_tags.py#L3

https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L708-L709
https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/base.py#L653-L654

But for some reason in LightGBM they don't. Possibly related: #3192.

ClaudioSalvatoreArcidiacono · 2022-02-22T10:39:20Z

Hi all,

I started to deep dive into this issue today and I have some interesting findings.

First of all, the warning is raised only in this test case:

tests/python_package_test/test_sklearn.py::test_sklearn_integration[LGBMRegressor()-check_requires_y_none]

It is interesting to notice that the same warning is not raised for LGBMClassifier.

I have checked the source code of check_requires_y_none from sklearn and I noticed this part here (source):

    warning_msg = (
        "As of scikit-learn 0.23, estimators should have a "
        "'requires_y' tag set to the appropriate value. "
        "The default value of the tag is False. "
        "An error will be raised from version 1.0 when calling "
        "check_estimator() if the tag isn't properly set."
    )

    expected_err_msgs = (
        "requires y to be passed, but the target y is None",
        "Expected array-like (array or non-string sequence), got None",
        "y should be a 1d array",
    )

    try:
        estimator.fit(X, None)
    except ValueError as ve:
        if not any(msg in str(ve) for msg in expected_err_msgs):
            warnings.warn(warning_msg, FutureWarning)

So, the warning message is raised when the estimator does not fail with the proper error message when y=None is passed as input.

In LGBMRegressor the input is validated with check_X_y, which raises a ValueError with message y cannot be None when y is None.

This does not happen in LGBMClassifier because there the input is checked using the check_classification_targets (see here) function from sklearn.

The behaviour of LGBMRegressor seems completely legit to me. I think that the issue comes from an inconsistency between check_X_y and check_requires_y_none in sklearn.

jameslamb · 2022-02-22T15:10:43Z

@ClaudioSalvatoreArcidiacono thanks so much for your thorough investigation and help with this!

We'll subscribe to the scikit-learn issue you opened and watch it for updates.

StrikerRUS · 2022-02-22T22:20:31Z

@ClaudioSalvatoreArcidiacono Thank you very much for going such lengths in your investigation!
Really appreciate that you're already proposed a fix: scikit-learn/scikit-learn#22578.

ClaudioSalvatoreArcidiacono · 2022-02-23T07:41:21Z

Thank you @jameslamb and @StrikerRUS for checking out my investigation. Thank you for your contributions to this brilliant package! I use it very a lot and I finally got a chance to give back a little. Also happy that by doing so I can improve another amazing package like scikit-learn, that's two birds with one stone ;).

StrikerRUS · 2022-03-28T22:45:40Z

scikit-learn/scikit-learn#22578 was merged.

I just installed the latest available versions of LightGBM and scikit-learn and can confirm that the warning about requires_y tag has gone.

@ClaudioSalvatoreArcidiacono Thanks a lot!

github-actions · 2023-08-23T00:19:07Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added the maintenance label Jan 18, 2022

ClaudioSalvatoreArcidiacono mentioned this issue Feb 22, 2022

Inconsistent behaviour between check_X_y and check_requires_y_none scikit-learn/scikit-learn#22576

Closed

StrikerRUS closed this as completed Mar 28, 2022

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

jameslamb commented Jan 18, 2022

StrikerRUS commented Jan 18, 2022

ClaudioSalvatoreArcidiacono commented Feb 22, 2022 •

edited

Loading

jameslamb commented Feb 22, 2022

StrikerRUS commented Feb 22, 2022

ClaudioSalvatoreArcidiacono commented Feb 23, 2022

StrikerRUS commented Mar 28, 2022

github-actions bot commented Aug 23, 2023

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

[python] scikit-learn FutureWarning: 'requires_y' tag #4958

Comments

jameslamb commented Jan 18, 2022

Description

Reproducible example / Environment Info

Additional Comments

StrikerRUS commented Jan 18, 2022

ClaudioSalvatoreArcidiacono commented Feb 22, 2022 • edited Loading

jameslamb commented Feb 22, 2022

StrikerRUS commented Feb 22, 2022

ClaudioSalvatoreArcidiacono commented Feb 23, 2022

StrikerRUS commented Mar 28, 2022

github-actions bot commented Aug 23, 2023

ClaudioSalvatoreArcidiacono commented Feb 22, 2022 •

edited

Loading