[python-package] Support `feature_names_in_` attribute via sklearn API #6279

ravwojdyla · 2024-01-18T15:55:57Z

Summary

The sklearn API supports a feature_names_in_ attribute on fitted models (SLEP007), which records the feature names/columns that were passed to model.fit. This is very useful information, and a standard worth conforming to. As far as I understand, that information is currently available from the booster:

est.booster_.feature_name()

It shouldn't be too hard to also expose that information via the feature_names_in_ attribute 🙏
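A minimal sketch of the delegation idea described above, assuming the estimator keeps its fitted booster on booster_ and that feature_name() returns the training column names. StubBooster and SketchEstimator are hypothetical stand-ins so the snippet runs without LightGBM installed; this is not LightGBM's actual implementation.

```python
class StubBooster:
    """Hypothetical stand-in for a fitted booster; only mimics feature_name()."""

    def __init__(self, names):
        self._names = list(names)

    def feature_name(self):
        # Return a copy so callers cannot mutate the stored names.
        return list(self._names)


class SketchEstimator:
    """Hypothetical estimator showing the delegation pattern only."""

    def fit(self, columns):
        # A real fit() would train a model; here we only record column names.
        self.booster_ = StubBooster(columns)
        return self

    @property
    def feature_names_in_(self):
        # Raising AttributeError before fit() keeps hasattr() False when unfitted.
        if not hasattr(self, "booster_"):
            raise AttributeError("feature_names_in_ is only available after fit()")
        return self.booster_.feature_name()


est = SketchEstimator()
unfitted_has_attr = hasattr(est, "feature_names_in_")
names = est.fit(["age", "income"]).feature_names_in_
```

scikit-learn stores feature_names_in_ as an array of strings; a plain list is used here only to keep the sketch dependency-free.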

Motivation

It would conform to the sklearn API standards and improve the usability of LightGBM models, especially when they are used alongside other sklearn models and Pipelines.

References

https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep007/proposal.html


jameslamb · 2024-01-22T06:00:11Z

Thanks for using LightGBM and taking the time to report this!

We'd welcome this addition, would you like to contribute it?

And a side question.... do you think it's an oversight that scikit-learn's estimator checks don't enforce this? We follow https://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.check_estimator.html in LightGBM's tests to try to catch such things

LightGBM/tests/python_package_test/test_sklearn.py

Lines 1285 to 1288 in 255c93b

@parametrize_with_checks([lgb.LGBMClassifier(), lgb.LGBMRegressor()])
def test_sklearn_integration(estimator, check):
    estimator.set_params(min_child_samples=1, min_data_in_bin=1)
    check(estimator)

Using scikit-learn==1.3.2 (the latest released version as of this writing), check_estimator() says LGBMClassifier and LGBMRegressor are compliant with scikit-learn's expectations for estimators.

import lightgbm as lgb
from sklearn.utils.estimator_checks import check_estimator

check_estimator(lgb.LGBMClassifier())
check_estimator(lgb.LGBMRegressor())

But in the SLEP you linked, it says the following:

Backward Compatibility
All estimators should implement the feature_names_in_ and get_feature_names_out() API. This is checked in check_estimator...

nicklamiller · 2024-01-22T17:49:16Z

I would very much like to contribute to LightGBM, and this seems like a great issue. With @ravwojdyla's blessing, I'd be happy to make this contribution.

ravwojdyla · 2024-01-22T17:56:52Z

@nicklamiller sounds great - thank you!

jameslamb · 2024-01-23T05:55:21Z

Do either of you know the answer to my question about check_estimator() from the latest scikit-learn not complaining about this?

nicklamiller · 2024-01-26T06:32:02Z

Backward Compatibility
All estimators should implement the feature_names_in_ and get_feature_names_out() API. This is checked in check_estimator...

@jameslamb I agree that, based on SLEP007, this should be enforced by check_estimator, and it does not appear to be. Here's a somewhat recent issue (scikit-learn/scikit-learn#27907) about sklearn estimators that lack(ed) this attribute. It looks like the attribute is only checked (or created, if missing) when _validate_data is called.

I can open an issue in sklearn and propose this behavior is more rigorously checked with check_estimator.
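To illustrate what a more rigorous check might look like, here is a rough sketch. check_feature_names_in, FrameLike, Compliant and NonCompliant are all hypothetical names made up for this example; none of this is part of scikit-learn's check_estimator. It only assumes that the input exposes a .columns attribute and that the estimator has a fit(X, y) method.

```python
def check_feature_names_in(estimator, X, y):
    """Hypothetical check: after fit(), feature_names_in_ must match X.columns."""
    expected = [str(name) for name in X.columns]
    estimator.fit(X, y)
    if not hasattr(estimator, "feature_names_in_"):
        raise AssertionError(
            f"{type(estimator).__name__} does not set feature_names_in_"
        )
    actual = [str(name) for name in estimator.feature_names_in_]
    if actual != expected:
        raise AssertionError(f"expected {expected}, got {actual}")
    return True


class FrameLike:
    """Hypothetical stand-in for a DataFrame; only carries column names."""

    def __init__(self, columns):
        self.columns = list(columns)


class Compliant:
    def fit(self, X, y):
        # Records the input column names, as SLEP007 expects.
        self.feature_names_in_ = list(X.columns)
        return self


class NonCompliant:
    def fit(self, X, y):
        # Never records the column names, so the check should fail.
        return self


X, y = FrameLike(["age", "income"]), [0, 1]
compliant_ok = check_feature_names_in(Compliant(), X, y)
try:
    non_compliant_ok = check_feature_names_in(NonCompliant(), X, y)
except AssertionError:
    non_compliant_ok = False
```

The point of the sketch is that the check fits on named columns and then inspects the estimator directly, rather than relying on the estimator to call a shared validation helper.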

jameslamb · 2024-01-28T03:47:38Z

Thanks very much for the link to scikit-learn/scikit-learn#27907 @nicklamiller !

Please link to this issue from whatever one you create in scikit-learn.

jameslamb changed the title ~~Support feature_names_in_ attribute via sklearn API~~ [python-package] Support feature_names_in_ attribute via sklearn API Jan 19, 2024

jameslamb added the feature request label Jan 22, 2024

jameslamb mentioned this issue Jan 22, 2024

Feature Requests & Voting Hub #2302

Open

nicklamiller mentioned this issue Feb 1, 2024

Enforce feature_names_in_ and n_features_in_ in check_estimator post SLEP007 implementation scikit-learn/scikit-learn#28337

Open

nicklamiller mentioned this issue Feb 12, 2024

[python-package] Add feature_names_in_ attribute for scikit-learn estimators (fixes #6279) #6310

Merged

jameslamb closed this as completed in #6310 Jul 3, 2024

jameslamb closed this as completed in f811c82 Jul 3, 2024
