[FEATURE] get_feature_names_out for sklego.preprocessing transformers. #543

@CarloLepelaars

get_feature_names_out is an important component for interpreting scikit-learn Pipeline objects. A get_feature_names_out call on a Pipeline only works if every transformer step in the pipeline implements it. The last step (i.e. the model) is not a transformer, so in practice the call is made on the preceding steps, e.g. pipeline[:-1].get_feature_names_out().

Scikit-learn recently implemented get_feature_names_out for all Transformers in their 1.1 release (Source).

I think it makes sense to also implement get_feature_names_out for all scikit-lego Transformers that are not models and are not TrainOnly. This leaves most objects in sklego.preprocessing.

  • sklego.preprocessing.ColumnCapper
  • sklego.preprocessing.DictMapper
  • sklego.preprocessing.IdentityTransformer
  • sklego.preprocessing.IntervalEncoder
  • sklego.preprocessing.OutlierRemover (TrainOnly)
  • sklego.preprocessing.PandasTypeSelector
  • sklego.preprocessing.ColumnSelector
  • sklego.preprocessing.ColumnDropper
  • sklego.preprocessing.PatsyTransformer
  • sklego.preprocessing.OrthogonalTransformer
  • sklego.preprocessing.InformationFilter
  • sklego.preprocessing.RandomAdder (TrainOnly)
  • sklego.preprocessing.RepeatingBasisFunction

Additionally, we should test whether get_feature_names_out works correctly in a Pipeline that contains transformers inheriting from TrainOnlyTransformerMixin, such as RandomAdder.

@koaning and I recently discussed implementing get_feature_names_out for sklego.meta and ended up implementing this method for EstimatorTransformer (PR #539). It does not look like objects in sklego.decomposition and sklego.mixture require an implementation of get_feature_names_out, because it seems they are mostly used as the last step in a pipeline or wrapped in an EstimatorTransformer.

Since this is a systematic issue, we could add a requirement for contributions to sklego.preprocessing: any new preprocessor that is not a train-time-only Transformer should implement get_feature_names_out.

Labels: enhancement
