`get_feature_names_out` is an important component for interpreting scikit-learn `Pipeline` objects. A `get_feature_names_out` call on a `Pipeline` only works if it is implemented for all components in the pipeline, except the last step (i.e. the model). Scikit-learn recently implemented `get_feature_names_out` for all transformers in their 1.1 release (Source).

I think it makes sense to also implement `get_feature_names_out` for all scikit-lego transformers that are not models and are not TrainOnly. This leaves most objects in `sklego.preprocessing`:
- `sklego.preprocessing.ColumnCapper`
- `sklego.preprocessing.DictMapper`
- `sklego.preprocessing.IdentityTransformer`
- `sklego.preprocessing.IntervalEncoder`
- `sklego.preprocessing.OutlierRemover` (TrainOnly)
- `sklego.preprocessing.PandasTypeSelector`
- `sklego.preprocessing.ColumnSelector`
- `sklego.preprocessing.ColumnDropper`
- `sklego.preprocessing.PatsyTransformer`
- `sklego.preprocessing.OrthogonalTransformer`
- `sklego.preprocessing.InformationFilter`
- `sklego.preprocessing.RandomAdder` (TrainOnly)
- `sklego.preprocessing.RepeatingBasisFunction`

Additionally, it should be tested whether `get_feature_names_out` works correctly with a `Pipeline` that contains transformers inheriting from `TrainOnlyTransformerMixin`, like `RandomAdder`.

@koaning and I recently discussed implementing `get_feature_names_out` for `sklego.meta` and ended up implementing this method for `EstimatorTransformer` (PR #539). It does not look like objects in `sklego.decomposition` and `sklego.mixture` require an implementation of `get_feature_names_out`, because it seems they are mostly used as the last step in a pipeline or wrapped in an `EstimatorTransformer`.

Since this is such a systematic issue, we can consider adding an extra requirement for people contributing to `sklego.preprocessing`: make sure to implement `get_feature_names_out` for any new preprocessor that is not a train-time-only transformer.