feat: `RepeatingBasisFunction.inverse_transform` #687

Alex-Cremers · 2024-07-11T12:48:59Z

Description

This PR implements get_feature_names_out() for _RepeatingBasisFunction, and RepeatingBasisFunction (which inherits feature names from the former).

This PR also implements an inverse_transform for _RepeatingBasisFunction in passing. I did not include a more general implementation for RepeatingBasisFunction (the one I use requires importing pandas), but inverse_transform() can be accessed from the fitted transformer via .pipeline_.named_transformers_['repeatingbasis'].inverse_transform(). It's a rare use case, but it shouldn't affect other uses in any way, so I figured I'd include it. Note that the transformation is only invertible if the original values are in the input_range (upper bound excluded). Otherwise, the reconstructed values are only equal modulo the width of the range.
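To illustrate the modulo caveat, here is a minimal NumPy sketch of a periodic RBF encoding and one way to invert it. It is not the code from this PR: `rbf_encode` and `rbf_decode` are hypothetical helpers, and both the explicit wrap onto the range and the width of `1 / n_periods` are assumptions made to keep the example self-contained.

```python
import numpy as np


def rbf_encode(x, input_range, n_periods):
    """Encode 1-D values as periodic RBF features (hypothetical helper)."""
    lo, hi = input_range
    # Scale to [0, 1) and wrap, so values outside the range alias into it.
    t = ((np.asarray(x, dtype=float) - lo) / (hi - lo)) % 1.0
    bases = np.arange(n_periods) / n_periods
    diff = np.abs(t[:, None] - bases[None, :])
    dist = np.minimum(diff, 1.0 - diff)  # circular distance to each base
    return np.exp(-((dist * n_periods) ** 2))  # width assumed to be 1 / n_periods


def rbf_decode(Z, input_range, n_periods):
    """Recover values in [lo, hi) from rbf_encode output (assumes n_periods >= 3)."""
    lo, hi = input_range
    rows = np.arange(len(Z))
    bases = np.arange(n_periods) / n_periods
    k = Z.argmax(axis=1)  # index of the nearest base
    # Invert the Gaussian to get the distance to that base.
    d = np.sqrt(np.maximum(-np.log(Z[rows, k]), 0.0)) / n_periods
    # The value sits on either side of the base; keep the side that re-encodes best.
    candidates = [(bases[k] + d) % 1.0, (bases[k] - d) % 1.0]
    errors = [
        np.abs(rbf_encode(lo + c * (hi - lo), input_range, n_periods) - Z).sum(axis=1)
        for c in candidates
    ]
    t = np.where(errors[0] <= errors[1], candidates[0], candidates[1])
    return lo + t * (hi - lo)


hours = np.array([0.0, 3.5, 10.0, 23.9])
inside = rbf_decode(rbf_encode(hours, (0, 24), 6), (0, 24), 6)
outside = rbf_decode(rbf_encode(np.array([27.5]), (0, 24), 6), (0, 24), 6)
```

In this sketch `inside` matches `hours`, while `outside` comes back as 3.5 rather than 27.5: the two differ by exactly the width of the range (24).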

Two tests have been added to test_repeatingbasisfunction.py:

test that set_output(transform='pandas') works properly with RepeatingBasisFunction
test that the new inverse_transform() for _RepeatingBasisFunction truly recovers the original values (as long as they fall within the input_range).

Note that for feature names, ClassNamePrefixFeaturesOutMixin could also be used if self.n_periods were renamed to self._n_features_out, but I didn't see a simple way of keeping the original column name as prefix, so I adopted a perhaps more idiosyncratic solution.
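As a rough illustration of keeping the original column name as prefix, the sketch below builds the output names by hand. The `<column>_rbf_<i>` pattern and the helper name `rbf_feature_names` are assumptions for the example, not necessarily what this PR emits.

```python
import numpy as np


def rbf_feature_names(input_features, n_periods):
    """Build output names prefixed with the input column name (hypothetical scheme)."""
    prefix = str(input_features[0])  # the transformer works on a single input column
    # scikit-learn returns feature names as an object-dtype array of strings.
    return np.asarray([f"{prefix}_rbf_{i}" for i in range(n_periods)], dtype=object)


names = rbf_feature_names(["day_of_year"], n_periods=3)
```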

It's a minor PR overall, so I didn't ping before submitting. I hope it's okay.

Progress on Issue #543

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (ruff)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.

Alex-Cremers · 2024-07-11T12:54:02Z

Requesting review and approval for the tests, @koaning

koaning · 2024-07-11T15:05:12Z

I am open to the inverse transform, but I think the implementation may change in the future in favor of our scikit-learn pipelines. This transformer was added before the SplineTransformer was added to core scikit-learn, and I might recommend that folks use that going forward. The extrapolation="periodic" setting basically covers the use case here. We might just refer to that under the hood.

@FBruzzesi figured I'd check with you, do you have any comments/opinions on this one? I am open to adding it, just wondering what might be most pragmatic given the sklearn estimator that now exists.

Alex-Cremers · 2024-07-11T15:14:23Z

Thanks, I had missed the possibility to set SplineTransformer to periodic! The result is pretty similar indeed.

Now I see that sklearn also has an open issue regarding inverse transform on Spline transformer: scikit-learn/scikit-learn#28551

FBruzzesi · 2024-07-11T15:33:55Z

Hey @Alex-Cremers, thanks for the PR, it is much appreciated!

I can take a close look during the weekend. From a sneak peek, I would say that some linting is required. Every other technical comment will follow over the weekend 😂

@koaning

> just wondering what might be most pragmatic given the sklearn estimator that now exists.

I am generally in favor of keeping what we have bug-free, but not of expanding (and maintaining) new features if they are already in scikit-learn. That does not seem to be the case for inverse_transform, so I am open to having it implemented here.

FBruzzesi

Thanks again for the PR. I think it is a nice feature to have in here!

As a first comment, please run the following to format and lint the code:

python -m pip install ruff 
make lint

I think I am missing something on why RepeatingBasisFunction re-implements some logic in the inverse_transform and does not just call

self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(X)

I feel like we are separating the logic while there is no real need for it - in fact they return different shape objects and I would not be expecting that.

FBruzzesi · 2024-07-13T09:40:28Z

sklego/preprocessing/repeatingbasis.py

+    def inverse_transform(self, X):
+        """Transform RBF features back to the input range. Outputs a numpy array. 
+


Can we add what you mention in the description in the docstring Notes of the method?

In particular I am referring to:

Note that the transformation is only invertible if the original values are in the input_range (upper bound excluded). Otherwise, the reconstructed values are only equal modulo the width of the range.

FBruzzesi · 2024-07-13T09:51:40Z

sklego/preprocessing/repeatingbasis.py

+        check_is_fitted(self, ["pipeline_"])
+
+        if isinstance(X,np.ndarray):
+          Xarr = check_array(X[:,:self.n_periods], estimator=self, ensure_2d=True)


I would be more comfortable with something along the following lines:

Suppose line 144 becomes

Xarr = check_array(X, estimator=self, ensure_2d=True, ensure_min_features=self.n_periods)[:, :self.n_periods]

This also converts (some) dataframe-like objects to arrays, so it may even be possible to avoid checking for an array instance.
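A small standalone sketch of what this suggestion does, using scikit-learn's check_array outside of any estimator. The helper name `take_rbf_columns` is hypothetical, and `estimator=self` is left out because there is no estimator here.

```python
import numpy as np
from sklearn.utils import check_array


def take_rbf_columns(X, n_periods):
    """Validate X and keep its first n_periods columns (hypothetical helper)."""
    return check_array(X, ensure_2d=True, ensure_min_features=n_periods)[:, :n_periods]


# Extra columns beyond n_periods are dropped by the slice.
wide = take_rbf_columns(np.ones((5, 6)), n_periods=4)
# Array-likes such as nested lists are converted to ndarray.
from_lists = take_rbf_columns([[0.1, 0.2, 0.3, 0.4]], n_periods=4)

# Inputs with fewer than n_periods columns are rejected by check_array.
try:
    take_rbf_columns(np.ones((5, 2)), n_periods=4)
    too_narrow_rejected = False
except ValueError:
    too_narrow_rejected = True
```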

FBruzzesi · 2024-07-13T09:53:57Z

tests/test_preprocessing/test_repeatingbasisfunction.py

+    Z = tf.fit(X, y).transform(X)
+    assert np.allclose(
+      X["a"], 
+      tf.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Z), 


This feels a bit too nested to me.

Why don't we

Suggested change

tf.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Z),

tf.inverse_transform(Z),

?

They are supposed to return the same result, or am I missing something?

FBruzzesi · 2024-07-13T09:54:28Z

sklego/preprocessing/repeatingbasis.py

+        if isinstance(X,np.ndarray):
+          Xarr = check_array(X[:,:self.n_periods], estimator=self, ensure_2d=True)
+          new_x = self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Xarr)
+          Xarr = np.hstack((new_x.reshape(-1, 1),X[:,self.n_periods:]))


Why are we returning the input as well? This is kind of related to the comment in the tests.

FBruzzesi · 2024-07-13T09:54:42Z

sklego/preprocessing/repeatingbasis.py

@@ -208,7 +243,27 @@ def transform(self, X):

        # apply rbf function to series for each basis
        return self._rbf(base_distances)
-
+
+    def inverse_transform(self, X):


This is awesome ✨

Alex-Cremers · 2024-07-13T18:59:23Z

> Thanks again for the PR. I think it is a nice feature to have in here!
>
> As a first comment, please run the following to format and lint the code:
>
> python -m pip install ruff
> make lint
>
> I think I am missing something on why RepeatingBasisFunction re-implements some logic in the inverse_transform and does not just call
>
> self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(X)
>
> I feel like we are separating the logic while there is no real need for it - in fact they return different shape objects and I would not be expecting that.

Thanks for the thorough review!

Sorry for the lint, I only ran "ruff check" and it passed, so I didn't dig deeper!

Regarding the inverse_transform: I actually meant to remove it from RepeatingBasisFunction before submitting, precisely because it was a bit confusing, but it turns out I only removed part of it. There are two reasons why it's more complicated than in _RepeatingBasisFunction:

1. I wanted it to handle both pandas DataFrames and numpy arrays as inputs, and for a DataFrame, I wanted to retrieve the original column name (see this sklearn issue: Add inverse-transform to _set_output scikit-learn/scikit-learn#27891). I then removed the pandas part of the code to avoid adding an extra dependency, but it looks like I left the numpy part behind in the end.
2. Unlike _RepeatingBasisFunction, which only deals with one input column and n_periods output columns, RepeatingBasisFunction deals with a whole df/array, selects one of its columns as input, and might pass the other columns through as well if remainder='passthrough'. Because of the passthrough possibility, the inverse transform should check for the presence of other columns and pass them back. In hindsight, I actually think that this is a bad idea with numpy arrays, because the order is lost (the transform/inverse_transform composition moves the target column first). With pandas it's fine, because the columns are named. EDIT: actually, we could use self.column to move the reconstructed column back to the right position in a numpy array.
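The idea in the EDIT could look roughly like the sketch below. It assumes that `column` is an integer index and that the passthrough columns keep their original relative order; `restore_column_order` is a hypothetical helper, not code from this PR.

```python
import numpy as np


def restore_column_order(new_x, passthrough, column):
    """Insert the reconstructed column back at its original index (hypothetical helper)."""
    # With a scalar index and a 1-D vector, np.insert adds one new column.
    return np.insert(passthrough, column, np.ravel(new_x), axis=1)


original = np.array([[10.0, 1.5, 100.0], [20.0, 2.5, 200.0]])
column = 1  # index of the column that was RBF-encoded
reconstructed = original[:, column]  # stands in for the inverse_transform output
passthrough = np.delete(original, column, axis=1)  # remaining columns, order kept

restored = restore_column_order(reconstructed, passthrough, column)
```

Here `restored` has the same column order as `original`, instead of having the reconstructed column moved to the front.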

As far as I can see, the best would probably be to remove the inverse_transform from RepeatingBasisFunction entirely for now and wait for scikit-learn/scikit-learn#11463 (inverse_transform for ColumnTransformer) and the other issue mentioned above to be solved, as there would then be a proper way to implement the inverse transform without relying on funny tricks. If a user really needs the inverse transform in the meantime, they can recover the function as I did in the test.

If you agree with this, I'll remove RepeatingBasisFunction.inverse_transform() and address the remaining comments.

Alex-Cremers added 2 commits July 11, 2024 13:42

Implemented get_feature_names_out() for RepeatingBasisFunction and in…

fe2dc31

…verse_transform() in the underlying _RepeatingBasisFunction, which can be accessed through self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform() is needed (rare use case for sure).

Add two tests to test_repeatingbasisfunction.py: one to check that se…

42f8fdf

…tting output format to pandas work, a second to make sure that the inverse_transform() truly returns the original column (as long as original are within the input_range).

FBruzzesi added the enhancement New feature or request label Jul 13, 2024

FBruzzesi changed the title ~~Work on rbf~~ feat: RepeatingBasisFunction.inverse_transform Jul 13, 2024

FBruzzesi reviewed Jul 13, 2024
