Make PandasTypeSelector selector dataframe-agnostic #670

MarcoGorelli · 2024-05-14T14:37:52Z

Description

Towards #658

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (ruff)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

FBruzzesi

I am very excited about this one! Left a few comments and considerations here and there but I think we are going to merge it soon 😁

FBruzzesi · 2024-05-14T15:45:15Z

pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "scikit-lego"
-version = "0.8.2"
+version = "0.8.13"


Was line 23 the intended target?

Yeah, I probably shouldn't make commits in a hurry whilst on a train, sorry.


FBruzzesi · 2024-05-14T16:15:45Z

sklego/preprocessing/pandastransformers.py

@@ -173,12 +222,18 @@ def _check_column_names(self, X):


 class PandasTypeSelector(BaseEstimator, TransformerMixin):
-    """The `PandasTypeSelector` transformer allows to select columns in a pandas DataFrame based on their type.
+    """The `PandasTypeSelector` transformer allows to select columns in a DataFrame based on their type.


Considering its name, we could do the following:

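```python
from warnings import warn


class PandasTypeSelector(TypeSelector):
    # `__init__` must return None, so rather than returning a `TypeSelector`
    # from it, subclass `TypeSelector` and forward the arguments.
    def __init__(self, include=None, exclude=None):
        warn(
            "Please use `TypeSelector` instead of `PandasTypeSelector`, `PandasTypeSelector` will be deprecated in future versions",
            DeprecationWarning,
        )
        super().__init__(include=include, exclude=exclude)
```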

and then

```python
class TypeSelector(BaseEstimator, TransformerMixin):
    """...

    !!! info "New in version 0.9.0"
    """
```

True, and I think the whole pandastransformers.py module needs renaming

OK to do it all in one go in a separate PR, so that every class in pandastransformers.py points to its equivalent in, say, dataframe_transformers.py?

EDIT: I noticed that this is already exported from sklego.preprocessing, and that that's the path the examples use, so I've renamed and deprecated it as part of this PR after all.

The contribution.md page still shows PandasTypeSelector, but that page already looks out of date and probably needs a revamp; I'll address that separately (Narwhals should probably be mentioned there too, as it's used internally in quite a few places).

Yes, we can rename the module to give it a more intuitive import path, but as you spotted, it shouldn't matter much since the classes are exported from sklego.preprocessing.
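The rename-and-deprecate approach agreed on in this thread can be sketched with the standard library alone. Both classes below are simplified, hypothetical stand-ins (no scikit-learn base classes, no selection logic); the sketch only shows the old name subclassing the new one so that its `__init__` warns and forwards its arguments, and how that warning can be checked with `warnings.catch_warnings`.

```python
import warnings


class TypeSelector:
    """Hypothetical stand-in for the renamed transformer (selection logic omitted)."""

    def __init__(self, include=None, exclude=None):
        self.include = include
        self.exclude = exclude


class PandasTypeSelector(TypeSelector):
    """Deprecated alias: behaves like TypeSelector but warns on construction."""

    def __init__(self, include=None, exclude=None):
        warnings.warn(
            "Please use `TypeSelector` instead of `PandasTypeSelector`, "
            "`PandasTypeSelector` will be deprecated in future versions",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this line
        )
        super().__init__(include=include, exclude=exclude)


# Record warnings instead of printing them; "always" makes sure the
# DeprecationWarning is not suppressed by the default warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    selector = PandasTypeSelector(include="number")
```

Because the alias is a real subclass, `isinstance` checks keep working, and scikit-learn's parameter introspection (which reads the `__init__` signature) sees the same `include`/`exclude` parameters. Returning a different object from `__init__` would instead raise a `TypeError`.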

FBruzzesi · 2024-05-14T16:16:19Z

sklego/preprocessing/pandastransformers.py

+            except ValueError as e:
+                raise ValueError("Columns were not equal during fit and transform") from e


Can this happen?

Yup, the last test in tests/test_preprocessing/test_pandastypeselector.py hits that branch.

I've unified the error messages and now check the message in that test.
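The check discussed in this thread can be illustrated with a simplified, pandas-only sketch. `MinimalTypeSelector` is a hypothetical name; it does not derive from scikit-learn's base classes, and it uses pandas' `DataFrame.select_dtypes` rather than the Narwhals-based selection this PR introduces. It only shows why a dataframe whose dtypes changed between `fit` and `transform` should raise.

```python
import pandas as pd


class MinimalTypeSelector:
    """Hypothetical, simplified type selector (pandas only, no sklearn base classes)."""

    def __init__(self, include=None, exclude=None):
        self.include = include
        self.exclude = exclude

    def _select(self, X: pd.DataFrame) -> pd.DataFrame:
        # select_dtypes raises ValueError if both include and exclude are empty
        return X.select_dtypes(include=self.include, exclude=self.exclude)

    def fit(self, X: pd.DataFrame, y=None):
        # Remember which columns matched the requested dtypes at fit time
        self.feature_names_ = list(self._select(X).columns)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        selected = self._select(X)
        # Order-insensitive comparison with the columns seen during fit
        if set(selected.columns) != set(self.feature_names_):
            raise ValueError("Columns were not equal during fit and transform")
        # Return the columns in the order they had during fit
        return selected[self.feature_names_]


train = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": ["x", "y"]})
selector = MinimalTypeSelector(include="number").fit(train)

# Column "b" turns into strings, so it no longer matches include="number"
changed = train.assign(b=["p", "q"])
```

Comparing sets makes the check insensitive to column order, while indexing with `self.feature_names_` keeps the fit-time order in the output.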

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings

---------

Co-authored-by: FBruzzesi <francesco.bruzzesi.93@gmail.com>

* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change

---------

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: Magdalena Anopsy <74981211+anopsy@users.noreply.github.com>
Co-authored-by: Dea María Léon <deamarialeon@gmail.com>

MarcoGorelli added 3 commits May 14, 2024 14:17

make pandas dtype selector df-agnostic

2697b2d

bump version

d2e703c

3.8 compat

d96e427

FBruzzesi reviewed May 14, 2024


FBruzzesi mentioned this pull request May 15, 2024

[FEATURE] Narwhals migration for dataframe-agnostic codebase #658

Closed

MarcoGorelli and others added 3 commits May 15, 2024 11:57

Update sklego/preprocessing/pandastransformers.py

4f6b1ea

Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>

fixup pyproject.toml

243f0a5

unify (and test!) error message

a5334cc

MarcoGorelli marked this pull request as ready for review May 15, 2024 11:05

MarcoGorelli added 3 commits May 15, 2024 13:18

deprecate

3ad9a10

update readme

070e2fe

undo contribution.md change

d5f0413

FBruzzesi approved these changes May 16, 2024


FBruzzesi merged commit 7adc625 into koaning:narwhals-development May 18, 2024
16 checks passed

FBruzzesi mentioned this pull request May 18, 2024

feat: Narwhals for dataframe-agnostic codebase #671

Merged
