Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added struct namespace with field method. #2146

Merged
merged 44 commits into from
Mar 9, 2025

Conversation

osoucy
Copy link
Contributor

@osoucy osoucy commented Mar 4, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Introduction of the strut namespace for applicable expressions and series with an initial field function that returns a field of a struct.


def field(self: Self, name: str) -> PandasLikeSeries:
return self._compliant_series._from_native_series(
self._compliant_series._native_series.apply(lambda x: x[name]).rename(name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using apply can we use https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.struct.field.html and only support this for pyarrow-backed dtypes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the majority of Pandas users don't leverage pyarrow-backed dtypes. I added a switch to check and use the struct namespace if available and fallbacks on the apply function if it's not the case.

Copy link
Collaborator

@EdAbati EdAbati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thank you for doing this πŸ™ŒπŸΌ

I've added a couple of comments 'on the go' (I'll check the rest later)

Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @osoucy - it generally looks very promising. Left an additional comment to those of Edo and Marco

Comment on lines 40 to 41
>>> df.with_columns(name=nw.col("user").struct.field("name"))
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output is missing and doctest check would fail without it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output has to be manually generated? There is no command that I can use to generate it?

@dangotbanned dangotbanned mentioned this pull request Mar 5, 2025
10 tasks
osoucy added 4 commits March 5, 2025 10:25
# Conflicts:
#	narwhals/_duckdb/expr.py
#	narwhals/_spark_like/expr.py
# Conflicts:
#	narwhals/_duckdb/expr.py
#	narwhals/_spark_like/expr.py
# Conflicts:
#	narwhals/_duckdb/expr.py
#	narwhals/_spark_like/expr.py
osoucy and others added 5 commits March 5, 2025 21:28
# Conflicts:
#	narwhals/_expression_parsing.py
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
@osoucy
Copy link
Contributor Author

osoucy commented Mar 7, 2025

Are we good to merge? Anything else to add?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @osoucy !

i'll check the evaluate_output_names / alias_output_names more closely, I'm not 100% sure about those

"struct",
"field",
name=name,
evaluate_output_names=lambda _col: [name],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all implementations, instead of overwriting evaluate output names, can we just use .alias?

A good test would be nw.col('a').struct.field('b').name.keep(). I think for polars the resulting column name would be 'a' we should check that we do the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd. That's what I tried initially, but it caused some errors when evaluating (mismatching expected and actual something). I must have forgotten something. I removed the changes made to the _from_call and reuse_series_namespace_implementation functions.

@@ -17,5 +17,5 @@ def __init__(self: Self, series: ArrowSeries) -> None:
def field(self: Self, name: str) -> ArrowSeries:
self._compliant_series._name = name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't mutate self._compliant_series, you'll need alias here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Arrow* stuff I'd suggest using .compliant and .native

They weren't available when you started the PR @osoucy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more context #2130 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine as a follow-up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least the current implementation

    def field(self: Self, name: str) -> ArrowSeries:
        return self._compliant_series._from_native_series(
            pc.struct_field(self._compliant_series.alias(name)._native_series, name),
        )

avoids mutating the compliant series. Maybe I can look into @dangotbanned in a future PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good @osoucy 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are good to merge? :)

@MarcoGorelli
Copy link
Member

there's some unnecessary alias calls - I tried making a commit and pushing to your branch, but was denied access

could you check that the "allow edits by maintainers" checkbox is checked? you should see something like this

image

@osoucy
Copy link
Contributor Author

osoucy commented Mar 9, 2025

there's some unnecessary alias calls - I tried making a commit and pushing to your branch, but was denied access

could you check that the "allow edits by maintainers" checkbox is checked? you should see something like this

image

This option is not available because the fork is created inside of my organization instead of inside my personnal account. I added you as a contributor to the organization. You should be able to push to the branch now.

@dangotbanned dangotbanned added the enhancement New feature or request label Mar 9, 2025
@@ -16,6 +16,9 @@

class PandasLikeSeriesListNamespace:
def __init__(self: Self, series: PandasLikeSeries) -> None:
if not hasattr(series._native_series, "list"):
msg = "Series must be of PyArrow List type to support struct namespace."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Comment on lines 220 to 221
evaluate_output_names: Output names function.
alias_output_names: Alias output names function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @osoucy ! great feature πŸ™Œ

@MarcoGorelli MarcoGorelli merged commit ea213c4 into narwhals-dev:main Mar 9, 2025
29 checks passed
@osoucy
Copy link
Contributor Author

osoucy commented Mar 9, 2025

Awesome! Thanks for the guidance and review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants