feat: Added `struct` namespace with `field` method. #2146
def field(self: Self, name: str) -> PandasLikeSeries:
    return self._compliant_series._from_native_series(
        self._compliant_series._native_series.apply(lambda x: x[name]).rename(name),
Instead of using apply can we use https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.struct.field.html and only support this for pyarrow-backed dtypes?
I feel the majority of pandas users don't leverage pyarrow-backed dtypes. I added a switch that checks for the `struct` namespace and uses it if available, and falls back on the `apply` function if it's not.
This looks great! Thank you for doing this
I've added a couple of comments 'on the go' (I'll check the rest later)
Thanks @osoucy - it generally looks very promising. Left an additional comment to those of Edo and Marco
narwhals/expr_struct.py
Outdated
>>> df.with_columns(name=nw.col("user").struct.field("name"))
"""
Output is missing and doctest check would fail without it
The output has to be manually generated? There is no command that I can use to generate it?
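For context on why the missing output matters: as far as I know, the standard-library `doctest` module only compares what an example produces against the text written under it, so the usual route is to run the snippet once and paste the result into the docstring. A stdlib-only sketch, where `get_field` is a hypothetical stand-in and not the narwhals API:

```python
import doctest


def get_field(record: dict, name: str) -> object:
    """Return one field of a struct-like record (hypothetical helper).

    >>> get_field({"name": "Ann", "age": 30}, "name")
    'Ann'
    """
    return record[name]


# Parse the docstring into a doctest and run it against the real function.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    get_field.__doc__, {"get_field": get_field}, "get_field", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
```

If the `'Ann'` line were left out, the example would be reported as a failure, because the expression produces a repr while the docstring expects nothing.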
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
…/struct_namespace
# Conflicts: # narwhals/_duckdb/expr.py # narwhals/_spark_like/expr.py
# Conflicts: # narwhals/_expression_parsing.py
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
Are we good to merge? Anything else to add?
thanks @osoucy!
I'll check the `evaluate_output_names` / `alias_output_names` more closely, I'm not 100% sure about those
narwhals/_arrow/expr_struct.py
Outdated
    "struct",
    "field",
    name=name,
    evaluate_output_names=lambda _col: [name],
For all implementations, instead of overwriting `evaluate_output_names`, can we just use `.alias`?
A good test would be `nw.col('a').struct.field('b').name.keep()`. I think for polars the resulting column name would be 'a'; we should check that we do the same.
It's odd. That's what I tried initially, but it caused some errors when evaluating (mismatching expected and actual something). I must have forgotten something. I removed the changes made to the `_from_call` and `reuse_series_namespace_implementation` functions.
narwhals/_arrow/series_struct.py
Outdated
@@ -17,5 +17,5 @@ def __init__(self: Self, series: ArrowSeries) -> None:
    def field(self: Self, name: str) -> ArrowSeries:
        self._compliant_series._name = name
you can't mutate `self._compliant_series`, you'll need `alias` here too
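To illustrate the point with a toy example: assigning to `_name` changes the series the caller still holds, whereas an `alias`-style method returns a renamed copy. `ToySeries` below is a hypothetical stand-in written for this sketch, not narwhals' `ArrowSeries`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySeries:
    """Hypothetical stand-in for a compliant series: values plus a name."""

    values: tuple
    name: str

    def alias(self, name: str) -> ToySeries:
        # Return a renamed copy; the original object is left untouched.
        return replace(self, name=name)

    def field(self, name: str) -> ToySeries:
        # Build the result from a renamed copy instead of assigning to self.name.
        renamed = self.alias(name)
        return ToySeries(tuple(row[name] for row in renamed.values), renamed.name)


users = ToySeries(({"name": "Ann"}, {"name": "Bob"}), "user")
names = users.field("name")
```

After the call, `users` is still named `"user"`, and only the returned series carries the field name.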
Done.
For the `Arrow*` stuff I'd suggest using `.compliant` and `.native`.
They weren't available when you started the PR @osoucy
Some more context #2130 (comment)
this is fine as a follow-up
At least the current implementation

    def field(self: Self, name: str) -> ArrowSeries:
        return self._compliant_series._from_native_series(
            pc.struct_field(self._compliant_series.alias(name)._native_series, name),
        )

avoids mutating the compliant series. Maybe I can look into @dangotbanned's suggestion in a future PR?
All good @osoucy
So we are good to merge? :)
This option is not available because the fork was created inside my organization instead of my personal account. I added you as a contributor to the organization. You should be able to push to the branch now.
narwhals/_pandas_like/series_list.py
Outdated
@@ -16,6 +16,9 @@

class PandasLikeSeriesListNamespace:
    def __init__(self: Self, series: PandasLikeSeries) -> None:
        if not hasattr(series._native_series, "list"):
            msg = "Series must be of PyArrow List type to support struct namespace."
typo
narwhals/_expression_parsing.py
Outdated
    evaluate_output_names: Output names function.
    alias_output_names: Alias output names function.
outdated
thanks @osoucy! great feature
Awesome! Thanks for the guidance and review!
If you have comments or can explain your changes, please do so below
Introduction of the `struct` namespace for applicable expressions and series, with an initial `field` function that returns a field of a struct.