Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expunge NamedColumn #16962

Merged
merged 10 commits into from
Oct 8, 2024
Merged

Expunge NamedColumn #16962

merged 10 commits into from
Oct 8, 2024

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Oct 1, 2024

Description

Everything in the expression evaluation now operates on columns without names. DataFrame construction takes either a mapping from string-valued names to columns, or a sequence of pairs of names and columns.

This removes some duplicate code in the NamedColumn class (by removing it) where we had to fight the inheritance hierarchy.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested a review from a team as a code owner October 1, 2024 15:54
@wence- wence- requested review from isVoid and Matt711 October 1, 2024 15:54
@github-actions github-actions bot added Python Affects Python cuDF API. cudf.polars Issues specific to cudf.polars labels Oct 1, 2024
Everything in the expression evaluation now operates on columns
without names. DataFrame construction takes either a mapping from
string-valued names to columns, or a sequence of pairs of names and
columns.

This removes some duplicate code in the NamedColumn class (by removing
it) where we had to fight the inheritance hierarchy.

- Closes rapidsai#16272
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny suggestions.

Edit: I'll take a closer look at this PR later today.

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep NamedColumn, but with a "has-a" relationship with Column as you mentioned in the original issue instead of removing it?

@wence-
Copy link
Contributor Author

wence- commented Oct 2, 2024

Why not keep NamedColumn, but with a "has-a" relationship with Column as you mentioned in the original issue instead of removing it?

Good question. I tried that first, but then throughout the IR evaluation, since everything is a NamedColumn I found myself writing column.column everywhere.

But perhaps it is easier if Column just gets an optional name, which is set by evaluation of NamedExpr. Let me try that.

Rather than separating names from columns with tuple return values in
IR evaluation, allow columns to optionally have names that are copied
around. When we construct a dataframe from columns we now must assert
that the column has a name, but otherwise this is OK.
@wence-
Copy link
Contributor Author

wence- commented Oct 2, 2024

Why not keep NamedColumn, but with a "has-a" relationship with Column as you mentioned in the original issue instead of removing it?

OK, I had a go at just letting Columns have an optional name. It seems OK, take a look.

If we prefer this approach, I'm happy to go with it, otherwise we can go back to having a full name/Column split.

@wence- wence- added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 2, 2024
@Matt711
Copy link
Contributor

Matt711 commented Oct 2, 2024

Why not keep NamedColumn, but with a "has-a" relationship with Column as you mentioned in the original issue instead of removing it?

OK, I had a go at just letting Columns have an optional name. It seems OK, take a look.

If we prefer this approach, I'm happy to go with it, otherwise we can go back to having a full name/Column split.

I'm +1 to this approach @wence-, it feels cleaner to me.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, but generally LGTM.

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM, just a docstring fix

@wence-
Copy link
Contributor Author

wence- commented Oct 4, 2024

/merge

@rapids-bot rapids-bot bot merged commit 219ec0e into rapidsai:branch-24.12 Oct 8, 2024
101 checks passed
@wence- wence- deleted the wence/fix/16272 branch October 8, 2024 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.polars Issues specific to cudf.polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA] Refactor Column/NamedColumn split in cudf-polars
4 participants