Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure comparisons with pyints and integer series always succeed #16532

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

seberg
Copy link
Contributor

@seberg seberg commented Aug 12, 2024

When Python integers are compared to a series of integers, the result can always be correctly defined no matter the values of the Python integer.

This was always a very mild issue. But with NumPy 2 behavior not upcasting the computation result type based on the value anymore, even things like:

cudf.Series([1, 2, 3], dtype="int8") < 1000

would fail.
(Similar paths could be taken for other integer scalars, but there would be mostly nice for performance.)

N.B. NumPy/pandas also support exact comparisons when mixing e.g. uint64 and int64. This is another rare exception that cudf currently does not support.

Closes gh-16282

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Just as a note for reviewers. The later operator checks, need other to be fully coerced, so unfortunately can't deal with all operator special paths at the start.

@seberg seberg requested a review from a team as a code owner August 12, 2024 18:27
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 12, 2024
When Python integers are compared to a series of integers, the result
can always be correctly defined no matter the values of the Python
integer.

This was always a very mild issue.  But with NumPy 2 behavior not
upcasting the computation result type based on the value anymore,
even things like:
```
cudf.Series([1, 2, 3], dtype="int8") < 1000
```
would fail.

N.B. NumPy/pandas also support exact comparisons when mixing e.g.
uint64 and int64.  This is another rare exception that cudf currently
does not support.
@mroeschke
Copy link
Contributor

/ok to test

@mroeschke mroeschke added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 12, 2024
Copy link
Contributor

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a similar issue with comparing pyfloats and integer or float series?

@seberg
Copy link
Contributor Author

seberg commented Aug 13, 2024

Would there be a similar issue with comparing pyfloats and integer or float series?

Yes and no: we (as in numpy/pandas) are usually content with normal promotion logic there (unlike Python, which corrects for overflows and rounding when comparing int and float scalars).

@mroeschke
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit cf3fabf into rapidsai:branch-24.10 Aug 13, 2024
80 checks passed
@jakirkham
Copy link
Member

Thanks Sebastian and Matt! 🙏

@seberg seberg deleted the comp-int branch August 14, 2024 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators
3 participants