cudf-polars string/numeric casting #17076
I'm not sure about this one: the range of int64 is contained in the range of float32, and I believe the mapping preserves order.
Also, what about float to integral? I suppose it depends on what happens to the out-of-bounds values.
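To check the int64 → float32 claim, here is a minimal CPU sketch in plain Python. It is not libcudf code, and the helper `to_float32` is hypothetical. It rounds a sorted sample spanning the whole int64 range to float32 and confirms the result is still non-decreasing. The order is preserved only weakly: float32 has a 24-bit significand, so neighbouring integers can collapse to the same float. The sketch rounds through float64 first, which can differ in the last bit from a direct int64-to-float32 conversion for some values. Each rounding step is monotone, though, so the ordering argument still holds.

```python
import struct

def to_float32(value: int) -> float:
    """Round an integer to float32 (via float64) using struct's C-float packing."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Sorted sample across the int64 range, including values float32 cannot tell apart.
ints = sorted([INT64_MIN, -(2**24) - 1, -1, 0, 1, 2**24, 2**24 + 1, 2**53 + 1, INT64_MAX])
floats = [to_float32(v) for v in ints]

# Order survives the cast, but distinct ints may map to the same float32.
is_non_decreasing = all(a <= b for a, b in zip(floats, floats[1:]))
collapsed = to_float32(2**24) == to_float32(2**24 + 1)
print(is_non_decreasing, collapsed)  # True True
```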
Correct: the float-to-integral cases all fall into the final `return False`, because the out-of-range values might lose their ordering in the cast.
Why? If the out-of-range values are just clamped, then there's no problem: ordering is preserved.
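Here is a small plain-Python sketch of the clamping argument. It is not libcudf or polars code, and all helper names are hypothetical. It shows three ways of handling out-of-range values when casting already-sorted floats to int8: clamping, two's-complement wrapping, and nullifying. It assumes finite inputs and truncation toward zero. Clamping and nullifying keep the surviving values in non-decreasing order. Wrapping breaks the order. Which of these libcudf or polars actually does is the open question in this thread, and the sketch does not settle it.

```python
import math
from typing import Iterable, Optional

INT8_MIN, INT8_MAX = -128, 127

def cast_clamp(x: float) -> int:
    """Truncate toward zero, then clamp into the int8 range."""
    return max(INT8_MIN, min(INT8_MAX, math.trunc(x)))

def cast_wrap(x: float) -> int:
    """Truncate toward zero, then wrap modulo 2**8 (two's-complement overflow)."""
    return (math.trunc(x) - INT8_MIN) % 256 + INT8_MIN

def cast_nullify(x: float) -> Optional[int]:
    """Truncate toward zero; out-of-range values become None (null)."""
    t = math.trunc(x)
    return t if INT8_MIN <= t <= INT8_MAX else None

def non_decreasing(values: Iterable[Optional[int]]) -> bool:
    """Check ordering of the non-null values only."""
    kept = [v for v in values if v is not None]
    return all(a <= b for a, b in zip(kept, kept[1:]))

xs = [-300.0, -1.5, 0.0, 100.9, 200.0, 300.0]  # sorted input

print([cast_clamp(x) for x in xs])    # [-128, -1, 0, 100, 127, 127]
print([cast_wrap(x) for x in xs])     # [-44, -1, 0, 100, -56, 44]
print([cast_nullify(x) for x in xs])  # [None, -1, 0, 100, None, None]
print(non_decreasing([cast_wrap(x) for x in xs]))  # False
```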
Because I expect the equivalent of this:
Is that what cudf does though?
With `wrap_numerical=True` in the polars cast, it clamps, AFAICT.
No, libcudf clamps. If we're OK encoding libcudf-specific implementation behavior into this function, we could pass the float-to-int cases.
@wence- is the current behavior considered buggy? It's almost like we should be raising unless `wrap_numerical=True`.
Probably yes, we should barf in the cases where we're in strict(ish) mode, because we haven't implemented those.
The issue I foresee here is that `wrap_numerical=False, strict=True` is the default. This means that, by default, the GPU backend will also have to scan during the float-to-int cast for the presence of these values and throw. This is shaping up to be a pattern that occurs in several places within the codebase, and it's probably not ideal to need to scan before every cast.

For now I have passed the float-to-int conversions through this function, which retains the existing behavior: regardless of whether OOB values are nullified or clamped, we'll retain order. I will raise a separate issue to discuss the proliferation of scanning as a result of polars defaults.