[REVIEW] Adding support for unsigned int #5431
🙌
I'm super uncomfortable with using 0 as a sentinel for nulls throughout the tests. It seems very error-prone and fragile. Any ideas on what we can do instead? If we wait for the Pandas 1.0 PR, do the new Pandas nullable types help?
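A minimal, stdlib-only sketch of why a 0 sentinel is fragile. `fill_nulls` and `null_mask` are hypothetical helpers, not cudf or test-suite functions; they only mimic the pattern of filling nulls with a sentinel before an elementwise comparison. When the expected data contains a genuine 0, a null in the actual result compares equal to it.

```python
from typing import List, Optional


def fill_nulls(values: List[Optional[int]], sentinel: int) -> List[int]:
    """Hypothetical helper: replace None (null) entries with a sentinel value."""
    return [sentinel if v is None else v for v in values]


def null_mask(values: List[Optional[int]]) -> List[bool]:
    """Hypothetical helper: True where the entry is null."""
    return [v is None for v in values]


# The expected result holds a real 0 at index 1; the actual result has a null there.
expected = [5, 0, 7]
actual = [5, None, 7]

# With 0 as the sentinel, the null is silently treated as equal to the real 0.
zero_sentinel_match = fill_nulls(actual, 0) == fill_nulls(expected, 0)

# Comparing null masks separately catches the mismatch regardless of the sentinel.
masks_match = null_mask(actual) == null_mask(expected)
```

Here `zero_sentinel_match` is `True` even though the data differ, while `masks_match` is `False`. Checking validity separately from values avoids depending on any particular sentinel.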
It might help, but it would require a lot of additional changes to the tests to accommodate it. Also, how close are we to supporting Pandas 1.0?
I agree 100% here. Here are some additional considerations:
I think our feature set is at a point where comparing with Pandas isn't sufficient, or isn't possible, in many cases. For these cases, we should determine what we think the right behaviour is and test against that, rather than against what Pandas does for some different (but closely related) case. That being said, I'm not advocating that we stop testing against Pandas. Where possible, we should definitely still do that.
Practically speaking, yes, I think this is better than … Suppose we decided to rewrite the tests to use Pandas nullable types where possible, and our own test data otherwise: what's the level of effort involved? How many tests would this touch? Is this something a couple of us could pool our efforts on and pull off?
As a test, I changed the sentinel value to something random and observed around 10 to 15 failing tests; there may be more. Still, it feels doable. As @brandon-b-miller mentioned in #5388, we might have to introduce dtypes to handle nulls, which could add to the burden of modifying these test cases.
Let's use
Changed the sentinel value to be
I have used this sentinel value in most places, apart from those where a float column is created and then cast to integer.
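One plausible reason the sentinel cannot be used where a float column is created and then cast to integer (an assumption; the thread does not say why) is that max(int64) does not survive a round trip through float64. A double has a 53-bit significand, so the nearest float64 to 2**63 - 1 is 2**63, which lies one past the int64 range. The sketch below uses plain Python floats, which are IEEE 754 doubles.

```python
INT64_MAX = 2**63 - 1  # max(int64)

# Python floats are IEEE 754 doubles (float64): INT64_MAX rounds up to 2**63.
as_float = float(INT64_MAX)

# Converting back does not recover the original sentinel.
back_to_int = int(as_float)
round_trip_exact = back_to_int == INT64_MAX

# The recovered value is 2**63, which no longer fits in int64.
overflows_int64 = back_to_int > INT64_MAX
```

Here `back_to_int` is 9223372036854775808 (2**63), so `round_trip_exact` is `False` and `overflows_int64` is `True`. How a given library then casts such an out-of-range float to int64 is implementation-specific and not shown here.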
@shwina, can you review again before we merge?
Connected offline and confirmed this is good to go, so merging!
Adds support for unsigned int to cudf, and updates validations and test cases to accommodate it. Instead of -1, max(int64) is now used as the sentinel value in most cases.
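A small sketch of why -1 is a poor sentinel once unsigned types are supported, and why max(int64) still cannot cover every dtype. The PR does not state its reasoning, so this is an illustrative assumption. `INT_RANGES` and `fits` are hypothetical names that encode the standard inclusive ranges of the fixed-width integer types.

```python
# Inclusive value ranges of the fixed-width integer dtypes.
INT_RANGES = {
    "int8": (-(2**7), 2**7 - 1),
    "int16": (-(2**15), 2**15 - 1),
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
    "uint8": (0, 2**8 - 1),
    "uint16": (0, 2**16 - 1),
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
}


def fits(value: int, dtype: str) -> bool:
    """Hypothetical helper: can `value` be stored exactly in `dtype`?"""
    lo, hi = INT_RANGES[dtype]
    return lo <= value <= hi


INT64_MAX = 2**63 - 1

# -1 is representable only in the signed types.
fits_neg1 = [d for d in INT_RANGES if fits(-1, d)]

# max(int64) is representable in both 64-bit types, signed and unsigned.
fits_max = [d for d in INT_RANGES if fits(INT64_MAX, d)]
```

`fits_neg1` lists only the four signed types, so -1 cannot mark nulls in an unsigned column. `fits_max` is `["int64", "uint64"]`, which is consistent with the sentinel being usable "in most of the cases" rather than everywhere: narrower types need a different value.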