[REVIEW] Adding support for unsigned int #5431

rgsl888prabhu · 2020-06-09T20:25:01Z

Adds support for unsigned int to cuDF and updates validations and test cases to accommodate it.

Also, instead of -1, max(int64) is now used as the null sentinel value in most cases.
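
A short sketch of why -1 stops working as a sentinel once unsigned columns exist. It is plain Python; the helper names are hypothetical and not cuDF APIs, and the integer ranges are derived from bit widths rather than taken from NumPy.

```python
# Hypothetical helpers for illustration; not part of cuDF.
def int_range(bits, signed):
    """Return the (min, max) values of a fixed-width integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def representable(value, bits, signed):
    """Return True if `value` fits the given integer type."""
    lo, hi = int_range(bits, signed)
    return lo <= value <= hi


WIDTHS = (8, 16, 32, 64)
INT64_MAX = int_range(64, signed=True)[1]

# -1 fits every signed width but no unsigned width.
minus_one_signed = [representable(-1, b, True) for b in WIDTHS]
minus_one_unsigned = [representable(-1, b, False) for b in WIDTHS]

# max(int64) fits int64 and uint64, but none of the narrower types.
int64_max_fits = {
    (b, s): representable(INT64_MAX, b, s) for b in WIDTHS for s in (True, False)
}
```

Note that max(int64) is also out of range for every type narrower than 64 bits, which the sketch makes visible.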


python/cudf/cudf/tests/test_binops.py

python/cudf/cudf/utils/dtypes.py

cpp/include/cudf/utilities/traits.hpp

cwharris

🙌

python/cudf/cudf/tests/test_categorical.py

python/cudf/cudf/tests/test_binops.py

python/cudf/cudf/tests/test_index.py

python/cudf/cudf/tests/test_repr.py

python/cudf/cudf/tests/test_sorting.py

python/cudf/cudf/_lib/reduce.pyx

python/cudf/cudf/core/column/numerical.py

python/cudf/cudf/tests/test_sorting.py

python/cudf/cudf/tests/test_string.py

kkraus14

I'm super uncomfortable with using 0 as a sentinel for nulls throughout the tests. It seems very error prone and fragile. Any ideas on what we can do instead? If we wait for the Pandas 1.0 PR, do the new Pandas nullable types help?

rgsl888prabhu · 2020-06-11T14:15:20Z

> Will this close #2819?

It will, but there are other issues, such as #5352 and #5351, that still require changes on the libcudf side.

rgsl888prabhu · 2020-06-11T14:24:33Z

> I'm super uncomfortable with using 0 as a sentinel for nulls throughout the tests. It seems very error prone and fragile. Any ideas on what we can do instead? If we wait for the Pandas 1.0 PR, do the new Pandas nullable types help?

It might help, but accommodating that change would require a lot of additional changes to the tests. Also, how close are we to supporting Pandas 1.0?

shwina · 2020-06-11T14:39:12Z

> I'm super uncomfortable with using 0 as a sentinel for nulls throughout the tests. It seems very error prone and fragile. Any ideas on what we can do instead?

I agree 100% here. Here are some additional considerations:

- 0 should be considered as good a sentinel value as any other, i.e., just changing our sentinel value to something else doesn't guarantee our tests aren't broken.
- Regarding Pandas 1.0, do they support nulls for all the operations that we have? If so, do their null semantics match ours? If not, this will be difficult to incorporate in the short term.
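
The first point can be made concrete. This minimal, stdlib-only sketch (the helper names are hypothetical, not cuDF's testing utilities) shows that filling nulls with a sentinel before comparing lets a wrong result pass whenever the actual data happens to contain the sentinel, while a null-aware comparison catches it. Swapping 0 for max(int64) or any other value leaves the hole in place.

```python
# Hypothetical test helpers for illustration; not cuDF's testing utilities.
SENTINEL = 0


def fill_nulls(values, sentinel=SENTINEL):
    """Replace None (null) entries with the sentinel, as sentinel-based tests do."""
    return [sentinel if v is None else v for v in values]


def null_aware_equal(expected, actual):
    """Compare element-wise, treating a null as equal only to another null."""
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if (e is None) != (a is None):
            return False
        if e is not None and e != a:
            return False
    return True


expected = [1, None, 3]
buggy_actual = [1, 0, 3]  # a real 0 where a null was expected

sentinel_check = fill_nulls(expected) == fill_nulls(buggy_actual)  # True: bug hidden
null_aware_check = null_aware_equal(expected, buggy_actual)        # False: bug caught
```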

I think that our feature set is at a point where comparing with Pandas isn't sufficient and/or possible in many cases. For these cases, we should determine what we think is the right behaviour and test against that, rather than what Pandas does for some different (but closely related) case.

That being said, I'm not advocating for not testing against Pandas. Where possible, we should definitely still do that.


rgsl888prabhu · 2020-06-11T20:30:08Z

@kkraus14 @shwina As a temporary solution until we get Pandas 1.0 support, shall we use the max of the column's dtype to signify null? At least this value will not be as common as 0.
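
A minimal sketch of this proposal, assuming NumPy dtypes and a hypothetical helper name: the sentinel is looked up per column dtype with `np.iinfo(dtype).max` instead of being hard-coded to 0.

```python
import numpy as np


def fill_nulls_with_dtype_max(values, dtype):
    """Hypothetical helper: replace None with the largest value of `dtype`."""
    sentinel = np.iinfo(dtype).max  # e.g. 255 for uint8
    return np.array([sentinel if v is None else v for v in values], dtype=dtype)


filled_u8 = fill_nulls_with_dtype_max([1, None, 3], np.uint8)
filled_i32 = fill_nulls_with_dtype_max([None, -5], np.int32)
```

As the replies point out, this only makes collisions rarer: a column that genuinely contains its dtype's max value still collides with the sentinel.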

kkraus14 · 2020-06-11T20:35:26Z

> @kkraus14 @shwina As a temporary solution until we get Pandas 1.0 support, shall we use the max of the column's dtype to signify null? At least this value will not be as common as 0.

It would be slightly better, but it still feels extremely fragile.

shwina · 2020-06-11T21:00:13Z

> @kkraus14 @shwina As a temporary solution until we get Pandas 1.0 support, shall we use the max of the column's dtype to signify null? At least this value will not be as common as 0.

Practically speaking, yes, I think this is better than 0, but theoretically, max(int64) isn't any better than 0. Neither is 17 or anything else for that matter :)

Suppose we decided to re-write the tests to use Pandas nullable types where possible, and our own test data otherwise, what's the level of effort involved? How many tests would this touch? Is this something that a couple of us could pool effort on and pull off?

rgsl888prabhu · 2020-06-11T22:06:19Z

As a quick test, I changed the sentinel value to something random and observed around 10-15 failing tests, though there may be more. It still feels doable. As @brandon-b-miller mentioned in #5388, we might have to introduce dtypes to handle nulls, which could add to the burden of modifying these test cases.

kkraus14 · 2020-06-11T23:12:03Z

Let's use max(uint64) as the sentinel value in the tests for now (where we can), and we'll file issues and follow up on using Pandas nullable types or rolling our own null behaviors in tests as needed, to move away from sentinel-value-based testing.

rgsl888prabhu · 2020-06-12T18:49:35Z

Changed the sentinel value to max(int64) rather than max(uint64), because a value greater than sys.maxsize (max(int64) on 64-bit platforms) overflows when converted to a C long, so such conversions fail with the following error:

```python
>>> import numpy as np
>>> np.int8(np.iinfo(np.uint64).max)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
```
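
The same limit can be reproduced without NumPy. In this sketch, Python's `struct` format `'q'` (signed 64-bit) stands in for the signed C conversion in the traceback. This is an analogy, not NumPy's actual code path. max(uint64) cannot be packed into a signed 64-bit slot, while max(int64) packs into both the signed and the unsigned (`'Q'`) formats.

```python
import struct

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def packs_as(fmt, value):
    """Return True if `value` fits the given little-endian struct format."""
    try:
        struct.pack("<" + fmt, value)
        return True
    except struct.error:  # raised when the value is out of range for fmt
        return False


# max(int64) fits both signed ('q') and unsigned ('Q') 64-bit slots ...
int64_max_ok = (packs_as("q", INT64_MAX), packs_as("Q", INT64_MAX))
# ... but max(uint64) only fits the unsigned one.
uint64_max_ok = (packs_as("q", UINT64_MAX), packs_as("Q", UINT64_MAX))
```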

I have used this sentinel value in most places, except where a float column is created and then cast to integer.
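
One likely reason the float-then-cast paths are an exception (our reading; the comment does not say why): float64 cannot represent max(int64) exactly, so a sentinel that passes through a float column rounds up to 2**63, one past the int64 range. A minimal stdlib sketch, using `math.nan` for nulls and a hypothetical helper that substitutes the sentinel only after the cast:

```python
import math

INT64_MAX = 2**63 - 1

# Round-tripping the sentinel through a float changes its value.
as_float = float(INT64_MAX)
round_tripped = int(as_float)  # 2**63, one past max(int64)


def floats_to_ints_with_sentinel(values, sentinel=INT64_MAX):
    """Hypothetical helper: cast non-null floats to int, then fill nulls."""
    return [sentinel if math.isnan(v) else int(v) for v in values]


converted = floats_to_ints_with_sentinel([1.0, math.nan, 3.0])
```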

python/cudf/cudf/core/column/numerical.py

python/cudf/cudf/utils/cudautils.py

kkraus14 · 2020-06-13T00:01:00Z

@shwina can you review again before we merge?

kkraus14 · 2020-06-13T01:46:51Z

> @shwina can you review again before we merge?

Connected offline and confirmed this is good to go, so merging!

rgsl888prabhu added 6 commits June 5, 2020 14:44

Intial set of changes

7bd6e2f

next set

34b01cd

all tests are passing

fda02c0

removing stale code

1e4029e

style and other changes

9c08847

Merge branch 'branch-0.15' of https://github.com/rapidsai/cudf into unsigned_int_porting

f51986a

rgsl888prabhu added the "2 - In Progress", "Python", and "4 - Needs cuDF (Python) Reviewer" labels Jun 9, 2020

rgsl888prabhu requested review from shwina and kkraus14 June 9, 2020 20:25

rgsl888prabhu requested review from a team as code owners June 9, 2020 20:25

rgsl888prabhu self-assigned this Jun 9, 2020

rgsl888prabhu requested review from cwharris and davidwendt June 9, 2020 20:25

shwina reviewed Jun 9, 2020

python/cudf/cudf/tests/test_binops.py (resolved)

rgsl888prabhu added 2 commits June 9, 2020 15:39

final set of changes

a1c76d2

CHANGELOG.md

02efd1e

rgsl888prabhu changed the title ~~[WIP] Adding support for unsigned int~~ [REVIEW] Adding support for unsigned int Jun 9, 2020

rgsl888prabhu added the "3 - Ready for Review" label and removed the "2 - In Progress" label Jun 9, 2020

shwina reviewed Jun 9, 2020

python/cudf/cudf/utils/dtypes.py (outdated, resolved)

jrhemstad reviewed Jun 9, 2020

cpp/include/cudf/utilities/traits.hpp (outdated, resolved)

cwharris approved these changes Jun 10, 2020

rgsl888prabhu and others added 2 commits June 10, 2020 13:45

review changes

444e3ee

Merge branch 'branch-0.15' into unsigned_int_porting

94c9eb1

kkraus14 reviewed Jun 11, 2020

kkraus14 requested changes Jun 11, 2020

rgsl888prabhu added 3 commits June 11, 2020 10:28

changes to make the list of dtypes to set

6311b69

Changes apart from null handling

5146c6e

Merge branch 'unsigned_int_porting' of https://github.com/rgsl888prabhu/cudf into unsigned_int_porting

c0fbca8

modified sentinel value to be max(int64)

c4643c8

kkraus14 reviewed Jun 12, 2020

python/cudf/cudf/core/column/numerical.py (outdated, resolved)

kkraus14 reviewed Jun 12, 2020

python/cudf/cudf/utils/cudautils.py (outdated, resolved)

rgsl888prabhu and others added 2 commits June 12, 2020 14:58

Merge branch 'branch-0.15' into unsigned_int_porting

efe4c56

review changes

57be7be

kkraus14 approved these changes Jun 13, 2020

kkraus14 added the "5 - Ready to Merge" label and removed the "3 - Ready for Review" and "4 - Needs cuDF (Python) Reviewer" labels Jun 13, 2020

kkraus14 merged commit bd974e0 into rapidsai:branch-0.15 Jun 13, 2020

kkraus14 mentioned this pull request Jun 15, 2020

[FEA] Support unsigned integer types #2819

Closed
