Improve JSON and CSV parsing of integer values #4790

Merged: 11 commits, Feb 17, 2022

@andygrove andygrove commented Feb 15, 2022

Closes #126, #1986, and #4762

Changes in this PR:

  • Updates JSON and CSV support for integer values: cuDF is now asked to read the values as strings, and the strings are then cast to the requested integer type in a Spark-compatible way.
  • Removes the redundant CSV configs for enabling reading of boolean, integer, and floating-point values.
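
To illustrate the "read as strings, then cast" approach from the first bullet, here is a minimal CPU-side sketch in Python. It is not the plugin's code and does not use cuDF; the function name `cast_strings_to_int` and the validation rule (optional sign followed by ASCII digits, out-of-range or malformed values become null) are assumptions chosen for illustration and may not match Spark's exact semantics.

```python
import re
from typing import List, Optional

# Assumed rule for this sketch: an optional sign followed by ASCII digits only.
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def cast_strings_to_int(values: List[Optional[str]], bits: int = 32) -> List[Optional[int]]:
    """Cast raw string values to integers of the given width.

    Hypothetical helper: values that are null, malformed, or outside the
    signed range for `bits` are returned as None.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    out: List[Optional[int]] = []
    for s in values:
        # Step 1: the reader hands back strings; reject anything that is
        # not a plain integer literal.
        if s is None or _INT_PATTERN.fullmatch(s) is None:
            out.append(None)
            continue
        # Step 2: cast, then null out values that overflow the target type.
        n = int(s)
        out.append(n if lo <= n <= hi else None)
    return out
```

For example, `cast_strings_to_int(["1", "1.5", "2147483648"])` keeps the first value and nulls the other two, while passing `bits=64` lets the last value through. Doing the validation after the read keeps the type-specific rules in one place rather than relying on the parser's own integer handling.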

I filed a follow-on issue, #4793, for handling JSON string values that contain integers.

Status

  • Implementation and updated JSON & CSV tests
  • Remove redundant csv configs
  • Fix regression in Mortgage tests
  • File follow-on issues

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove changed the title from "WIP: Improve JSON and CSV parsing of integer values" to "Improve JSON and CSV parsing of integer values" Feb 15, 2022
@andygrove andygrove marked this pull request as ready for review February 15, 2022 22:45
revans2 previously approved these changes Feb 16, 2022
@andygrove
Contributor Author

build

@andygrove
Contributor Author

There were test failures in test_json_input_meta. I am investigating.

@andygrove
Contributor Author

The changes in this PR exposed a bug where the code assumed that GpuTextBasedPartitionReader#readToTable would return a table with the read schema projection applied, and this was not the case for the JSON implementation. This is now fixed.
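
The kind of mismatch described above can be sketched as follows. This is a simplified Python model, not the plugin's Scala code: `read_to_table` and `project` are hypothetical stand-ins, and a table is modeled as a plain dict of columns. The point is only that a reader may return every column in the data schema, so the caller has to apply the read schema projection explicitly instead of assuming it was already done.

```python
from typing import Dict, List, Sequence

# Stand-in for a GPU table: column name -> list of column values.
Table = Dict[str, list]


def read_to_table(data_schema: Sequence[str], rows: Sequence[tuple]) -> Table:
    """Hypothetical reader that returns all columns in the data schema."""
    return {name: [row[i] for row in rows] for i, name in enumerate(data_schema)}


def project(table: Table, read_schema: Sequence[str]) -> Table:
    """Keep only the requested columns, in the order the read schema lists them."""
    missing: List[str] = [name for name in read_schema if name not in table]
    if missing:
        raise KeyError(f"columns not present in table: {missing}")
    return {name: table[name] for name in read_schema}
```

With a data schema of `["a", "b", "c"]` and a read schema of `["c", "a"]`, skipping `project` would leave three columns where downstream code expects two, which is the shape of failure a test such as test_json_input_meta could surface.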

@andygrove
Contributor Author

build

@andygrove andygrove merged commit 1db5070 into NVIDIA:branch-22.04 Feb 17, 2022
@andygrove andygrove deleted the json-integer branch February 17, 2022 22:24
@sameerz sameerz added the bug Something isn't working label Feb 21, 2022