Update from_json to use null as line delimiter #11499

Draft
wants to merge 1 commit into
base: branch-24.10

revans2
Collaborator

@revans2 revans2 commented Sep 25, 2024

I am leaving this as a draft for two reasons.

  1. It does not fix having \r in the JSON. See [BUG] \n is not considered whitespace when tokenizing JSON rapidsai/cudf#16915
  2. Switching to a regexp to strip the characters from the input causes a very large performance hit. I am a bit conflicted here, because we do need a fix for this at some point even without trying to support \r and \n in the data: a line containing only \t is not treated as an empty line and will fail.
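
To illustrate the motivation, here is a minimal pure-Python sketch of why a null delimiter helps. It is not the code in this PR and does not use the cudf or spark-rapids APIs; `join_rows` and `parse_buffer` are hypothetical helpers, and the sketch assumes the input rows never contain a NUL character themselves.

```python
import json
import re

NUL = "\0"

# JSON whitespace per RFC 8259: space, tab, line feed, carriage return.
_JSON_WS_ONLY = re.compile(r"[ \t\r\n]*")


def join_rows(rows, delimiter):
    """Concatenate one JSON document per row into a single buffer (hypothetical helper)."""
    return delimiter.join(rows)


def parse_buffer(buffer, delimiter):
    """Split the buffer on the delimiter and parse each record (hypothetical helper).

    Whitespace-only records are treated as empty and yield None;
    records that fail to parse yield the marker string "INVALID".
    """
    results = []
    for record in buffer.split(delimiter):
        if _JSON_WS_ONLY.fullmatch(record):
            results.append(None)
            continue
        try:
            results.append(json.loads(record))
        except json.JSONDecodeError:
            results.append("INVALID")
    return results


# Four input rows: the second contains a newline inside the JSON,
# the third contains only a tab.
rows = ['{"a": 1}', '{"a":\n 2}', "\t", '{"a": 3}']

# With "\n" as the delimiter, the second row is split into two broken records.
by_newline = parse_buffer(join_rows(rows, "\n"), "\n")

# With NUL as the delimiter, each input row stays a single record.
by_nul = parse_buffer(join_rows(rows, NUL), NUL)

print(len(rows), len(by_newline), len(by_nul))
print(by_nul)
```

With the newline delimiter the record count no longer matches the row count, so results cannot be mapped back to input rows. With the NUL delimiter the embedded `\n` is parsed as ordinary JSON whitespace, and the tab-only row is recognized as empty.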

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>