Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid writing statistics for binary columns to fix JSON error #1498

Merged
merged 7 commits into from
Jul 15, 2023

Conversation

ChewingGlass
Copy link
Contributor

@ChewingGlass ChewingGlass commented Jun 28, 2023

Description

Avoid writing statistics for binary columns to fix JSON error thrown by Arrow

Related Issue(s)

@github-actions
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ChewingGlass ChewingGlass changed the title fix(#1493): Avoid writing statistics for binary columns to fix JSON error fix: Avoid writing statistics for binary columns to fix JSON error Jun 28, 2023
Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is a bit messy, but your changes make sense. :/

I think you could accomplish the same thing in a more targeted way by removing and binary columns using collect_stats_conversion and apply_stats_conversion. Use collect_stats_conversion to get which columns are binary, and then apply to remove them from the value (can use serde_json::Value::remove()). LMK if that works instead.

@ChewingGlass
Copy link
Contributor Author

Unfortunately you can't do this with stats conversion, the error occurs when making the ReaderBuilder, not when applying the values.

Comment on lines 243 to 256
let str_val = val.to_string();
let decimal_string = if str_val.len() > *scale as usize {
let (integer_part, fractional_part) =
str_val.split_at(str_val.len() - *scale as usize);
format!("{}.{}", integer_part, fractional_part)
} else {
format!("0.{}", str_val)
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want to have negative scales. Also, could we unit test this directly in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for negative scales. Not going to be able to add tests for a bit. I added you to my fork so you can commit if you want to take a crack at it.

@wjones127 wjones127 changed the title fix: Avoid writing statistics for binary columns to fix JSON error fix: avoid writing statistics for binary columns to fix JSON error Jun 28, 2023
@wjones127
Copy link
Collaborator

We have a new failure for decimal, but I'm a little stumped on it for now.

@wjones127
Copy link
Collaborator

I think for decimal, we might be able to start using the arbitrary_precision feature of serde_json. Not sure how nicely it plays with decimals yet, but it seems to be the suggested route.

Copy link
Member

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising to me, I'm looking forward to this being merged

rust/src/action/checkpoints.rs Outdated Show resolved Hide resolved
Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
@rtyler rtyler self-assigned this Jul 15, 2023
@rtyler rtyler merged commit 312d1c2 into delta-io:main Jul 15, 2023
@roeap
Copy link
Collaborator

roeap commented Jul 15, 2023

@rtyler - i fear it may have been a bit too early to merge this., since the rust test are failing for this. since 0.13 was just released, we should look into releasing a fix asap. bot sure if @wjones127 already knows whats going on? Otherwise I might look into it once available again.

@rtyler
Copy link
Member

rtyler commented Jul 19, 2023

@roeap Yes, I'm looking at this now. I must have been moving too quickly through some PRs. I now see that the test was failing but I'm not sure why I merged this 🤦 I must have mistaken this pull request for another one which I had open at the time.

I am going to land a revert into main directly, and cut a 0.13.1 with that revert just to avoid causing anybody problems

rtyler added a commit to rtyler/delta-rs that referenced this pull request Jul 19, 2023
…error (delta-io#1498)"

This reverts commit 312d1c2.

I should not have merged this, I must have mistaken the red ❌
pull request delta-io#1498 for something else when I merged. This backs out that commit
rtyler added a commit that referenced this pull request Jul 19, 2023
…error (#1498)"

This reverts commit 312d1c2.

I should not have merged this, I must have mistaken the red ❌
pull request #1498 for something else when I merged. This backs out that commit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Json error: Binary is not supported by JSON when writing checkpoint files
4 participants