fix: casting when data to be written does not match table schema #1427
Hi @wjones127,
Let me know what you think.
Ignore the integration test failure 😒
final link failed: No space left on device
These GitHub Actions caches are...annoying
let batch = RecordBatch::try_new(
    Arc::clone(&schema),
    vec![Arc::new(StringArray::from(vec![
        Some("Test123".to_owned()),
I'm confused, where does this value go in the expected table? 😕
Right below, in line 570?
Since it cannot be parsed as an int, it results in a null value. That aligns with ANSI SQL, but maybe we should only allow it if the user opts in?
Oh, never mind. It seems like this is being inserted as null? That doesn't seem like what a user would want.
Yeah, I think we could disable that by passing safe: false in CastOptions (safe: true is the setting that turns failed casts into nulls): https://docs.rs/arrow-cast/40.0.0/arrow_cast/cast/struct.CastOptions.html
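To make the two behaviors under discussion concrete, here is a minimal std-only sketch. It does not call arrow; it only models the documented meaning of `CastOptions::safe`, where `safe: true` replaces a value that fails to cast with null and `safe: false` returns an error. The names `OnCastFailure` and `cast_utf8_to_int32` are hypothetical, and Rust's `str::parse` may not accept exactly the same inputs as arrow's string-to-integer cast.

```rust
/// What to do with a value that cannot be cast to the target type.
/// Hypothetical stand-in for the `safe` flag on arrow's `CastOptions`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnCastFailure {
    /// Replace the value with null (the `safe: true` behavior).
    Null,
    /// Fail the whole cast with an error (the `safe: false` behavior).
    Error,
}

/// Cast a nullable string column to a nullable i32 column.
/// Existing nulls are always preserved; unparsable strings are handled
/// according to `on_failure`.
fn cast_utf8_to_int32(
    column: &[Option<&str>],
    on_failure: OnCastFailure,
) -> Result<Vec<Option<i32>>, String> {
    column
        .iter()
        .map(|value| match value {
            None => Ok(None),
            Some(text) => match text.parse::<i32>() {
                Ok(n) => Ok(Some(n)),
                // Lenient mode: silently turn the bad value into null.
                Err(_) if on_failure == OnCastFailure::Null => Ok(None),
                // Strict mode: surface the failure to the caller.
                Err(e) => Err(format!("cannot cast {:?} to Int32: {}", text, e)),
            },
        })
        .collect()
}
```

With the value from the test above, a column containing "Test123" comes back as a null in the `Null` mode, while the `Error` mode rejects the whole write, which is the opt-in question raised in this thread.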
@@ -704,17 +705,6 @@ mod tests {
        table
    }

    async fn get_data(table: DeltaTable) -> Vec<RecordBatch> {
I think this is a good refactor 👍
I can't find an arrow function that does more strict casting, but I kind of would prefer if we were more strict with the casting. For example, these casts aren't lossy:
But I think we should reject these by default
I think for now, I'd be fine with casting if we added |
The name is confusing but the default behavior we want is |
I think this is an improvement. If we want to be more strict, we should write a function that checks the schema the user passed against the table's schema and returns an error if they don't logically match. But I think that's fine to do as a follow-up.
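A rough sketch of what such a follow-up check might look like, using std-only stand-ins rather than arrow's `Schema` and `DataType`. Everything here is hypothetical: the `ColumnType` and `Field` types, the `check_schema` name, and in particular the assumption that "logically match" means the same column names and types in the same order. A real check would need to decide which type differences are acceptable.

```rust
/// Simplified stand-in for a column type; not arrow's `DataType`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnType {
    Int32,
    Utf8,
}

/// Simplified stand-in for a schema field.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: &'static str,
    data_type: ColumnType,
}

/// Return an error describing the first mismatch between the schema of the
/// batch being written and the table's schema.
/// Assumption: columns are compared by position, and both name and type
/// must be identical.
fn check_schema(batch: &[Field], table: &[Field]) -> Result<(), String> {
    if batch.len() != table.len() {
        return Err(format!(
            "batch has {} columns but table has {}",
            batch.len(),
            table.len()
        ));
    }
    for (b, t) in batch.iter().zip(table) {
        if b.name != t.name {
            return Err(format!("expected column {:?}, found {:?}", t.name, b.name));
        }
        if b.data_type != t.data_type {
            return Err(format!(
                "column {:?}: table type is {:?} but batch type is {:?}",
                b.name, t.data_type, b.data_type
            ));
        }
    }
    Ok(())
}
```

Under these assumptions, the Utf8 batch from the description would be rejected up front against an int table column, before any casting is attempted.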
Description
Suppose a user has a table with a column of type int. The user can create a record batch with a column of type Utf8 and write the value to the table. My expectation is that either the writer returns an error, or ANSI SQL behavior is implemented, where non-numeric strings are turned into nulls.