feat: adopt kernel schemas and improve protocol support #1756
rust/src/kernel/actions/arrow.rs
Outdated
}
PrimitiveType::Timestamp => {
    // Issue: https://github.com/delta-io/delta/issues/643
    Ok(ArrowDataType::Timestamp(TimeUnit::Microsecond, None))
What about timestamps that were written with the UTC timezone? Are we doing anything with the IsAdjustedToUtc flag in the parquet timestamp column to return the arrow datatype as Timestamp(tz='UTC')?
Right now this is not handled. Within Delta this is also gated behind the timestampNtz feature. As this PR is quite hard to review due to its size, I focused on not adjusting any business logic and only updating the schema / action definitions and getting things to build. So support for that feature should come in a follow-up PR.
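To make the question above concrete, here is a minimal sketch of how a follow-up could distinguish UTC-adjusted timestamps from timezone-free ones. It is not what this PR does (the PR maps `Timestamp` to a microsecond timestamp with no timezone), and every type and function name below is a hypothetical stand-in defined locally, not the arrow-rs or delta-rs API.

```rust
// Hypothetical stand-ins that only mirror the shapes discussed in this thread.
// They are NOT the real arrow-rs / delta-rs types.

/// Two timestamp flavours, assuming a follow-up adds the timezone-free one.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchPrimitive {
    /// Instant adjusted to UTC.
    Timestamp,
    /// Wall-clock value without timezone (gated behind the timestampNtz feature).
    TimestampNtz,
}

/// Stand-in for an Arrow time unit; only the unit used in the diff is modelled.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchTimeUnit {
    Microsecond,
}

/// Stand-in for an Arrow data type: a unit plus an optional timezone name.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchArrowType {
    Timestamp(SketchTimeUnit, Option<String>),
}

/// One possible mapping: UTC-adjusted timestamps carry a "UTC" timezone,
/// timezone-free timestamps carry none.
pub fn to_arrow_sketch(p: &SketchPrimitive) -> SketchArrowType {
    match p {
        SketchPrimitive::Timestamp => {
            SketchArrowType::Timestamp(SketchTimeUnit::Microsecond, Some("UTC".to_string()))
        }
        SketchPrimitive::TimestampNtz => {
            SketchArrowType::Timestamp(SketchTimeUnit::Microsecond, None)
        }
    }
}
```

Whether existing tables written without a timezone should be read back as UTC is exactly the kind of business-logic decision deferred to the follow-up PR.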
@wjones127 @rtyler - There are some failing Python tests related to how we handle schema metadata, where we need to make a decision on the kernel side. Other than that, I think it's ready for review. Since this PR is again quite sizable, I focused on not making any changes to the logic and on limiting the impact on the APIs. As such, no new feature support in this PR :).
Thanks for moving this forward. This will be good for unblocking us on making progress on newer protocol versions :)
We should fix the bigint bug.
The other Python failure about metadata is a result of the metadata keys being strings. That seems like a case where the test might have been wrong, right?
rust/src/kernel/schema.rs
Outdated
#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[serde(rename_all = "camelCase")]
/// Primitive types supported by Delta
pub enum PrimitiveType {
I love this. ❤️
@wjones127 - made display for primitive types consistent with protocol...
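As an illustration of what a protocol-consistent display could look like, here is a minimal sketch. The enum below is a hypothetical, cut-down stand-in (not the `PrimitiveType` from this PR), and the variant list is an assumption; only the lowercase names are taken from the Delta protocol's primitive type names.

```rust
use std::fmt;

/// Hypothetical subset of primitive types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchPrimitiveType {
    String,
    Long,
    Integer,
    Boolean,
    Date,
    Timestamp,
}

impl fmt::Display for SketchPrimitiveType {
    /// Writes the lowercase type name as it would appear in a schema string.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            SketchPrimitiveType::String => "string",
            SketchPrimitiveType::Long => "long",
            SketchPrimitiveType::Integer => "integer",
            SketchPrimitiveType::Boolean => "boolean",
            SketchPrimitiveType::Date => "date",
            SketchPrimitiveType::Timestamp => "timestamp",
        };
        write!(f, "{}", name)
    }
}
```

With this in place, `SketchPrimitiveType::Long.to_string()` yields `long`, so the displayed name and the serialized name agree instead of drifting apart.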
This is an open question to me and mainly motivated by not wanting Raised this in the kernel sync - i.e. does delta metadata support nested structures? - but the jury is still out on that :). As for going forward here I would appreciate your opinion. Continuing to use
Absent any evidence, I don't know either. My slight inclination is to move toward strings.
I share that inclination, so just went ahead and made the change :).
Description
This PR depends on #1741. It migrates the implementation of actions and schema over from kernel. The schema is much more complete in terms of the more recent Delta features and more rigorously leverages the Rust type system.
Related Issue(s)
Documentation