Added interoperability with arrow-schema #1442
),
DataType::Decimal(precision, scale) => Self::Decimal128(precision as _, scale as _),
DataType::Decimal256(precision, scale) => Self::Decimal256(precision as _, scale as _),
DataType::Extension(_, d, _) => (*d).into(),
I'm curious why arrow2 chose to represent extension types as an explicit data type, as opposed to just field metadata?
It is so that the Array has the logical type on it. It allows the user of a Box<dyn Array> to use .data_type() to perform the downcast and have the necessary information to build the extension.
For example, polars uses it to store arbitrary Python objects on the type. In theory this could be kept in the Field's metadata.
Pyarrow does the same: https://arrow.apache.org/docs/python/generated/pyarrow.ExtensionType.html
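To make the point about .data_type() concrete, here is a minimal sketch of the pattern. The DataType, Array and Int32Array definitions below are simplified, hypothetical stand-ins written for this illustration, not arrow2's actual API; only the shape of the Extension variant (name, storage type, optional metadata) follows the diff in this PR.

```rust
use std::any::Any;

// Hypothetical, simplified stand-ins for illustration only (not arrow2's real types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    // Extension name, storage (physical) type, optional serialized metadata.
    Extension(String, Box<DataType>, Option<String>),
}

trait Array {
    fn data_type(&self) -> &DataType;
    fn as_any(&self) -> &dyn Any;
}

struct Int32Array {
    data_type: DataType,
    values: Vec<i32>,
}

impl Array for Int32Array {
    fn data_type(&self) -> &DataType {
        &self.data_type
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Strips any extension wrappers to find the type that decides the downcast.
fn storage_type(data_type: &DataType) -> &DataType {
    match data_type {
        DataType::Extension(_, inner, _) => storage_type(inner),
        other => other,
    }
}

// With only a `&dyn Array` in hand, the logical type is still available:
// the extension name comes from `data_type()`, and the storage type picks
// the concrete array to downcast to.
fn describe(array: &dyn Array) -> String {
    let label = match array.data_type() {
        DataType::Extension(name, _, _) => name.clone(),
        other => format!("{:?}", other),
    };
    match storage_type(array.data_type()) {
        DataType::Int32 => {
            let ints = array
                .as_any()
                .downcast_ref::<Int32Array>()
                .expect("Int32 storage implies Int32Array in this sketch");
            format!("{} with {} values", label, ints.values.len())
        }
        DataType::Extension(..) => unreachable!("storage_type strips extension wrappers"),
    }
}
```

If the extension name lived only in Field metadata, a function like describe would need the Field passed alongside the array to recover it.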
}
DataType::Decimal128(precision, scale) => Self::Decimal(precision as _, scale as _),
DataType::Decimal256(precision, scale) => Self::Decimal256(precision as _, scale as _),
DataType::RunEndEncoded(_, _) => panic!("Run-end encoding not supported by arrow2"),
I could instead implement TryFrom, but it seemed a touch excessive for a single error case
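For comparison, a rough sketch of what the TryFrom alternative might look like. SchemaType, Arrow2Type and Unsupported are hypothetical miniature types invented for this example; they only mimic the shape of the two DataType enums and are not the code in this PR.

```rust
use std::convert::TryFrom;

// Hypothetical miniature stand-in for arrow_schema::DataType.
#[derive(Debug, PartialEq)]
enum SchemaType {
    Int32,
    Decimal128(u8, i8),
    RunEndEncoded,
}

// Hypothetical miniature stand-in for arrow2's DataType.
#[derive(Debug, PartialEq)]
enum Arrow2Type {
    Int32,
    Decimal(usize, usize),
}

// Hypothetical error type carrying the message that the PR passes to panic!.
#[derive(Debug, PartialEq)]
struct Unsupported(&'static str);

impl TryFrom<SchemaType> for Arrow2Type {
    type Error = Unsupported;

    fn try_from(value: SchemaType) -> Result<Self, Self::Error> {
        Ok(match value {
            SchemaType::Int32 => Self::Int32,
            // Same `as _` casts as in the diff; valid inputs convert losslessly.
            SchemaType::Decimal128(precision, scale) => {
                Self::Decimal(precision as _, scale as _)
            }
            // The single unsupported case becomes an Err instead of a panic.
            SchemaType::RunEndEncoded => {
                return Err(Unsupported("Run-end encoding not supported by arrow2"))
            }
        })
    }
}
```

The trade-off is that every caller then has to handle a Result, even though only one variant can fail.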
@@ -160,6 +160,126 @@ pub enum DataType {
Extension(String, Box<DataType>, Option<String>),
}

#[cfg(feature = "arrow")]
impl From<DataType> for arrow_schema::DataType {
This conversion uses truncating as _ casts; provided the type is valid, these cannot overflow or underflow. I'm not sure that sensible behaviour for things like a negative-size FixedSizeList is necessary; garbage in, garbage out was my thinking.
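As a small illustration of the "garbage in, garbage out" behaviour, the sketch below shows what as _ does for in-range and out-of-range values. The helper names are hypothetical, and the integer widths (usize on one side, i32 and u8 on the other) are an assumption about the two enums rather than something stated in this thread.

```rust
use std::convert::TryFrom;

// Hypothetical helpers; each mirrors one direction of an `as _` cast.
fn size_to_i32(size: usize) -> i32 {
    size as _
}

fn size_to_usize(size: i32) -> usize {
    size as _
}

fn precision_to_u8(precision: usize) -> u8 {
    precision as _
}

// A checked alternative, for contrast: rejects values that do not fit.
fn precision_to_u8_checked(precision: usize) -> Option<u8> {
    u8::try_from(precision).ok()
}

// Returns (valid round trip, negative size, oversized precision).
fn demo() -> (usize, usize, u8) {
    let round_trip = size_to_usize(size_to_i32(3)); // valid size survives unchanged
    let negative = size_to_usize(-1); // wraps to usize::MAX, never panics
    let oversized = precision_to_u8(300); // keeps the low 8 bits: 300 - 256 = 44
    (round_trip, negative, oversized)
}
```

Valid values round-trip unchanged; invalid ones wrap silently instead of panicking, which is the behaviour the comment accepts.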
Thank you @tustvold
Part of #1429
Adds conversions to/from arrow-schema