feat: add hint to support parquet with nanosecond timestamps #1782

Closed
wants to merge 3 commits into from
3 changes: 3 additions & 0 deletions python/src/schema.rs
@@ -66,6 +66,9 @@ fn schema_type_to_python(schema_type: SchemaDataType, py: Python) -> PyResult<Py
             let struct_type: StructType = struct_type.into();
             Ok(struct_type.into_py(py))
         }
+        SchemaDataType::timestamp(_) => Err(PyErr::new::<PyTypeError, _>(
+            "Encountered schema field not meant to be written",
+        )),
     }
 }

20 changes: 20 additions & 0 deletions rust/src/schema/arrow_convert.rs
@@ -128,6 +128,9 @@ impl TryFrom<&schema::SchemaDataType> for ArrowDataType {
                     ))),
                 }
             }
+            schema::SchemaDataType::timestamp(_) => {
+                Ok(ArrowDataType::Timestamp(TimeUnit::Microsecond, None))
+            }
             schema::SchemaDataType::r#struct(s) => Ok(ArrowDataType::Struct(
                 s.get_fields()
                     .iter()
@@ -248,6 +251,14 @@ impl TryFrom<&ArrowDataType> for schema::SchemaDataType {
             {
                 Ok(schema::SchemaDataType::primitive("timestamp".to_string()))
             }
+            ArrowDataType::Timestamp(TimeUnit::Nanosecond, None) => {
+                Ok(schema::SchemaDataType::timestamp(true))
+            }
+            ArrowDataType::Timestamp(TimeUnit::Nanosecond, Some(tz))
+                if tz.eq_ignore_ascii_case("utc") =>
+            {
+                Ok(schema::SchemaDataType::timestamp(true))
+            }
             ArrowDataType::Struct(fields) => {
                 let converted_fields: Result<Vec<schema::SchemaField>, _> = fields
                     .iter()
@@ -830,6 +841,15 @@ mod tests {
         );
     }

+    #[test]
+    fn test_delta_from_arrow_timestamp_nano_type() {
+        let timestamp_field = ArrowDataType::Timestamp(TimeUnit::Nanosecond, None);
+        assert_eq!(
+            <crate::SchemaDataType as TryFrom<&ArrowDataType>>::try_from(&timestamp_field).unwrap(),
+            crate::SchemaDataType::timestamp(true)
+        );
+    }
+
     #[test]
     fn test_delta_from_arrow_timestamp_type_with_tz() {
         let timestamp_field =
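
For readers following the two `arrow_convert.rs` hunks, the mapping they introduce can be sketched in isolation. The snippet below is a minimal sketch, not the crate's code: `TimeUnit`, `ArrowType`, `DeltaType`, `delta_from_arrow` and `arrow_from_delta` are hypothetical stand-ins for the real Arrow and Delta types, and the microsecond arm without a timezone is assumed from the surrounding context rather than shown in this diff. It illustrates that a nanosecond Arrow timestamp becomes the hint variant, and that the hint always maps back to a microsecond Arrow timestamp.

```rust
// Simplified stand-ins for the types touched by this diff.
// These are NOT the real `arrow` or delta-rs definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArrowType {
    Timestamp(TimeUnit, Option<String>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeltaType {
    Primitive(String),
    /// Hint variant: `true` means the source values are nanoseconds.
    TimestampHint(bool),
}

/// Arrow -> Delta, mirroring the arms added in the second hunk.
pub fn delta_from_arrow(t: &ArrowType) -> Result<DeltaType, String> {
    match t {
        // Assumed pre-existing behaviour: microseconds are the native Delta timestamp.
        ArrowType::Timestamp(TimeUnit::Microsecond, None) => {
            Ok(DeltaType::Primitive("timestamp".to_string()))
        }
        // New: nanoseconds without a timezone produce the hint.
        ArrowType::Timestamp(TimeUnit::Nanosecond, None) => Ok(DeltaType::TimestampHint(true)),
        // New: nanoseconds are also accepted when the timezone is UTC (case-insensitive).
        ArrowType::Timestamp(TimeUnit::Nanosecond, Some(tz))
            if tz.eq_ignore_ascii_case("utc") =>
        {
            Ok(DeltaType::TimestampHint(true))
        }
        other => Err(format!("unsupported Arrow type: {:?}", other)),
    }
}

/// Delta -> Arrow, mirroring the arm added in the first hunk.
pub fn arrow_from_delta(t: &DeltaType) -> Result<ArrowType, String> {
    match t {
        DeltaType::Primitive(name) if name.as_str() == "timestamp" => {
            Ok(ArrowType::Timestamp(TimeUnit::Microsecond, None))
        }
        // The hint never round-trips as nanoseconds: the target is always microseconds.
        DeltaType::TimestampHint(_) => Ok(ArrowType::Timestamp(TimeUnit::Microsecond, None)),
        DeltaType::Primitive(other) => Err(format!("unsupported primitive: {}", other)),
    }
}
```

In this sketch, converting a nanosecond Arrow type to Delta and back yields a microsecond Arrow type, which is the asymmetry the hint is meant to express.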
4 changes: 4 additions & 0 deletions rust/src/schema/mod.rs
@@ -298,6 +298,10 @@ pub enum SchemaDataType {
     /// * timestamp: Microsecond precision timestamp without a timezone
     /// * decimal: Signed decimal number with fixed precision (maximum number of digits) and scale (number of digits on right side of dot), where the precision and scale can be up to 38
     primitive(String),
+    /// Variant for timestamps that specifies whether timestamps must be converted from nanoseconds
+    /// This is only for an initial convert to delta call, after initial parquet is converted it would
+    /// go back to primitive usage above, so this type should never be written back out
+    timestamp(bool),
     /// Variant representing a struct.
     r#struct(SchemaTypeStruct),
     /// Variant representing an array.
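
The doc comment on `timestamp(bool)` says the flag records whether values must be converted from nanoseconds, but the diff does not show the value conversion itself. The following is a minimal sketch of what a consumer of the hint might do, under stated assumptions: `normalize_to_micros` is a hypothetical helper that is not part of this PR, timestamps are taken to be plain `i64` counts since the epoch, and flooring via `div_euclid` is an illustrative choice rather than the behaviour of any particular Arrow cast.

```rust
/// Hypothetical helper (not part of this PR): rescale raw timestamp values
/// to microseconds when the schema hint says they are nanoseconds.
///
/// `from_nanoseconds` plays the role of the `bool` carried by
/// `SchemaDataType::timestamp(bool)`.
pub fn normalize_to_micros(values: &[i64], from_nanoseconds: bool) -> Vec<i64> {
    if from_nanoseconds {
        // 1 microsecond = 1_000 nanoseconds. `div_euclid` floors, so
        // pre-epoch (negative) values round toward the earlier microsecond.
        values.iter().map(|ns| ns.div_euclid(1_000)).collect()
    } else {
        // Already microseconds: pass through unchanged.
        values.to_vec()
    }
}
```

Sub-microsecond precision is lost in this conversion, which fits the comment's point that the hint is only meaningful during the initial conversion and that the table schema afterwards uses the plain microsecond `timestamp` primitive.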