When datafusion.execution.parquet.coerce_int96 is set, timestamp type is still reported as Timestamp(nanoseconds) #15721

@alamb

Describe the bug

The documentation for datafusion.execution.parquet.coerce_int96 says:

If true, parquet reader will read columns of physical type int96 as originating from a different resolution than nanosecond. This is useful for reading data from systems like Spark which stores microsecond resolution timestamps in an int96 allowing it to write values with a larger date range than 64-bit timestamps with nanosecond resolution.
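To make the "larger date range" point concrete, here is a rough sketch of the arithmetic behind it. It only uses the Python standard library and does not touch DataFusion; the helper names `span_years` and `last_year_at_nanosecond_resolution` are made up for illustration, and the year length is approximated with the Gregorian average.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # average Gregorian year (approximation)


def span_years(units_per_second: int) -> float:
    """Approximate years after the epoch that a signed 64-bit counter can reach."""
    return (INT64_MAX / units_per_second) / SECONDS_PER_YEAR


def last_year_at_nanosecond_resolution() -> int:
    """Calendar year of the largest 64-bit nanosecond timestamp."""
    # datetime only has microsecond precision, so drop the sub-microsecond part.
    return (EPOCH + timedelta(microseconds=INT64_MAX // 1_000)).year


if __name__ == "__main__":
    print("nanoseconds :", round(span_years(1_000_000_000), 1), "years after 1970")
    print("microseconds:", round(span_years(1_000_000), 1), "years after 1970")
    print("last year representable in ns:", last_year_at_nanosecond_resolution())
```

With nanosecond resolution a 64-bit timestamp runs out a few hundred years after 1970 (in the year 2262), while microsecond resolution reaches roughly a thousand times further. That is why a value written by Spark can fit in a microsecond timestamp but overflow a nanosecond one.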

However, when I set this to us (microseconds), the type is still reported as Timestamp(Nanosecond, None).

To Reproduce

-- Enable coercion of int96 to microseconds
set datafusion.execution.parquet.coerce_int96 = us;

-- Create external table
CREATE EXTERNAL TABLE int96_from_spark
STORED AS PARQUET
LOCATION 'parquet-testing/data/int96_from_spark.parquet';

-- Print schema
describe int96_from_spark;

Results in

+-------------+-----------------------------+-------------+
| column_name | data_type                   | is_nullable |
+-------------+-----------------------------+-------------+
| a           | Timestamp(Nanosecond, None) | YES         |
+-------------+-----------------------------+-------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Expected behavior

I expect the reported data_type for column a to be Timestamp(Microsecond, None).
