Metadata missing timezone breaks s3.read_parquet in awswrangler 3 #2667
Labels
bug
Something isn't working
Describe the bug
We recently upgraded awswrangler from 2.20.1 to 3.5.2 (with pyarrow 15.0.0), and we're seeing an issue when calling s3.read_parquet on Parquet files created with pyarrow 8.0.0. We have files whose metadata, when read, looks like the example below:
You can see "metadata": null in the data above. This used to be okay in awswrangler 2:
aws-sdk-pandas/awswrangler/s3/_read_parquet.py
Lines 271 to 288 in a81ae5e
But doesn't work in awswrangler 3:
aws-sdk-pandas/awswrangler/_arrow.py
Lines 64 to 79 in de5d161
The issue is this line:
aws-sdk-pandas/awswrangler/_arrow.py
Line 73 in de5d161
It throws an error because c["metadata"] is null, so it has no "timezone" key to look up. Why did this functionality change, and can we change it back?
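To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not the awswrangler code: the `column_timezone` helper and the sample column dictionary are hypothetical and only mirror the shape described in this report, where a column entry has "metadata" set to null. Subscripting that null value raises a TypeError, while a guarded lookup returns None instead.

```python
from typing import Any, Mapping, Optional


def column_timezone(column_meta: Mapping[str, Any]) -> Optional[str]:
    """Return the column's timezone, or None if it cannot be determined.

    Hypothetical helper for illustration only; it tolerates a missing or
    null "metadata" entry instead of subscripting it directly.
    """
    metadata = column_meta.get("metadata")
    if not isinstance(metadata, Mapping):
        # "metadata" is absent or null, so there is no timezone to read.
        return None
    return metadata.get("timezone")


# Hypothetical column entry shaped like the one described in this issue.
column = {"name": "ts", "metadata": None}

# Direct subscripting fails because None does not support item access.
try:
    column["metadata"]["timezone"]
except TypeError as exc:
    print(f"direct lookup failed: {exc}")

# The guarded lookup returns None rather than raising.
print(column_timezone(column))
```

Whether returning None (treating the column as timezone-naive) is the right fallback for the library is an assumption here; the sketch only shows that the lookup can be made tolerant of null metadata.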
How to Reproduce
Can't reproduce without attaching file
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.10.12
AWS SDK for pandas version
3.5.1
Additional context
No response