Can't write data from parquet file to delta table (Rust) #1470
Comments
I think the issue here is that we have a 1-1 mapping between Delta Lake types and Arrow types, and we don't do any automatic casting. Right now, we map the Delta Lake "string" type to the Arrow "Utf8" type. However, Polars doesn't support the "Utf8" type; it always uses the large variant, "LargeUtf8". So we'll need to alter the mapping to handle this complexity.
+1, I also hit this error today while experimenting with https://github.com/danielgafni/dagster-polars:
from deltalake import DeltaTable
dt = DeltaTable("/path/to/asset.delta")
dt.pyarrow_schema()
# date: string
# counter_name: string
# counter_value: double
dt.optimize.compact()
This results in the following error:
The |
…2274)
# Description
Such a small change, but it fixes many issues where Parquet files were written with Arrow and the source data was in large dtype format. By default, parquet::ParquetReader decodes the Arrow metadata, which in turn may give you large dtypes. This would cause issues during a DataFusion Parquet scan with a filter, since the filter wouldn't coerce to the large dtypes. Simply disabling the Arrow metadata decoding gives us the Parquet schema converted to an Arrow schema without large types 👯♂️
# Related issue(s)
- closes #1470
Environment
Delta-rs version: 0.12
Binding: Rust
Environment:
Bug
What happened:
Cannot write data from parquet file to delta table
What you expected to happen:
Write data from parquet file to delta table with no issues
How to reproduce it:
More details:
I think this issue is somehow related to data_type: LargeUtf8. If I change the data type when creating the parquet file with polars, for example to integer, I face no issue (see example below). How can I write data from parquet to delta? Could you please provide an example? Maybe it is possible to parse the data from parquet into a user struct and then create a RecordBatch from this struct for further writing to delta (like here: https://github.com/delta-io/delta-rs/blob/main/rust/examples/recordbatch-writer.rs)?