When passing parquet_read_options to to_pyarrow_dataset, it is possible to use dictionary_columns to control which columns should be dictionary encoded as they are read.
What happened:
Such columns are not dictionary encoded when they are read. They come back as plain string columns, exactly as they would if dictionary_columns had been empty.
What you expected to happen:
I expect the particular columns to be dictionary encoded.
I expect the column "test" to be dictionary encoded, rather than just of type string.
More details:
I believe the problem is in table.py at line 338: the schema returned by self.schema.to_pyarrow() has the wrong type for the columns that are to be dictionary encoded.
Maybe it is possible to use the physical_schema property of the fragments defined just above to get the right schema, or otherwise parse the read options to modify the schema?
Delta Tables have a specific schema, and we enforce that when reading we always use that exact schema. Dictionary types are different types in Arrow, not just a minor detail of the array, so reading as dictionary array would mean a different schema.
But perhaps we could parse the read_dictionary and have special handling for that. It does seem desirable to be able to read columns as dictionaries.
# Description
When passing `parquet_read_options` to `to_pyarrow_dataset` it is now
possible to use `dictionary_columns` to control which columns should be
dictionary encoded as they are read.
# Related Issue(s)
- closes #938
Environment
Delta-rs version: 0.6.3
Binding: python
Environment: