Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
Labels: bug (Something isn't working)
Comments
emanueledomingo changed the title from "Cannot read partitions with special characters (including space)" to "Cannot read partitions with special characters (including space) with pyarrow >= 11" on May 26, 2023
Tested with:
The issue still persists.
@wjones127 thanks for working on this issue and releasing a patch of the python package! However, I fear that the issue should be reopened. In particular, I just tested it with:
$ ls -1 tmp/
'animals=Brittle%20Stars'
'animals=Centipede'
'animals=Flamingo'
'animals=Horse'
_delta_log
The following line of code still outputs 0 instead of 1:
dt_table.to_pyarrow_table(partitions=[("animals", "=", "Brittle Stars")]).num_rows
while, if we remove the partitions kwarg, we get:
dt_table.to_pyarrow_table()
I hope this helps, many thanks!
Environment
Delta-rs version:
Binding: Python
Environment: Python==3.10, deltalake==0.9.0, pyarrow==11.0.0
Bug
What happened:
If the values of a column contain special characters (including spaces), the writer percent-encodes them in the directory names when the column is used as a partition. If you then try to read the table and filter on that same partition column, it finds nothing.
This bug happens if the pyarrow version is >= 11. It works with pyarrow 10.0.1 (special characters not encoded).
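The encoding involved is ordinary URL percent-encoding. As a minimal sketch, using only Python's standard `urllib.parse` and not the deltalake writer itself (whose exact encoding rules are an assumption here), this shows how "Brittle Stars" maps to an encoded name, and what happens when an already-encoded value is encoded a second time:

```python
from urllib.parse import quote, unquote

value = "Brittle Stars"

# Percent-encode the raw partition value; the space becomes %20.
encoded_once = quote(value, safe="")

# Encoding the already-encoded string escapes the '%' itself as %25.
encoded_twice = quote(encoded_once, safe="")

# Decoding reverses exactly one level of encoding.
decoded = unquote(encoded_once)

print(encoded_once)   # Brittle%20Stars
print(encoded_twice)  # Brittle%2520Stars
print(decoded)        # Brittle Stars
```

If a reader compares the raw filter value against an encoded directory name without decoding it first, the comparison cannot match, which is consistent with the 0 rows reported here.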
What you expected to happen: Partitions on a column with special characters are read correctly even when the values are encoded.
How to reproduce it:
num_rows returns 0, 1 is expected.
More details:
The content of the /tmp folder is:
Note: even if I try with Brittle%2520Stars as the partition value, num_rows returns 0.
With pyarrow 10.0.1 the same script gives num_rows equal to 1 and the folder is as expected.
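To check what actually ended up on disk, it can help to list the partition directories next to their decoded values. The helper below is a hypothetical diagnostic sketch (`partition_values` is not part of deltalake or pyarrow). It assumes a Hive-style `column=value` directory layout and that the values are percent-encoded:

```python
from pathlib import Path
from urllib.parse import unquote


def partition_values(table_root, column):
    """Map each `column=<value>` directory name under table_root to its decoded value.

    Hypothetical helper for inspection only; it does not read the Delta log.
    """
    prefix = f"{column}="
    result = {}
    for entry in sorted(Path(table_root).iterdir()):
        # Skip _delta_log and anything that is not a partition directory.
        if entry.is_dir() and entry.name.startswith(prefix):
            raw = entry.name[len(prefix):]
            result[entry.name] = unquote(raw)
    return result
```

For example, `partition_values("tmp", "animals")` would show whether a directory such as `animals=Brittle%20Stars` decodes back to the value passed in the `partitions` filter.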