Hive partition columns with leading underscore: No match for FieldRef.Name(_file) #44352
Comments
Variation of #42160?
UPDATE:
FIX:
WORKING VARIATION:
CLOSING PS: Thanks and sorry for the noise! :-)
Describe the bug, including details regarding any error messages, version, and platform.
Hi Arrow team, thanks for sharing such a powerful and fundamental data handling lib! :)
I'm failing to read a hive partitioned parquet dataset when the partition columns have a leading underscore in their names, using the latest Pandas 2.2.3 + PyArrow 17.0.0 combination.
I admit I might be doing something wrong, but I found nothing to guide me after browsing the docs, searching the web, and even asking a few LLMs (!!!). Other tools, such as DuckDB, which I also use often, have no issue reading the same dataset.
REPRODUCTION:
FAILURE:
MORE:
YEAR_COLUMN='_year'
YEAR_COLUMN='year'
FILE_COLUMN='_file'
FILE_COLUMN='file'
LASTLY: pd.read_parquet of said dataset returns an empty dataframe, I suspect for precisely the same underlying reason.
QUESTION:
Thanks for the Arrow project and any insight/assistance on this.
Component(s)
Python