GH-47728: [Python] Check the source argument in parquet.read_table #48008
Thanks for working on this!
The `test_read_table_without_dataset` failure on the CI jobs is related to this change. That test used a MockDataset with a non-existent file and now fails, so we might have to revisit it.
Yep, you're right, I've updated it with the new |
Thank you for another contribution @rmnskb!
LGTM. I just added minor styling suggestions.
def test_read_table_raises_value_error_when_ds_is_unavailable(
        monkeypatch, source):
Suggested change:
- def test_read_table_raises_value_error_when_ds_is_unavailable(
-         monkeypatch, source):
+ def test_read_table_raises_value_error_when_ds_is_unavailable(monkeypatch, source):
Rationale for this change
See #47728. Check the `source` argument in `pyarrow.parquet.read_table` if `pyarrow.dataset` is not available.
What changes are included in this PR?
Check the `source` argument and raise `ValueError` if it is either a list of `.parquet` files or a directory.
Are these changes tested?
Yes
Are there any user-facing changes?
No
If the `source` argument is a directory, I decided not to check it directly. Instead, I catch the exceptions coming from `fs.open_input_file`, since it already performs that check, and add an extra exception on top of the stack that explains the actual reason.
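The approach described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `read_single_file` is a hypothetical helper, the built-in `open()` stands in for `fs.open_input_file`, and the error messages are invented for illustration. It only shows the general pattern of rejecting a list up front and chaining a more descriptive `ValueError` onto the lower-level error raised when the source cannot be opened as a single file.

```python
import os
import tempfile


def read_single_file(source):
    """Hypothetical stand-in for a reader's fallback path when no
    multi-file (dataset) support is available."""
    # A list of paths needs multi-file support, so reject it up front.
    if isinstance(source, (list, tuple)):
        raise ValueError(
            "The source is a list of files, which requires the dataset "
            "module, but it is not available."
        )
    try:
        # Built-in open() is an assumption standing in for
        # fs.open_input_file; it fails when source is a directory.
        with open(source, "rb") as f:
            return f.read()
    except OSError as exc:
        # Do not inspect the path directly. Reuse the error from the open
        # call and chain an exception that explains the likely reason.
        raise ValueError(
            f"Could not open {source!r} as a single file. If it is a "
            "directory, reading it requires the dataset module, which "
            "is not available."
        ) from exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        print(read_single_file(path))  # a regular file is read normally
        try:
            read_single_file(tmp)  # a directory triggers the chained error
        except ValueError as err:
            print(type(err.__cause__).__name__, "->", err)
```

Using `raise ... from exc` keeps the original error reachable through `__cause__`, so the traceback shows both the low-level failure and the explanation added on top of it.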