ak.from_parquet returns empty array when columns are specified #1606
Comments
NB: the |
ak.from_parquet returns empty array when row groups are specified
ak.from_parquet returns empty array when columns are specified
@martindurant / @jpivarski I mistakenly tagged you whilst formulating a question concerning pyarrow details here. Now, however, I'll open a PR and we can discuss things there.
Does Oh wait: things have been changing and I'm not up to date on the changes.

>>> import awkward._v2 as ak
>>> ak.metadata_from_parquet(
...     "https://pivarski-princeton.s3.amazonaws.com/chicago-taxi.parquet"
... )["form"].columns()
['trip.sec', 'trip.km', 'trip.begin.lon', 'trip.begin.lat', 'trip.begin.time', 'trip.end.lon',
 'trip.end.lat', 'trip.end.time', 'trip.path.londiff', 'trip.path.latdiff', 'payment.fare', 'payment.tips',
 'payment.total', 'payment.type', 'company']

Getting any column by name no longer works:

>>> ak.from_parquet("https://pivarski-princeton.s3.amazonaws.com/chicago-taxi.parquet", columns=["trip.km"])
<Array [{}, {}, {}, {}, {}, {}, ..., {}, {}, {}, {}, {}, {}] type='7728 * {}'>

which is a regression. (These features are going to need unit tests, which is a little complicated because that means making small sample files.)
Aside from fixing exactly how the columns are passed, we should presumably warn or raise an error on an attempt to select columns that don't exist.
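To make the "warn or error" idea concrete, here is a minimal sketch of what such a check could look like. It is not Awkward Array's implementation: `check_columns` is a hypothetical helper, and the matching rule (an exact dotted name, or a dotted prefix that selects every nested leaf under it) is an assumption made for illustration. The column names are a subset of those listed for the taxi file above.

```python
def check_columns(requested, available):
    """Return the available leaf columns selected by `requested`.

    Hypothetical helper: a requested name matches a leaf column exactly,
    or acts as a dotted prefix (e.g. "trip.begin" selects "trip.begin.lon").
    Raises ValueError if any requested name matches nothing.
    """
    selected = []
    missing = []
    for name in requested:
        matches = [
            col for col in available
            if col == name or col.startswith(name + ".")
        ]
        if not matches:
            missing.append(name)
            continue
        for col in matches:
            if col not in selected:  # keep first-seen order, no duplicates
                selected.append(col)
    if missing:
        raise ValueError(
            f"columns not found in file: {missing}; available: {available}"
        )
    return selected


# A subset of the leaf columns reported by the metadata call above.
available = [
    "trip.sec", "trip.km", "trip.begin.lon", "trip.begin.lat",
    "payment.fare", "company",
]

print(check_columns(["trip.km"], available))
print(check_columns(["trip.begin", "company"], available))
```

Raising rather than warning is a design choice in this sketch: a misspelled column name would otherwise produce an array of empty records that looks just like the regression reported here.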
Version of Awkward Array
HEAD
Description and code to reproduce