-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support with_projection and with_columns in ipc and parquet reader in eager mode #1751
Support with_projection and with_columns in ipc and parquet reader in eager mode #1751
Conversation
Looks very good @illumination-k 👍 Do you think you can also expose it in the python API? If you want, I can do it as well. |
polars/polars-io/src/ipc.rs
Outdated
@@ -161,6 +164,9 @@ where | |||
let i = schema.index_of(column)?; | |||
prj.push(i) | |||
} | |||
|
|||
// Ipc reader panics if the projection is not in increasing order, so sorting is the safer way. | |||
prj.sort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also remove duplicates after sorting:
https://github.com/jorgecarleitao/arrow2/blob/3caa20421f8a6950dec6e2ad1c54125ef3a896ab/src/io/ipc/read/reader.rs#L236
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can remove duplicates, but It seems that it would be more helpful to return an error if there is a duplicate.
It's not good to get an error in select C, B
, but wouldn't it be better to get an error in select B, B
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer returning an error indeed. The API expects unique columns.
polars/polars-io/src/ipc.rs
Outdated
let schema = metadata.schema(); | ||
|
||
if let Some(cols) = self.columns { | ||
let mut prj = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to use it with projections of 100k columns
let mut prj = Vec::with_capacity(cols.len());
I don't understand much about the python API, but can I just rewrite the following part? polars/py-polars/src/dataframe.rs Lines 175 to 201 in a2bbd4d
polars/py-polars/polars/eager/frame.py Line 487 in a2bbd4d
polars/py-polars/polars/eager/frame.py Line 506 in a2bbd4d
|
@ritchie46 |
Thanks. It looks solid. Great work! Maybe you could add what you missed in the cobtrubution guide? |
Oh I'm really sorry, I missed the last part... It is sufficient for me. |
Very nice work @illumination-k. Thanks a lot! |
Support with_projection and with_columns in IPC and parquet reader to select columns in eager mode.
related to #1569