This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

IPC projection that is not required to be ordered #875

Closed
ritchie46 opened this issue Mar 2, 2022 · 3 comments
Labels
no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@ritchie46
Collaborator

Currently we require projection of IPC readers to be ordered. Is this for technical reasons?

Downstream code could be simpler if it did not have to keep track of the column order.

@jorgecarleitao
Owner

It is a technical requirement, but we could move the logic from downstream into this crate: this is basically a middleware that receives the unordered projection, sorts it, reads the columns, and then re-orders them before returning the Chunk.
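A minimal sketch of that middleware idea, assuming nothing about the crate's actual reader API. The names `prepare_projection` and `reorder` are hypothetical, and a plain slice stands in for the columns of a Chunk:

```rust
/// Hypothetical helper: turn an unordered projection into
/// (1) the sorted projection that an ordered reader would accept, and
/// (2) for each requested column, its position in the sorted read.
fn prepare_projection(projection: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let mut sorted = projection.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let positions = projection
        .iter()
        .map(|col| {
            sorted
                .binary_search(col)
                .expect("every requested column is present in the sorted projection")
        })
        .collect();
    (sorted, positions)
}

/// Hypothetical helper: restore the caller's requested order after the
/// columns were read in sorted order.
fn reorder<T: Clone>(columns_in_sorted_order: &[T], positions: &[usize]) -> Vec<T> {
    positions
        .iter()
        .map(|&i| columns_in_sorted_order[i].clone())
        .collect()
}
```

With a requested projection of `[3, 0, 2]`, the reader would be given `[0, 2, 3]`, and `reorder` would then put the three columns back in the order 3, 0, 2.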

My feeling is that re-ordering columns during a projection fits better in a query engine, but I do not have strong feelings here.

Another alternative is to change our indexing to a Vec<bool> whose length must equal the number of fields, to avoid confusion. Technically this is what happens: we need to know which columns to select in IPC, since an un-selected column must be correctly "skipped" among the fields being read.
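To illustrate the Vec<bool> alternative, here is a rough sketch under stated assumptions: `projection_to_mask` and `read_with_mask` are hypothetical names, not functions of this crate, and a plain slice stands in for the fields encountered in file order.

```rust
/// Hypothetical helper: build a selection mask with one entry per field.
/// Returns `None` if the projection references a column that does not exist.
fn projection_to_mask(projection: &[usize], n_fields: usize) -> Option<Vec<bool>> {
    let mut mask = vec![false; n_fields];
    for &col in projection {
        *mask.get_mut(col)? = true;
    }
    Some(mask)
}

/// Hypothetical helper: walk the fields in file order, keeping the selected
/// ones and skipping the rest. The output is always in file order.
fn read_with_mask<T: Clone>(fields: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(fields.len(), mask.len(), "mask must have one entry per field");
    let mut selected = Vec::new();
    for (field, &keep) in fields.iter().zip(mask) {
        if keep {
            selected.push(field.clone());
        }
        // An un-selected field is simply skipped here; a real reader would
        // also have to advance past its buffers.
    }
    selected
}
```

A mask cannot express an order, which is the point of the suggestion: the caller can no longer pass a projection that looks ordered but is not.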

@ritchie46
Collaborator Author

Yeah, I noticed that polars' csv parser also needed sorted projections, because that's the order we encounter the fields. Parquet does not seem to need it.

I made an abstraction downstream. I believe it makes sense to have this requirement based on the data format. I will close this as I don't think it matters much.

@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Mar 6, 2022
@ghuls
Contributor

ghuls commented Apr 22, 2022

IPC projection is no longer required to be ordered: #961
