Parquet: How to do concurrent decoding over columns? #5120

Open
Liyixin95 opened this issue Nov 23, 2023 · 4 comments
Labels
question Further information is requested

Comments

@Liyixin95
Contributor

Which part is this question about

The parquet API.

Describe your question

How can I read and decode columns concurrently? Similar code can be found in polars.

I found the build_array_reader function, but I can't construct a ParquetField, which is the input parameter of that function.

Additional context

Liyixin95 added the question label Nov 23, 2023
@tustvold
Contributor

tustvold commented Nov 23, 2023

We currently only support parallelizing reads at the row group level. In practice this is usually plenty, although if you wanted to add support for this I'd be happy to provide some pointers.
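
For readers unfamiliar with the pattern, row-group-level parallelism can be sketched roughly as follows: each worker thread gets a disjoint set of row group indices and decodes them independently. This is only an illustration using the standard library, not the parquet crate's code. `decode_row_group`, `partition_row_groups` and `decode_in_parallel` are hypothetical names, and `decode_row_group` is a stand-in for building a reader restricted to a single row group (in the parquet crate, I believe `ParquetRecordBatchReaderBuilder::with_row_groups` is the relevant selector).

```rust
use std::thread;

/// Hypothetical stand-in for decoding one row group. A real version would
/// open a reader limited to `row_group` and collect its record batches.
/// Here it returns a fake row count so the sketch runs without dependencies.
fn decode_row_group(row_group: usize) -> usize {
    1_000 + row_group
}

/// Split `0..num_row_groups` into at most `workers` contiguous, disjoint chunks.
fn partition_row_groups(num_row_groups: usize, workers: usize) -> Vec<Vec<usize>> {
    let workers = workers.max(1);
    // Ceiling division, with a floor of 1 because `chunks(0)` would panic.
    let chunk = ((num_row_groups + workers - 1) / workers).max(1);
    (0..num_row_groups)
        .collect::<Vec<_>>()
        .chunks(chunk)
        .map(|c| c.to_vec())
        .collect()
}

/// Decode every row group, one scoped thread per chunk of row groups.
/// Results come back in row group order because handles are joined in order.
fn decode_in_parallel(num_row_groups: usize, workers: usize) -> Vec<usize> {
    let parts = partition_row_groups(num_row_groups, workers);
    thread::scope(|s| {
        let handles: Vec<_> = parts
            .iter()
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|&rg| decode_row_group(rg))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Five row groups shared between two workers.
    let rows = decode_in_parallel(5, 2);
    println!("{:?}", rows);
}
```

Because row groups are independent units of the file, the workers share no decoding state, which is what makes this level of parallelism simple compared with splitting work across columns inside one row group.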

@Liyixin95
Contributor Author

although if you wanted to add support for this I'd be happy to provide some pointers

Yes please, I would like to have a try.

@tustvold
Contributor

Ok I'll have a think about how to approach this and get back to you in a couple of days

@Liyixin95
Contributor Author

Ok I'll have a think about how to approach this and get back to you in a couple of days

Hello, have you found an approach yet?
