feat(storages): read_parquet uses OpenDAL for all IO operations. #9684

Merged: 2 commits, Jan 28, 2023

Conversation

RinChanNOWWW
Contributor

@RinChanNOWWW RinChanNOWWW commented Jan 19, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  • Replace std::fs::File with OpenDAL for all IO operations. (TODO, in another PR: use OpenDAL to do "glob".)
  • Optimize the prewhere logic (re-sort before filtering).
  • Fix scan progress collection.
  • TODO (in another PR): support remote locations and async operations.
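
For readers unfamiliar with the term, the general idea behind prewhere can be sketched as below. This is only an illustration of the technique, not the code in this PR: the function names `prewhere_select` and `take_rows` and the in-memory column layout are hypothetical, and the re-sort step mentioned above is not shown.

```rust
/// Step 1: evaluate the predicate on the filter column only and
/// return the indices of the rows that pass.
/// (Hypothetical helper; a real reader would work on decoded Parquet chunks.)
fn prewhere_select(filter_col: &[i64], pred: impl Fn(i64) -> bool) -> Vec<usize> {
    filter_col
        .iter()
        .enumerate()
        .filter(|&(_, &v)| pred(v))
        .map(|(i, _)| i)
        .collect()
}

/// Step 2: materialize a remaining column for the surviving rows only,
/// so rows rejected by the predicate are never copied.
fn take_rows(col: &[String], rows: &[usize]) -> Vec<String> {
    rows.iter().map(|&i| col[i].clone()).collect()
}
```

The benefit comes from step 2 touching only the rows that survived step 1, which matters most when the predicate is selective and the remaining columns are wide.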

Closes #issue

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Jan 19, 2023
@Xuanwo
Member

Xuanwo commented Jan 19, 2023

Only fs and hdfs implement the blocking API. Will we need to test s3 via the blocking API too?

@RinChanNOWWW
Contributor Author

Only fs and hdfs implement the blocking API. Will we need to test s3 via the blocking API too?

I plan to check whether the backend supports the blocking API and choose which API to use accordingly.

But I haven't figured out how to achieve this yet (file metadata is currently read only in sync functions; maybe we need to use the blocking API for these operations?).
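
A rough sketch of the capability check described above, under stated assumptions: the `Backend` trait, its `supports_blocking` method, the `Fs` and `S3` types, and `ReadMode` are all hypothetical stand-ins for illustration and are not OpenDAL's actual API.

```rust
/// Hypothetical stand-in for a storage backend (not OpenDAL's real API).
trait Backend {
    /// Whether this backend offers a synchronous (blocking) read path.
    fn supports_blocking(&self) -> bool;
}

/// Local filesystem: assumed here to implement the blocking API.
struct Fs;
/// Object storage: assumed here to be async-only.
struct S3;

impl Backend for Fs {
    fn supports_blocking(&self) -> bool {
        true
    }
}

impl Backend for S3 {
    fn supports_blocking(&self) -> bool {
        false
    }
}

/// Which read path the caller should take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadMode {
    Blocking,
    Async,
}

/// Pick the blocking path when the backend supports it, else fall back to async.
fn choose_read_mode(backend: &dyn Backend) -> ReadMode {
    if backend.supports_blocking() {
        ReadMode::Blocking
    } else {
        ReadMode::Async
    }
}
```

This only covers the selection step; it does not address the open question of how sync-only metadata reads would run against a backend that lacks a blocking path.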

@RinChanNOWWW RinChanNOWWW requested a review from sundy-li January 21, 2023 01:11
@RinChanNOWWW
Contributor Author

I don't have a complete idea for the remote location yet. Let's implement it later.

@RinChanNOWWW RinChanNOWWW marked this pull request as ready for review January 21, 2023 01:13
@BohuTANG
Member

Only fs and hdfs implement the blocking API. Will we need to test s3 via the blocking API too?

I plan to check whether the backend supports the blocking API and choose which API to use accordingly.

But I haven't figured out how to achieve this yet (file metadata is currently read only in sync functions; maybe we need to use the blocking API for these operations?).

Like this?
https://github.com/datafuselabs/databend/blob/923f3fab8491411a81222917c17a55675f3aa860/src/query/storages/fuse/src/fuse_table.rs#L180

@RinChanNOWWW
Contributor Author

Yes. Or should we block on reading the file metadata?

@Xuanwo
Member

Xuanwo commented Jan 28, 2023

Let's implement it later.

LGTM, we can discuss remote file support later.

@mergify mergify bot merged commit 1332b8e into databendlabs:main Jan 28, 2023
@RinChanNOWWW RinChanNOWWW deleted the read-parquet-opendal branch January 28, 2023 06:46
4 participants