Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Use OpenDAL API to avoid extra seek call #12524

Merged
merged 4 commits into from
Aug 22, 2023

Conversation

Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented Aug 21, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This PR will refactor the read metadata implement by adopting OpenDAL's native API instead of seek on reader.

Benchmark:

Test upon a 100MiB parquet file hosted on minio.

old/old                 time:   [4.5618 ms 4.6360 ms 4.7137 ms]
new/new                 time:   [1.4413 ms 1.5201 ms 1.6057 ms]

This change is Reviewable

Signed-off-by: Xuanwo <github@xuanwo.io>
@vercel
Copy link

vercel bot commented Aug 21, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
databend ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 22, 2023 6:02am

@Xuanwo Xuanwo requested review from dantengsky and zhyass August 21, 2023 08:01
@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Aug 21, 2023
@BohuTANG BohuTANG added the ci-benchmark Benchmark: run all test label Aug 21, 2023
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-12524-14e54e0

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Aug 21, 2023
@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 21, 2023

image

Tpch looks good, but hits seems OOM, let's try again.

@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-12524-14e54e0

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Aug 21, 2023
@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 21, 2023

Q0 is 186x slower, seems related to cache changes, let's try again.

@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-12524-3393b15

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dantengsky
Copy link
Member

Untitled

regression of q19 looks weird, this query should benefit from the improvement of this PR

@zhang2014 zhang2014 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Aug 21, 2023
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-12524-b4f898c

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 21, 2023

This PR's performance is not as good as my expected, I will figure out what happened.

@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Aug 22, 2023
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-12524-6ac7b4c

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 22, 2023

I conducted a test in the development environment and found that this PR does not have a significant impact on real use cases. I recommend merging this PR initially to determine if it resolves the issue of unexpected query hang.

cc @BohuTANG @dantengsky

@dantengsky
Copy link
Member

I conducted a test in the development environment and found that this PR does not have a significant impact on real use cases. I recommend merging this PR initially to determine if it resolves the issue of unexpected query hang.

cc @BohuTANG @dantengsky

confirm.

Tested with a table consisting of 19900 blocks, point queries that hit 6 blocks show no regression.

@BohuTANG BohuTANG merged commit df42ace into databendlabs:main Aug 22, 2023
@Xuanwo Xuanwo deleted the avoid-extra-seek branch August 22, 2023 07:24
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Winter Zhang <coswde@gmail.com>
Co-authored-by: BohuTANG <overred.shuttler@gmail.com>
Co-authored-by: dantengsky <dantengsky@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-benchmark Benchmark: run all test pr-refactor this PR changes the code base without new features or bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants