External source with new processor #4277

Merged

@BohuTANG BohuTANG commented Feb 28, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR:

  1. Bump opendal to v0.1.3.
  2. Make an external source (like S3) a table engine:
    • We can load data with COPY INTO t1 FROM 's3://', using the read2() API.
    • We can unload data from a table to S3 with COPY INTO s3:// FROM t1, using the append_data() API.
    • We can purge the S3 file after loading data, using the truncate() API.
    • We can query an S3 file directly; for some file formats we can push down predicates/limits.
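
To make the mapping above concrete, here is a minimal sketch of how the two COPY directions and the purge step could line up with the three API calls. Everything in it is an assumption for illustration: `TableEngine`, `MockS3Table`, `MemoryTable`, and `copy_into` are hypothetical names, the method signatures are heavily simplified, and this is not Databend's actual table trait.

```rust
/// Hypothetical, simplified stand-in for a table engine.
/// Only the method names (read2, append_data, truncate) come from the PR summary.
trait TableEngine {
    /// Read all rows from the engine.
    fn read2(&self) -> Vec<String>;
    /// Append rows to the engine.
    fn append_data(&mut self, rows: Vec<String>);
    /// Remove all data held by the engine.
    fn truncate(&mut self);
}

/// In-memory stand-in for an S3-backed table (assumption: one row per object).
#[derive(Default)]
struct MockS3Table {
    objects: Vec<String>,
}

impl TableEngine for MockS3Table {
    fn read2(&self) -> Vec<String> {
        self.objects.clone()
    }
    fn append_data(&mut self, rows: Vec<String>) {
        self.objects.extend(rows);
    }
    fn truncate(&mut self) {
        self.objects.clear();
    }
}

/// In-memory stand-in for a regular table such as t1.
#[derive(Default)]
struct MemoryTable {
    rows: Vec<String>,
}

impl TableEngine for MemoryTable {
    fn read2(&self) -> Vec<String> {
        self.rows.clone()
    }
    fn append_data(&mut self, rows: Vec<String>) {
        self.rows.extend(rows);
    }
    fn truncate(&mut self) {
        self.rows.clear();
    }
}

/// COPY INTO dst FROM src: read from the source, append to the destination,
/// and optionally purge the source afterwards. Returns the number of rows copied.
fn copy_into(dst: &mut dyn TableEngine, src: &mut dyn TableEngine, purge: bool) -> usize {
    let rows = src.read2();
    let copied = rows.len();
    dst.append_data(rows);
    if purge {
        src.truncate();
    }
    copied
}
```

Because both sides implement the same trait in this sketch, loading (`copy_into(&mut t1, &mut s3, true)`) and unloading (`copy_into(&mut s3, &mut t1, false)`) go through the same code path, which is the property the table-engine approach is after.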

Changelog

  • Improvement

Related Issues

Part of #3586

Test Plan

Unit Tests

Stateless Tests


@BohuTANG BohuTANG force-pushed the dev-external-scan-processor-3586-patch-3 branch from 7c2289a to 5d9ffee Compare March 1, 2022 06:59
@BohuTANG BohuTANG force-pushed the dev-external-scan-processor-3586-patch-3 branch from 933cc6e to f2ef6ac Compare March 2, 2022 03:56
TableSource(TableInfo),

// S3 external source, 's3://'.
S3ExternalSource(S3ExternalTableInfo),
Member

Why not make S3 a special table, such as a table function that holds a temporary table?

Member Author

With S3 as a table engine, we can do the unloading as copy from table into s3://

Member

It seems not extensible enough, considering we will support azblob, gcs, and other locations.

Member

@sundy-li sundy-li Mar 2, 2022

But a table function can also support insert.

copy from table into s3://

select * from s3://

  1. Use as_table to generate a temporary table.
  2. Select or insert then works with that table.

I do think it's better to introduce an external table function, like:

select * from external(stage_name, ...)

This may work for s3, azblob, ... anything a stage supports.

Member Author

I have added a description of why it is made a table engine.

Member Author

BTW, for select * from @stage ..., we will convert the @stage alias to the external table info fetched from the metasrv (once we support stages) in the SQL parser phase, so it will behave the same as select * from s3://.
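
As a rough illustration of the alias resolution described above, the sketch below turns either a direct URI or an `@stage` alias into the same external table info. All names are assumptions: `ExternalTableInfo` and `resolve_location` are hypothetical, and a plain `HashMap` stands in for the stage lookup in metasrv, which this PR does not implement.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified external table info: just a scheme and a path.
#[derive(Debug, Clone, PartialEq)]
struct ExternalTableInfo {
    scheme: String,
    path: String,
}

/// Resolve a location written in SQL into external table info.
/// `stages` is a stand-in for the alias -> URI mapping that would live in metasrv.
fn resolve_location(
    location: &str,
    stages: &HashMap<String, String>,
) -> Option<ExternalTableInfo> {
    // Replace an `@stage` alias with the URI registered for it, if any.
    let uri = match location.strip_prefix('@') {
        Some(alias) => stages.get(alias)?.as_str(),
        None => location,
    };
    // Split "scheme://path"; anything without a scheme is rejected.
    let (scheme, path) = uri.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    Some(ExternalTableInfo {
        scheme: scheme.to_string(),
        path: path.to_string(),
    })
}
```

With a stage registered as `my_stage -> s3://bucket/data/`, `resolve_location("@my_stage", &stages)` and `resolve_location("s3://bucket/data/", &stages)` return equal values, so later planning stages would not need to know which form the user wrote. Keying on the scheme string is also one way the same path could later cover azblob or gcs locations.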

@Xuanwo
Member

Xuanwo commented Mar 2, 2022

I have started PR #4298 to fix #4256. Let's merge the main branch after that PR gets merged. 😆

@BohuTANG BohuTANG marked this pull request as ready for review March 2, 2022 12:09
@BohuTANG
Member Author

BohuTANG commented Mar 2, 2022

I have started PR #4298 to fix #4256. Let's merge the main branch after that PR gets merged. 😆

I will rebase this PR later :)

@BohuTANG BohuTANG changed the title External source for new processor External source with new processor Mar 2, 2022
@BohuTANG
Member Author

BohuTANG commented Mar 3, 2022

@zhang2014 @sundy-li, PTAL
@Xuanwo, PTAL the part of OpenDAL
Thanks

@BohuTANG BohuTANG merged commit 106af03 into databendlabs:main Mar 3, 2022