Feature: Distributed COPY INTO #8594

Closed
yufan022 opened this issue Nov 2, 2022 · 2 comments
Labels
C-feature Category: feature

Comments

@yufan022
Contributor

yufan022 commented Nov 2, 2022

Summary

Should we support a distributed copy into from S3 in the future, for example through an option such as distributed=true?

If we use copy into from S3 with purge=true to import data simultaneously on many query nodes, we must be very careful to set the S3 prefix or pattern so that different nodes read from different directories; otherwise the same files may be imported more than once.

For example:

#query1
copy into xx from 's3://query1/' pattern ='.*[.]tsv' purge=true;
#query2
copy into xx from 's3://query2/' pattern ='.*[.]tsv' purge=true;
#query3
copy into xx from 's3://query3/' pattern ='.*[.]tsv' purge=true;
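
The per-node workaround above could be scripted. A minimal sketch in Python, assuming each prefix ends with "/" and that build_copy_statements is a hypothetical helper (it is not part of any existing tool). It only builds the statement strings, does not execute them, and rejects overlapping prefixes, since those are what would cause repeated imports:

```python
def build_copy_statements(table, prefixes, pattern=r".*[.]tsv"):
    """Return one COPY INTO statement per S3 prefix (hypothetical helper).

    Assumes every prefix ends with "/" so that "s3://query1/" is not
    treated as a parent of "s3://query10/".
    """
    # Reject duplicate or nested prefixes: two nodes would see the same files.
    for i, a in enumerate(prefixes):
        for j, b in enumerate(prefixes):
            if i != j and b.startswith(a):
                raise ValueError(f"prefix {b!r} overlaps with {a!r}")
    return [
        f"copy into {table} from '{prefix}' pattern ='{pattern}' purge=true;"
        for prefix in prefixes
    ]


statements = build_copy_statements(
    "xx", ["s3://query1/", "s3://query2/", "s3://query3/"]
)
for stmt in statements:
    print(stmt)
```

Each printed statement would then be run on a different query node.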

If distributed copy into were supported, this would be easier.

We would just need to run:

copy into xx from 's3://import/' pattern ='.*[.]tsv' purge=true distributed=true;
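
To illustrate the idea, here is a rough sketch in Python of how files under one shared prefix might be split across nodes so that each file is imported exactly once. This is an assumption for illustration only, not how the proposed distributed=true option or any existing implementation works; assign_files and the file names are hypothetical:

```python
import zlib


def assign_files(files, num_nodes):
    """Map each file to exactly one node index using a stable hash."""
    assignment = {node: [] for node in range(num_nodes)}
    for name in sorted(files):
        # crc32 is deterministic, so every node computes the same plan.
        node = zlib.crc32(name.encode("utf-8")) % num_nodes
        assignment[node].append(name)
    return assignment


files = ["import/a.tsv", "import/b.tsv", "import/c.tsv", "import/d.tsv"]
plan = assign_files(files, 3)
for node, names in plan.items():
    print(node, names)
```

Because the assignment depends only on the file name and the node count, no two nodes pick up the same file, which is the property the manual prefix split is trying to guarantee.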

yufan022 added the C-feature Category: feature label Nov 2, 2022

@Xuanwo
Member

Xuanwo commented Nov 2, 2022

#6395 will make COPY run in a cluster so that copy will be distributed automatically.

Also related to #8128

@PsiACE
Member

PsiACE commented Jul 17, 2023

#11943

PsiACE closed this as completed Jul 17, 2023