Implement COPY ... TO statement #5654
Comments
If I'm not mistaken, we could implement it by deleting all the target files and then executing an |
At the moment, we do not support the In addition to this, the |
The similarity you mention does not apply to Maybe we can find a better way to reduce duplicates in logic. |
It makes sense to me.
+1 on this feature. My use case is to run SQL filters over Parquet files to produce a CSV file of the output. I was looking for a (Edited to add: and thank you to the contributors)
Thanks @timrobertson100 -- I think we are quite close to having all the pieces -- e.g. #6049 from @metesynnada
I've been busy recently and am not working on it. Anyone interested should feel free to take it.
The current sqlparser implementation doesn't support arbitrary query as a source in |
Yeah, it would be a great contribution.
I had some free time this afternoon, so I hacked up an initial implementation here: #6313. I was able to write a custom parser in DataFusion, as well as hook it up via the same mechanism as used by
The basic feature was completed by @devinjdangelo in #7283. There are still some pieces left, but I am closing this ticket as done for now.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to convert Parquet data from one format to another, for example to see the effects of page pruning (#4085) or of different orderings on compression and other properties.
arrow-rs and DataFusion have all the parts we need (reading from files, sorting, writing to files); we just need to put them together.
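As a rough illustration of the data flow only (read, sort, write), here is a minimal sketch using the Python standard library on CSV files. It is not DataFusion or arrow-rs code; the `copy_sorted` helper and its parameters are hypothetical names made up for this example.

```python
import csv


def copy_sorted(src_path, dst_path, sort_key):
    """Read a CSV file, sort its rows by one column, write them to a new CSV file.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    # Step 1: read the whole source file into memory.
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # Step 2: sort. The csv module yields text, so values compare as strings.
    rows.sort(key=lambda row: row[sort_key])

    # Step 3: write the sorted rows, with a header, to the destination file.
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)
```

A real implementation would stream record batches rather than load everything into memory, which is one reason to build this on the existing engine instead.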
We do have a very specialized version in the tpch benchmark driver
https://github.com/apache/arrow-datafusion/blob/26e1b20ea3362ea62cb713004a0636b8af6a16d7/benchmarks/src/tpch.rs#L332-L400
Describe the solution you'd like
I would like DataFusion to support DuckDB-style COPY SQL statements. For example:
Reference:
Describe alternatives you've considered
@metesynnada is working on INSERT INTO style syntax in #5130.
Bonus points for CSV support (ideally the code structure will allow support in the long term, but not as part of the initial PR).
Additional context
#5130 (comment)