-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for loading data from stages #35
Comments
Hi @tharwan this might be possible ... could you share example SQL statements you'd like support for, both creating the stage and loading from it? |
Loading from a stage looks something like this: COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';') Creating a stage is not so relevant in our case but looks like this: CREATE STAGE my_ext_stage
URL='azure://myaccount.blob.core.windows.net/load/files/'
STORAGE_INTEGRATION = myint; maybe related: tobymao/sqlglot#2463 |
Thanks! From the above I gather you want support for CSV files. I think creating the stage would be needed for fakesnow to know where to find the files to load. |
another option could be to just look locally for a file and ignore the @my_stage part. e.g. COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';') would translate to COPY my_table
FROM my_file
(FORMAT CSV, DELIMITER ';'); |
we recently found ourselves wanting this at $job. i may look into a relatively simple initial implementation somewhat soon. My personal plan/preference would be for fakesnow to simply internally track creations of stages inside And then translate COPY statements against the service in question into boto3 calls + insert statements (which ought to work for all the supported cloud providers iirc) It'd then be the job of the user to set up like i said, i'll probably be looking into the feasibility of this strat soon, but i expect sqlglot to be the short term limiting factor in terms of minimally parsing the integration/stage/copy statement syntaxes respectively. |
If the use-case were just testing then I thought for the first iteration of this we could just support a local file via a file:// url. It would avoid the need to set up actual s3/azure storage or moto to mock s3. Would that work? |
my personal usecase requires a storage integration/s3, but presumably my implementing that will be easier if there was already general support against a staged file:// |
Yes good point - if we transform a COPY INTO into a INSERT INTO .. SELECT in duckdb, then we could support both a file:// and s3:// url via the same mechanism relatively easily, using duckdb's support for s3 via the https extension. If you're keen to tackle this a PR would be welcome! |
Hi,
Would it be possible to support loading of data from a stage? (https://docs.snowflake.com/en/user-guide/data-load-considerations-load)
I am not sure from the docs if this would be even inside the scope of this project.
However it would help us a lot to test our complete ETL workflow.
The text was updated successfully, but these errors were encountered: