Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obligation to hardcode full path in local Delta Lake #555

Open
gabmartini opened this issue Nov 20, 2020 · 6 comments
Open

Obligation to hardcode full path in local Delta Lake #555

gabmartini opened this issue Nov 20, 2020 · 6 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@gabmartini
Copy link

gabmartini commented Nov 20, 2020

Hello! I found out that you cannot query a local Delta Lake using SQL directly from the files if you don't use the full path (from root to the Delta Lake directory). I know that Delta Lakes are not supposed to be used in standalone clusters or non-distributed file system but for testing and (perhaps in the future, general public reachment) it's not a bad idea to generate a Delta Lake in a local directory to try it out when you are working in Visual Studio Code or another IDE, that change the working directory to the folder that you have opened and let you use 'relative paths'.

If you try to use data/delta-test (example) and in a PySpark session you try to make a query on that delta-lake, you get:
`pyspark.sql.utils.AnalysisException: Unsupported data source type for direct query on files: delta;;
But if you do /home/gabmartini/data/delta-test it works.

Very frustrating for beginners to be truly honest. Perhaps a very explicit remark in the documentation will help to clear that out.

Thanks and keep up the good work!

@tdas tdas added the bug Something isn't working label Oct 7, 2021
@mgill25
Copy link

mgill25 commented Oct 9, 2021

Can confirm. Encountered this issue. It wasn't fun debugging, I kept thinking the error had something to do with my local spark shell configuration. Quite annoying when trying to tweak various configurations via ALTER TABLE commands.

@felipepessoto
Copy link
Contributor

Do we have any update on this?

@scottsand-db
Copy link
Collaborator

This seems like a good start task if anyone wants to pick it up!

@scottsand-db scottsand-db added the good first issue Good for newcomers label Mar 30, 2023
@felipepessoto
Copy link
Contributor

felipepessoto commented Mar 30, 2023

Do we know if any particular reason for the existing restriction?

Relatives path wouldn't cause issues for queries like: SELECT * FROM delta.myrelativepath?

It could be both, a delta table at myrelativepath, or a table called "myrelativepath" in the delta database

@scottsand-db
Copy link
Collaborator

Linking this to #1572

tdas pushed a commit to tdas/delta that referenced this issue Jun 6, 2023
* [FlinkSQL_PR_1] Flink Delta Sink - Table API UPDATED (delta-io#389)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Signed-off-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Co-authored-by: Paweł Kubit <pawel.kubit@getindata.com>
Co-authored-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>

* [FlinkSQL_PR_2] - SQL Support for Delta Source connector. (delta-io#487)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_3] - Delta catalog skeleton (delta-io#503)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_4] - Delta catalog - Interactions with DeltaLog. Create and get table. (delta-io#506)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_5] - Delta catalog - DDL option validation. (delta-io#509)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_6] - Delta catalog - alter table + tests. (delta-io#510)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_7] - Delta catalog - Restrict Delta Table factory to work only with Delta Catalog + tests. (delta-io#514)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_8] - Delta Catalog - DDL/Query hint validation + tests. (delta-io#520)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_9] - Delta Catalog - Adding Flink's Hive catalog as decorated catalog. (delta-io#524)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_10] - Table API support SELECT with filter on partition column. (delta-io#528)

* [FlinkSQL_PR_10] - Table API support SELECT with filter on partition column.

---------

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Co-authored-by: Scott Sandre <scott.sandre@databricks.com>

* [FlinkSQL_PR_11] - Delta Catalog - cache DeltaLog instances in DeltaCatalog. (delta-io#529)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_12] - UML diagrams. (delta-io#530)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_13] - Remove mergeSchema option from SQL API. (delta-io#531)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_14] - SQL examples. (delta-io#535)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* remove duplicate function after rebasing against master

---------

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Signed-off-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
Co-authored-by: kristoffSC <krzysiek.chmielewski@gmail.com>
Co-authored-by: Paweł Kubit <pawel.kubit@getindata.com>
Co-authored-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
tdas added a commit to tdas/delta that referenced this issue Jun 8, 2023
tdas pushed a commit to tdas/delta that referenced this issue Jun 8, 2023
* [FlinkSQL_PR_1] Flink Delta Sink - Table API UPDATED (delta-io#389)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Signed-off-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Co-authored-by: Paweł Kubit <pawel.kubit@getindata.com>
Co-authored-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>

* [FlinkSQL_PR_2] - SQL Support for Delta Source connector. (delta-io#487)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_3] - Delta catalog skeleton (delta-io#503)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_4] - Delta catalog - Interactions with DeltaLog. Create and get table. (delta-io#506)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_5] - Delta catalog - DDL option validation. (delta-io#509)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_6] - Delta catalog - alter table + tests. (delta-io#510)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_7] - Delta catalog - Restrict Delta Table factory to work only with Delta Catalog + tests. (delta-io#514)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_8] - Delta Catalog - DDL/Query hint validation + tests. (delta-io#520)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_9] - Delta Catalog - Adding Flink's Hive catalog as decorated catalog. (delta-io#524)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_10] - Table API support SELECT with filter on partition column. (delta-io#528)

* [FlinkSQL_PR_10] - Table API support SELECT with filter on partition column.

---------

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Co-authored-by: Scott Sandre <scott.sandre@databricks.com>

* [FlinkSQL_PR_11] - Delta Catalog - cache DeltaLog instances in DeltaCatalog. (delta-io#529)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_12] - UML diagrams. (delta-io#530)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_13] - Remove mergeSchema option from SQL API. (delta-io#531)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* [FlinkSQL_PR_14] - SQL examples. (delta-io#535)

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

* remove duplicate function after rebasing against master

---------

Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
Signed-off-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
Co-authored-by: kristoffSC <krzysiek.chmielewski@gmail.com>
Co-authored-by: Paweł Kubit <pawel.kubit@getindata.com>
Co-authored-by: Krzysztof Chmielewski <krzysztof.chmielewski@getindata.com>
@ryan-johnson-databricks
Copy link
Collaborator

From what I undersetand, we use absolute paths because that's the only way to disambiguate the SQL grammar for commands like VACUUM that explicitly recognize path as different from identifier while lexing (one has a leading / and the other doesn't). Otherwise, as @scottsand-db pointed out, delta.foo could be a catalog table or a path and it's sketchy semantics at best to check both (e.g. which one prevails if both are present?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

6 participants