SHOW PARTITIONS is not allowed on a table that is not partitioned when it is in fact a partitioned Delta table #681
Comments
I think the relevant code is https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
@Kimahriman is right - more at CheckAnalysis
Thank you @Kimahriman and @jaceklaskowski for locating the references to the code that manages this logic. Using the CheckAnalysis reference provided by @jaceklaskowski, I have located the relevant check. Despite the fact (bullet point 3 in my high-level summary write-up above) that I can inspect the table's metadata, executing `SHOW PARTITIONS` fails. This raises another question for me: are there additional SQL functions besides `SHOW PARTITIONS` that misbehave on partitioned Delta tables? My current naive understanding is: if all the other SQL functions that read and process queries against Delta tables already work, what is the complexity that excludes `SHOW PARTITIONS`? As a new Delta user, my first guess at the complexity question is schema evolution, where adding columns is allowed and supported (Spark 3.0+). However, it is unclear to me whether the support for adding columns differs between adding non-partition columns and adding partition columns. I am relatively new to the Delta OSS project; is the scope of the suggested implementation to fix this issue suitable for a beginner? Thanks again for the constructive feedback.
Oh, you're using 3.0.2, so that's using the data source V1 code path, I think. The error you're hitting is actually here:
https://github.com/apache/spark/blob/v3.0.2/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L1012
SHOW PARTITIONS support was only added to V2 in Spark 3.1, I think. I assume support would only be added to the V2 code path for Delta going forward?
@cequencer That should really be discussed on the Delta Lake Users and Developers forum.
Should this be closed as "won't fix"? It's easier to get at and deal with the partitions using the DeltaLog API directly (and you can hack your way to it through py4j from Python as well).
Closing this since using the DeltaLog API seems like an easier route to take. Please reopen if this issue is still relevant.
@Kimahriman is there an easy way to access the DeltaLog from pyspark?
Depends on your definition of "easy". We just do:
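A minimal sketch of what that can look like, assuming the Delta 0.8-era internal API (`forTable`, `snapshot`, `metadata`, `partitionColumns`); the table path is a placeholder, and this may not match the commenter's exact snippet:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reach the Scala DeltaLog companion object through the py4j gateway
jvm = spark.sparkContext._jvm
delta_log = jvm.org.apache.spark.sql.delta.DeltaLog.forTable(
    spark._jsparkSession, "/path/to/delta/table"
)

# The current snapshot's metadata holds the partition column names as a Scala Seq
cols = delta_log.snapshot().metadata().partitionColumns()
partition_columns = [cols.apply(i) for i in range(cols.size())]
print(partition_columns)  # e.g. ['year', 'month']
```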
I am also facing the same issue as @cequencer; my Spark version is …. I also tried to get the partitioned columns from the Catalog, similar to …. A sketch of both attempts follows below.
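A sketch of the two attempts under illustrative names and paths: `listColumns` is the standard PySpark catalog API, and reading the `_delta_log` JSON directly is the workaround the following comments discuss:

```python
# Attempt 1: ask the catalog for partition columns. For a Delta table the
# catalog often reports none, which matches the SHOW PARTITIONS error.
partition_cols = [
    c.name for c in spark.catalog.listColumns("my_delta_table") if c.isPartition
]
print(partition_cols)  # frequently [] for a Delta table, despite PARTITIONED BY

# Attempt 2 (workaround): read the transaction log JSON directly. Each file
# under _delta_log holds one commit's actions (add, remove, metaData, ...).
delta_log = spark.read.json("/path/to/delta/table/_delta_log/*.json")
delta_log.where("add IS NOT NULL").select("add.path").show(truncate=False)
```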
This is very useful, sir, but it is giving all the partitions. How can we get only the valid, latest partitions?
How are you getting partition information with this? `delta_log.columns` just returns all columns as List[str].
@Krulvis, you can access the column 'path'. Every record contains information about some partition in its path, e.g. a Hive-style path such as `year=2021/month=5/part-00000.snappy.parquet` (illustrative).
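Continuing the `_delta_log` sketch above, one way to list the distinct partition values recorded by `add` actions. Note the caveat raised earlier in the thread: this includes partitions of files later deleted by `remove` actions, so it shows every partition ever written; for only the live partitions, the DeltaLog snapshot approach shown earlier is safer.

```python
from pyspark.sql import functions as F

partitions = (
    delta_log
    .where(F.col("add").isNotNull())   # keep only file-add actions
    .select("add.partitionValues")     # partition column -> value per file
    .distinct()
)
partitions.show(truncate=False)
```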
Using `spark-shell` from precompiled OSS Apache Spark 3.0.2 (without Hadoop) + `io.delta:delta-core_2.12:0.8.0`.

High-level summary of my complete test program to describe the issue, plus the debugging information:

1. Created a partitioned Delta table.
   a. This is done explicitly using the logic outlined here.
2. Ran `SHOW PARTITIONS` and it failed saying the `table is not partitioned`.
3. Ran `DESCRIBE TABLE EXTENDED` and `SHOW CREATE TABLE`.
   a. The `SHOW CREATE TABLE` content did not match the original DDL statements I provided, and it is causing the `SHOW PARTITIONS` to fail.

My objective is to `SHOW PARTITIONS` of my `EXTERNAL` Delta table created by another program (i.e. a table that is not managed to begin with). Is there a more direct approach, or am I running into a bug here?
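For completeness, a minimal sketch that reproduces the reported behavior on Spark 3.0.2 + Delta 0.8; the table name, columns, and location are illustrative, and it assumes Delta's SQL extension and catalog are configured on the session:

```python
# Hypothetical repro: create a partitioned external Delta table, then ask for
# its partitions (assumes spark.sql.extensions and
# spark.sql.catalog.spark_catalog are configured for Delta).
spark.sql("""
    CREATE TABLE events (id BIGINT, year INT, month INT)
    USING delta
    PARTITIONED BY (year, month)
    LOCATION '/tmp/external/events'
""")

# On the V1 code path this raises an AnalysisException along the lines of:
#   SHOW PARTITIONS is not allowed on a table that is not partitioned
spark.sql("SHOW PARTITIONS events").show()
```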