[SPARK-40714][SQL] Remove PartitionAlreadyExistsException
#38161
LuciferYang
left a comment
+1, LGTM
dongjoon-hyun
left a comment
Although these interfaces are Experimental, this looks like a big change on SupportsAtomicPartitionManagement and SupportsPartitionManagement.
      InternalRow ident,
      Map<String, String> properties)
-     throws PartitionAlreadyExistsException, UnsupportedOperationException {
+     throws PartitionsAlreadyExistException, UnsupportedOperationException {
Does Spark try-catch the errors thrown by these 2 APIs? If yes, then we may have bugs, as the existing data source implementations may still throw PartitionAlreadyExistsException.
In the current implementation, Spark invokes createPartition()/createPartitions() from AddPartitionExec (ALTER TABLE .. ADD PARTITION), where all exceptions from the methods are propagated to users directly. But it doesn't matter in our case because AddPartitionExec.run() checks whether the partitions exist BEFORE invoking them, see:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AddPartitionExec.scala
Lines 41 to 47 in 5d74ace
    val (existsParts, notExistsParts) =
      partSpecs.partition(p => table.partitionExists(p.ident))
    if (existsParts.nonEmpty && !ignoreIfExists) {
      throw new PartitionsAlreadyExistException(
        table.name(), existsParts.map(_.ident), table.partitionSchema())
    }
So, this PR doesn't affect ALTER TABLE .. ADD PARTITION but it affects v2 ALTER TABLE .. RENAME PARTITION which propagates the exception directly to users.
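To make the pre-check argument concrete, here is a minimal Java sketch of the same pattern. It is not Spark code: the class `AddPartitionPreCheck`, the method `partitionsToCreate`, the use of plain strings as partition identifiers, and the stand-in exception class are all assumptions made for illustration. It only mirrors the shape of the quoted `AddPartitionExec` lines, where existing partitions are detected before any create call is made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddPartitionPreCheck {
    // Hypothetical stand-in for Spark's exception; unchecked here to keep the sketch short.
    static class PartitionsAlreadyExistException extends RuntimeException {
        final List<String> partitions;

        PartitionsAlreadyExistException(String table, List<String> partitions) {
            super("Partitions " + partitions + " already exist in table " + table);
            this.partitions = partitions;
        }
    }

    // Splits the requested partitions into existing and missing ones, then either
    // fails fast or returns only the partitions that still have to be created.
    static List<String> partitionsToCreate(
            String table, Set<String> existing, List<String> requested, boolean ignoreIfExists) {
        List<String> existsParts = new ArrayList<>();
        List<String> notExistsParts = new ArrayList<>();
        for (String p : requested) {
            if (existing.contains(p)) {
                existsParts.add(p);
            } else {
                notExistsParts.add(p);
            }
        }
        // The check runs before any create call, so the data source's own
        // "already exists" exception is never reached on this path.
        if (!existsParts.isEmpty() && !ignoreIfExists) {
            throw new PartitionsAlreadyExistException(table, existsParts);
        }
        return notExistsParts;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("p=1"));
        List<String> requested = Arrays.asList("p=1", "p=2");
        // With ignoreIfExists = true, only the missing partition is returned.
        System.out.println(partitionsToCreate("t", existing, requested, true));
    }
}
```

Under these assumptions, whichever exception type a data source throws from its create methods is irrelevant for ADD PARTITION, because duplicates are filtered out or rejected earlier.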
I think in this case it's OK to change the type of the exception that is propagated to users.
I think it is the right time to change the APIs since they are not broadly used so far. I guess, it will be difficult to predict which of
Merging to master. Thank you, @LuciferYang @cloud-fan @dongjoon-hyun for review.
What changes were proposed in this pull request?
In the PR, I propose to remove PartitionAlreadyExistsException and use PartitionsAlreadyExistException instead of it.
Why are the changes needed?
PartitionsAlreadyExistException as well as PartitionAlreadyExistsException.
PartitionsAlreadyExistException #38152 fixed PartitionsAlreadyExistException but not PartitionAlreadyExistsException.
Does this PR introduce any user-facing change?
Yes.
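As a rough illustration of what this means for callers, the sketch below shows a rename path that reports a conflict through a single exception type. Everything in it is hypothetical: `RenamePartitionSketch`, `InMemoryTable`, `tryRename`, and the stand-in exception are not Spark classes, and partition identifiers are simplified to strings. The exception is unchecked here only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

class RenamePartitionSketch {
    // Hypothetical stand-in for the one remaining exception type.
    static class PartitionsAlreadyExistException extends RuntimeException {
        PartitionsAlreadyExistException(String message) {
            super(message);
        }
    }

    // Hypothetical in-memory table keyed by partition identifier.
    static class InMemoryTable {
        private final Map<String, String> partitions = new HashMap<>();

        void createPartition(String ident, String location) {
            if (partitions.containsKey(ident)) {
                throw new PartitionsAlreadyExistException("Partition already exists: " + ident);
            }
            partitions.put(ident, location);
        }

        void renamePartition(String from, String to) {
            // The target is checked first, so a conflict surfaces as the plural exception.
            if (partitions.containsKey(to)) {
                throw new PartitionsAlreadyExistException("Partition already exists: " + to);
            }
            if (!partitions.containsKey(from)) {
                throw new IllegalArgumentException("No such partition: " + from);
            }
            partitions.put(to, partitions.remove(from));
        }
    }

    // Caller code needs a single catch clause for the "already exists" case.
    static String tryRename(InMemoryTable table, String from, String to) {
        try {
            table.renamePartition(from, to);
            return "renamed";
        } catch (PartitionsAlreadyExistException e) {
            return "target exists";
        }
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();
        table.createPartition("p=1", "/data/p1");
        table.createPartition("p=2", "/data/p2");
        System.out.println(tryRename(table, "p=1", "p=2"));
        System.out.println(tryRename(table, "p=1", "p=3"));
    }
}
```

Code that previously matched on PartitionAlreadyExistsException for this case would need to match on PartitionsAlreadyExistException instead.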
How was this patch tested?
By running the affected test suites: