[SPARK-48782][SQL] Add support for executing procedures in catalogs #47943
Conversation
@@ -92,7 +93,38 @@ class QueryExecution(
```scala
      sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
    }
    tracker.setAnalyzed(plan)
    plan
    // ...
    mode match {
```
This part will have to evolve. There are 3 options for us to consider:
- Add a special type of command that must be executed during the analysis (i.e. generalize this PR).
- Add a special mix-in interface for procedures that know the type of the last result set before the execution. All other procedures will not output anything if invoked via `spark.sql`.
- Migrate to `qe.commandExecuted` instead of `qe.analyzed` everywhere, as we will know the output only after executing the procedure.
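As a rough illustration of the second option, here is a toy, non-Spark sketch; `Procedure`, `KnownResultSchema`, and `CompactTable` are invented names for this example, not actual Spark APIs.

```scala
// Toy sketch of option 2: a mix-in for procedures that can report the
// schema of their last result set before execution. All names here are
// illustrative stand-ins, not real Spark interfaces.
trait Procedure {
  def call(args: Seq[Any]): Seq[Seq[Any]] // simplified "result sets"
}

trait KnownResultSchema { self: Procedure =>
  def lastResultSchema: Seq[String] // known at analysis time
}

case class CompactTable() extends Procedure with KnownResultSchema {
  def call(args: Seq[Any]): Seq[Seq[Any]] = Seq(Seq("ok"))
  def lastResultSchema: Seq[String] = Seq("status")
}

// The analyzer could expose output columns without running the procedure;
// procedures without the mix-in would produce no output via spark.sql.
def analyzedOutput(p: Procedure): Seq[String] = p match {
  case s: KnownResultSchema => s.lastResultSchema
  case _                    => Nil
}
```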
@cloud-fan, any thoughts?
I'd prefer a special mix-in interface that indicates the command returns multiple result sets. It should have a method to return multiple `LogicalPlan`/`DataFrame` values. `SparkSession#sql` will recognize this special interface and get the last DataFrame.
@cloud-fan, do you mean a special mix-in connector interface or logical plan interface?
When will the execution happen?
```scala
/**
 * The logical plan for the CALL command.
 */
case class Call(procedure: LogicalPlan, args: Seq[Expression]) extends UnaryCommand {
```
Shall we extend `UnaryRunnableCommand` to avoid creating a corresponding physical node?
Gentle ping, @aokolnychyi. Although we still have time before the feature freeze, I'm wondering if you want to deliver this via Apache Spark 4.0.0-preview2 RC1 (next Monday).

Will update tomorrow. Thanks for pinging, @dongjoon-hyun!

Thank you, @aokolnychyi.
@@ -298,6 +298,10 @@ statement
```
        LEFT_PAREN columns=multipartIdentifierPropertyList RIGHT_PAREN
        (OPTIONS options=propertyList)?                                    #createIndex
    | DROP INDEX (IF EXISTS)? identifier ON TABLE? identifierReference    #dropIndex
    | CALL identifierReference
```
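For context, these are the kinds of statements the new `CALL identifierReference ...` rule is meant to accept; the catalog and procedure names below are invented for illustration (the by-name argument form is suggested by the `BY_NAME_METADATA_KEY` handling elsewhere in the PR).

```scala
// Illustrative CALL statements (made-up catalog/procedure names); in real
// use these strings would be passed to spark.sql(...).
val examples = Seq(
  "CALL cat.system.compact_table('db.tbl')",          // positional argument
  "CALL cat.system.compact_table(table => 'db.tbl')"  // named argument
)
```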
I can split this into a separate PR, if needed.
```scala
private def argMetadata(byName: Boolean): Metadata = {
  new MetadataBuilder()
    .putBoolean(ProcedureParameter.BY_NAME_METADATA_KEY, byName)
```
shall we omit this metadata if the arg is not by name?
We can, skipped.
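A minimal sketch of the resolved behavior, with a plain `Map` standing in for Spark's `Metadata`/`MetadataBuilder`: the by-name marker is attached only when the argument is actually passed by name.

```scala
// Simplified stand-in: attach the by-name marker only for by-name args,
// and omit the metadata entirely otherwise (the agreed-upon behavior).
val ByNameKey = "byName" // stand-in for ProcedureParameter.BY_NAME_METADATA_KEY

def argMetadata(byName: Boolean): Map[String, Boolean] =
  if (byName) Map(ByNameKey -> true) else Map.empty
```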
(Resolved review threads on `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala` and `sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/InvokeProcedures.scala`.)
```scala
/**
 * The physical plan of the CALL statement used in EXPLAIN.
 */
case class CallExec(
```
How about this?

```scala
case class ExplainOnlySparkPlan(toExplain: LogicalPlan) ... {
  def simpleString = toExplain.simpleString
}
```
I like that, updated.
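A minimal self-contained model of that idea (toy `PlanNode`/`CallPlan` types invented for the example): the explain-only physical node exists purely so EXPLAIN has something to render, and it forwards the wrapped logical plan's string form.

```scala
// Toy model: the explain-only physical node carries no execution logic
// and simply delegates its rendering to the wrapped (logical) plan.
trait PlanNode { def simpleString: String }

case class CallPlan(name: String) extends PlanNode {
  def simpleString: String = s"Call $name"
}

case class ExplainOnlyPlan(toExplain: PlanNode) extends PlanNode {
  def simpleString: String = toExplain.simpleString
}
```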
thanks, merging to master!

Thanks, @cloud-fan @dongjoon-hyun!

Nice. Thank you, @aokolnychyi and @cloud-fan.
This is awesome! Late LGTM
```scala
case class MultiResult(children: Seq[LogicalPlan]) extends LogicalPlan {

  override def output: Seq[Attribute] = children.lastOption.map(_.output).getOrElse(Nil)
```
Can we add some comments for this class and the `output` here (which uses the last result set's schema)?
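The behavior in question can be captured by a small stand-in (toy `Node` type, no Spark dependency): the node's output is the schema of its *last* child, or empty when there are no children.

```scala
// Stand-in for MultiResult#output: take the last child's output, or Nil.
case class Node(output: Seq[String])

def multiResultOutput(children: Seq[Node]): Seq[String] =
  children.lastOption.map(_.output).getOrElse(Nil)
```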
```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute

case class MultiResultExec(children: Seq[SparkPlan]) extends SparkPlan {
```
ditto for docstring
Will follow up with some docs.
```scala
}

private def validateParameterModes(procedure: BoundProcedure): Unit = {
  procedure.parameters.find(_.mode != ProcedureParameter.Mode.IN).foreach { param =>
```
Are we planning to support more parameter modes?
In the future, yes. There is no active work at the moment, as far as I know.
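A toy version of the IN-only check (simplified `ParamMode`/`Param` types invented here) mirrors what `validateParameterModes` does: any non-IN parameter is rejected for now.

```scala
// Toy check: only IN parameters are supported; OUT/INOUT raise an error,
// mirroring the intent of validateParameterModes in the PR.
object ParamMode extends Enumeration { val IN, OUT, INOUT = Value }
case class Param(name: String, mode: ParamMode.Value)

def validateParameterModes(params: Seq[Param]): Unit =
  params.find(_.mode != ParamMode.IN).foreach { p =>
    throw new UnsupportedOperationException(s"unsupported parameter mode for ${p.name}")
  }
```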
…test in ProcedureSuite

### What changes were proposed in this pull request?
This PR is a followup of #47943 that enables ANSI for malformed input test in ProcedureSuite.

### Why are the changes needed?
The specific test fails with ANSI mode disabled: https://github.com/apache/spark/actions/runs/10951615244/job/30408963913

```
- malformed input to implicit cast *** FAILED *** (4 milliseconds)
  Expected exception org.apache.spark.SparkNumberFormatException to be thrown, but no exception was thrown (ProcedureSuite.scala:264)
  org.scalatest.exceptions.TestFailedException:
  at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
  at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
  at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
  ...
```

The test depends on `sum`'s failure, so this PR simply enables ANSI mode for that specific test.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Manually ran with ANSI mode off.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48193 from HyukjinKwon/SPARK-48782-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
This PR adds support for executing procedures in catalogs.

### Why are the changes needed?
These changes are needed per the [discussed and voted](https://lists.apache.org/thread/w586jr53fxwk4pt9m94b413xyjr1v25m) SPIP tracked in [SPARK-44167](https://issues.apache.org/jira/browse/SPARK-44167).

### Does this PR introduce _any_ user-facing change?
Yes. This PR adds CALL commands.

### How was this patch tested?
This PR comes with tests.

### Was this patch authored or co-authored using generative AI tooling?
No.