[SPARK-44591][CONNECT][SQL] Add jobTags to SparkListenerSQLExecutionStart #42216
Conversation
juliuszsompolski left a comment:
LGTM with some test nits.
cc @gengliangwang
juliuszsompolski left a comment:
LGTM cc @gengliangwang
  time: Long,
- modifiedConfigs: Map[String, String] = Map.empty)
+ modifiedConfigs: Map[String, String] = Map.empty,
+ jobTags: Set[String] = Set.empty)
QQ: where will the jobTags in the SparkListenerSQLExecutionStart event be used?
As a follow-up to #41964: currently, SQL executions associated with a Spark Connect execution are found through the EXECUTION_ID_KEY of their corresponding Spark jobs.
However, various Spark Connect executions start a SQL execution without ever starting a Spark job, for example commands like SHOW TABLES. With #41964 alone, the SQL executions of these are not linked from the Connect tab. With the jobTag, they can now be identified via the Spark Connect tag. Since this PR is almost ready to merge (hoping CI finishes soon), I think using it for the Connect tab could be done in a follow-up PR?
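For illustration only, a minimal sketch (not code from this PR) of how a listener could collect the SQL executions belonging to one tag, both through the jobs they start and, after this change, directly through the new jobTags field. The class name and the literal property keys ("spark.job.tags", "spark.sql.execution.id") and the comma separator are assumptions here, not taken from this PR:

```scala
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent, SparkListenerJobStart}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart

// Sketch: index SQL executions that belong to a given tag (e.g. a Spark Connect tag).
class TaggedExecutionIndex(tag: String) extends SparkListener {
  // Executions discovered via the jobs they started (pre-existing route).
  val viaJobs = mutable.Set[Long]()
  // Executions discovered directly from the new jobTags field (covers job-less commands).
  val viaEvent = mutable.Set[Long]()

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val props = Option(jobStart.properties)
    // Property keys and the comma separator are assumptions, not part of this PR.
    val tagged = props.flatMap(p => Option(p.getProperty("spark.job.tags")))
      .exists(_.split(",").contains(tag))
    val execId = props.flatMap(p => Option(p.getProperty("spark.sql.execution.id")))
    if (tagged) execId.foreach(id => viaJobs += id.toLong)
  }

  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLExecutionStart if e.jobTags.contains(tag) =>
      viaEvent += e.executionId
    case _ => // ignore other events
  }
}
```

Registered via sparkContext.addSparkListener, the second route is the one a Connect tab could use for commands like SHOW TABLES that never start a job.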
Thanks, merging to master/3.5
Closes #42216 from jdesjean/SPARK-44591.

Authored-by: jdesjean <jf.gauthier@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 37b571d)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Add jobTags to SparkListenerSQLExecutionStart
Why are the changes needed?
As part of SPARK-43952, users can define job tags via SparkContext.addJobTag. These tags can then be used to trigger cancellation via SparkContext.cancelJobsWithTag, and to logically group multiple jobs together.
Listeners of job events can retrieve job tags via SparkListenerJobStart.properties.getProperty(SparkContext.SPARK_JOB_TAGS).
Listeners of SQL events can link SparkListenerJobStart and SparkListenerSQLExecutionStart via SparkListenerJobStart.properties.getProperty(SQLExecution.EXECUTION_ID_KEY).
However, some SQL executions (e.g. commands) do not trigger any jobs, so listeners of SQL events cannot resolve the job tags of every execution.
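For context, a minimal sketch of the tag API described above (Spark 3.5+); the object name, app name, and tag value are arbitrary, and cancellation is shown commented out since it would normally be issued from another thread:

```scala
import org.apache.spark.sql.SparkSession

object JobTagsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("job-tags-demo").getOrCreate()
    val sc = spark.sparkContext

    // Every job submitted from this thread carries the tag until it is removed.
    sc.addJobTag("nightly-report")
    try {
      spark.range(0, 1000000L).selectExpr("sum(id)").collect()
    } finally {
      sc.removeJobTag("nightly-report")
    }

    // From another thread, all active jobs carrying the tag could be cancelled:
    // sc.cancelJobsWithTag("nightly-report")

    spark.stop()
  }
}
```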
Does this PR introduce any user-facing change?
No
How was this patch tested?

`testOnly org.apache.spark.sql.execution.SQLExecutionSuite`
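A hypothetical end-to-end check in the same spirit (this is not the actual SQLExecutionSuite code; the crude sleep stands in for the suite's internal wait utilities, and all names are illustrative):

```scala
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart

// Hypothetical check: a tagged query should produce a SparkListenerSQLExecutionStart
// event whose jobTags contains the tag added via addJobTag.
object JobTagsEventCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("tags-check").getOrCreate()
    val seenTags = mutable.Buffer[Set[String]]()

    spark.sparkContext.addSparkListener(new SparkListener {
      override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
        case e: SparkListenerSQLExecutionStart => seenTags += e.jobTags
        case _ =>
      }
    })

    spark.sparkContext.addJobTag("my-tag")
    spark.range(10).collect()

    Thread.sleep(2000) // crude wait for the async listener bus to deliver events
    assert(seenTags.exists(_.contains("my-tag")), "expected the tag on some SQL execution start")
    spark.stop()
  }
}
```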