Contributor

@jdesjean jdesjean commented Jul 28, 2023

What changes were proposed in this pull request?

Add jobTags to SparkListenerSQLExecutionStart

Why are the changes needed?

As part of SPARK-43952, users can define job tags via SparkContext.addJobTag. These tags can then be used to trigger cancellation via SparkContext.cancelJobsWithTag. Furthermore, these tags can be used to logically group multiple jobs together.

Listeners of job events can retrieve job tags via SparkListenerJobStart.properties.getProperty(SparkContext.SPARK_JOB_TAGS).

Listeners of SQL events can link SparkListenerJobStart and SparkListenerSQLExecutionStart via SparkListenerJobStart.properties.getProperty(SQLExecution.EXECUTION_ID_KEY).

However, some SQL executions (e.g. commands) do not trigger jobs. As such, listeners of SQL executions cannot resolve the job tags of every execution.
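
To illustrate the gap, here is a minimal sketch that uses simplified stand-in case classes (`JobStart`, `SqlExecutionStart`) rather than the real Spark listener events. The names `TagResolution`, `props`, `tagsByExecution`, and `resolve` are hypothetical. The property keys are written as string literals and are assumed to match `SparkContext.SPARK_JOB_TAGS` and `SQLExecution.EXECUTION_ID_KEY`, and tags are assumed to be stored comma-separated.

```scala
import java.util.Properties

// Hypothetical stand-ins for the real listener events; only the fields used here.
final case class JobStart(jobId: Int, properties: Properties)
final case class SqlExecutionStart(executionId: Long, description: String)

object TagResolution {
  // Assumed string values of SparkContext.SPARK_JOB_TAGS and SQLExecution.EXECUTION_ID_KEY.
  val JobTagsKey = "spark.job.tags"
  val ExecutionIdKey = "spark.sql.execution.id"

  // Small helper to build job properties for the example.
  def props(pairs: (String, String)*): Properties = {
    val p = new Properties()
    pairs.foreach { case (k, v) => p.setProperty(k, v) }
    p
  }

  // Group job tags by SQL execution ID, using only job-start properties.
  def tagsByExecution(jobs: Seq[JobStart]): Map[Long, Set[String]] = {
    val pairs: Seq[(Long, Set[String])] = jobs.flatMap { job =>
      for {
        p  <- Option(job.properties)
        id <- Option(p.getProperty(ExecutionIdKey))
      } yield {
        val tags = Option(p.getProperty(JobTagsKey))
          .map(_.split(",").filter(_.nonEmpty).toSet)
          .getOrElse(Set.empty[String])
        (id.toLong, tags)
      }
    }
    pairs.groupBy(_._1).map { case (id, group) => id -> group.flatMap(_._2).toSet }
  }

  // Look up tags for each SQL execution; None means no job was seen for it.
  def resolve(
      executions: Seq[SqlExecutionStart],
      jobs: Seq[JobStart]): Seq[(String, Option[Set[String]])] = {
    val byExecution = tagsByExecution(jobs)
    executions.map(e => e.description -> byExecution.get(e.executionId))
  }
}
```

Given one tagged job for execution 1 and a `SHOW TABLES` execution 2 that starts no job, `resolve` returns `Some(...)` for the first and `None` for the second. That second case is what carrying the tags on the SQL event itself is meant to address.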

Does this PR introduce any user-facing change?

No

How was this patch tested?

testOnly org.apache.spark.sql.execution.SQLExecutionSuite

@jdesjean jdesjean changed the title [CONNECT][CORE][SQL][SPARK-44591] Add jobTags to SparkListenerSQLExecutionStart [SPARK-44591][CONNECT][CORE][SQL] Add jobTags to SparkListenerSQLExecutionStart Jul 28, 2023
@jdesjean jdesjean changed the title [SPARK-44591][CONNECT][CORE][SQL] Add jobTags to SparkListenerSQLExecutionStart [SPARK-44591][CONNECT][SQL] Add jobTags to SparkListenerSQLExecutionStart Jul 28, 2023
Contributor

@juliuszsompolski juliuszsompolski left a comment

LGTM with some test nits.
cc @gengliangwang

Contributor

@juliuszsompolski juliuszsompolski left a comment

LGTM cc @gengliangwang

  time: Long,
- modifiedConfigs: Map[String, String] = Map.empty)
+ modifiedConfigs: Map[String, String] = Map.empty,
+ jobTags: Set[String] = Set.empty)
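
A rough sketch of what the default value on the new parameter presumably buys: call sites written before the field existed keep compiling. `ExecutionStart` and `DefaultsDemo` are hypothetical, simplified stand-ins; the real event has more fields than shown here.

```scala
// Simplified, hypothetical stand-in for the event; the real case class has more fields.
final case class ExecutionStart(
    executionId: Long,
    time: Long,
    modifiedConfigs: Map[String, String] = Map.empty,
    jobTags: Set[String] = Set.empty)

object DefaultsDemo {
  // A call site that predates jobTags still compiles and gets an empty tag set.
  val legacy: ExecutionStart = ExecutionStart(executionId = 1L, time = 0L)

  // New call sites can pass tags by name without touching modifiedConfigs.
  val tagged: ExecutionStart = ExecutionStart(2L, 0L, jobTags = Set("my-tag"))
}
```

This only demonstrates source compatibility for callers that use the defaults; it says nothing about binary compatibility or event-log deserialization.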
Member

QQ: where will the jobTags in SparkListenerSQLExecutionStart event be used?

Contributor

As a follow-up to #41964: currently, SQL executions associated with a Spark Connect execution are found via the EXECUTION_ID_KEY of the corresponding Spark jobs.
However, some Spark Connect executions start a SQL execution without starting a Spark job, for example commands like SHOW TABLES. With #41964 as it stands, the SQL executions of these are not linked from the Connect tab. With the jobTags, they can now be identified using the Spark Connect tag. Since this PR is almost ready to merge (hopefully CI will finish soon), I think using it for the Connect tab could be done in a follow-up PR?
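
Roughly, the lookup could then work on the SQL events alone. The sketch below uses a hypothetical stand-in `SqlStart` for the event with the new field, and a made-up tag value; the actual Spark Connect tag format is not shown in this thread.

```scala
// Hypothetical stand-in for SparkListenerSQLExecutionStart with the new jobTags field.
final case class SqlStart(executionId: Long, description: String, jobTags: Set[String])

object ConnectLinking {
  // IDs of SQL executions carrying the given tag, whether or not they started a Spark job.
  def executionsForTag(events: Seq[SqlStart], tag: String): Seq[Long] =
    events.filter(_.jobTags.contains(tag)).map(_.executionId)
}
```

With this shape, a command such as SHOW TABLES is matched by its tag in the same way as a query that runs jobs, so no join through job-start properties is needed.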

@gengliangwang
Member

Thanks, merging to master/3.5

gengliangwang pushed a commit that referenced this pull request Jul 29, 2023
…tart

### What changes were proposed in this pull request?
Add jobTags to SparkListenerSQLExecutionStart

### Why are the changes needed?
As part of [SPARK-43952](https://issues.apache.org/jira/browse/SPARK-43952), users can define job tags via SparkContext.addJobTag. These tags can then be used to trigger cancellation via SparkContext.cancelJobsWithTag. Furthermore, these tags can be used to logically group multiple jobs together.

Listeners of job events can retrieve job tags via SparkListenerJobStart.properties.getProperty(SparkContext.SPARK_JOB_TAGS).

Listeners of SQL events can link SparkListenerJobStart and SparkListenerSQLExecutionStart via SparkListenerJobStart.properties.getProperty(SQLExecution.EXECUTION_ID_KEY).

However, some SQL executions (e.g. commands) do not trigger jobs. As such, listeners of SQL executions cannot resolve the job tags of every execution.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```
testOnly org.apache.spark.sql.execution.SQLExecutionSuite
```

Closes #42216 from jdesjean/SPARK-44591.

Authored-by: jdesjean <jf.gauthier@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 37b571d)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
@jdesjean jdesjean deleted the SPARK-44591 branch July 31, 2023 15:56