[SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest #41443
Personally, I think Connect is front of Spark Runtime.
@beliefer I am not sure what you are saying here? Can you elaborate?
Questions comparing to Thriftserver state transitions:
Spark Driver is an independent and stable system. The Spark driver uses On the other hand,
juliuszsompolski
left a comment
LGTM
Merged to master and branch-3.5.
### What changes were proposed in this pull request?
Add new SparkListenerEvents during Spark Connect ExecutePlanRequest:
SparkListenerConnectOperationStarted
SparkListenerConnectOperationParsed
SparkListenerConnectOperationCanceled
SparkListenerConnectOperationFailed
SparkListenerConnectOperationFinished
SparkListenerConnectOperationClosed
SparkListenerConnectSessionClosed
### Why are the changes needed?
HiveThriftServer2EventManager currently posts events to the listener bus to allow external listeners to track query execution. Mirror these events in Spark Connect.
Created new events instead of reusing the thrift events to allow them to evolve separately.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Manual + Unit + E2E

Closes #41443 from jdesjean/SPARK-43923.
Authored-by: jdesjean <jf.gauthier@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit b44e605)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Hm, I think this makes this test flaky: https://github.com/apache/spark/actions/runs/5581822850/jobs/10201652403 and it seems to fail pretty often. Would you mind taking a look @jdesjean ?
Let me skip the test for now - it blocks other PRs, and I don't want to revert this. I filed a JIRA at https://issues.apache.org/jira/browse/SPARK-44474 to re-enable the test, and marked it as a blocker for Spark 3.5.0.
@HyukjinKwon, I'm re-enabling the test with a fix.
…ctServiceSuite
### What changes were proposed in this pull request?
Finished is emitted in SparkConnectPlanExecution after the arrow conversion job is completed. However, since we don't await the completion of the job, it's possible for SparkConnectPlanExecution to complete before sending Finished. Closed is emitted in SparkConnectExecutePlanHandler in a separate thread. Add await in order to guarantee the order of events between Finished & Closed.
### Why are the changes needed?
`Test observe response` at SparkConnectServiceSuite was disabled as flaky after [introduction of events](#41443). The failure surfaced a race condition in emitting the Finished & Closed events for Connect requests of type plan. The correct order of events is Finished < Closed.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit

Closes #42063 from jdesjean/SPARK-44474.
Authored-by: jdesjean <jf.gauthier@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit cf99e6c)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
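The ordering guarantee described in that fix (Finished must be posted before Closed, even though the two events come from different threads) can be illustrated with a small standalone sketch. This is not the code from the PR: the object name `EventOrderingSketch`, the use of a `CountDownLatch`, and the simulated delay are all assumptions made for illustration, and no Spark classes are involved.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.collection.mutable.ListBuffer

// Hypothetical model of the race: one thread posts "Finished" when the job
// completes, another thread posts "Closed". Awaiting a latch before posting
// "Closed" enforces Finished < Closed.
object EventOrderingSketch {

  /** Runs one simulated request and returns the events in posted order. */
  def run(): List[String] = {
    val posted = ListBuffer.empty[String]
    def post(event: String): Unit = posted.synchronized { posted += event }

    val finishedPosted = new CountDownLatch(1)

    // Stand-in for the execution side: the job takes a while, then Finished is posted.
    val execution = new Thread(() => {
      Thread.sleep(50) // simulated job duration (assumption)
      post("Finished")
      finishedPosted.countDown()
    })

    // Stand-in for the handler side: it must not post Closed until Finished is out.
    val handler = new Thread(() => {
      // Without this await, Closed could be recorded before Finished.
      finishedPosted.await(5, TimeUnit.SECONDS)
      post("Closed")
    })

    // Start the handler first to make the race as likely as possible.
    handler.start()
    execution.start()
    handler.join()
    execution.join()
    posted.toList
  }

  def main(args: Array[String]): Unit =
    println(run()) // List(Finished, Closed)
}
```

Removing the `await` call in this sketch lets the handler thread record Closed first, which is the kind of ordering violation the flaky test surfaced.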
### What changes were proposed in this pull request?
Add a new Spark UI page to display session and execution information for Spark Connect. This builds on the work in SPARK-43923 (#41443) that adds the relevant SparkListenerEvents and mirrors the ThriftServerPage in the Spark UI for JDBC/ODBC.
<img width="1709" alt="Screenshot 2023-07-27 at 11 29 22 PM" src="https://github.com/apache/spark/assets/65624911/934b7c69-3b44-460b-8fbb-36a9eb3f0798">
<img width="1716" alt="Screenshot 2023-07-27 at 11 29 15 PM" src="https://github.com/apache/spark/assets/65624911/33dbe6ab-44bf-49a5-ad4c-5ba4a476a1f0">
### Why are the changes needed?
This gives users a way to access session and execution information for Spark Connect via the UI and provides the frontend interface for the related SparkListenerEvents.
### Does this PR introduce _any_ user-facing change?
Yes, it will add a new tab/page in the Spark UI
### How was this patch tested?
Unit tests

Closes #41964 from jasonli-db/spark-connect-ui.
Authored-by: Jason Li <jason.li@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f8786f0)
Closes #42224 from juliuszsompolski/SPARK-44394-3.5.
Authored-by: Jason Li <jason.li@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Add new SparkListenerEvents during Spark Connect ExecutePlanRequest:
SparkListenerConnectOperationStarted
SparkListenerConnectOperationParsed
SparkListenerConnectOperationCanceled
SparkListenerConnectOperationFailed
SparkListenerConnectOperationFinished
SparkListenerConnectOperationClosed
SparkListenerConnectSessionClosed
Why are the changes needed?
HiveThriftServer2EventManager currently posts events to the listener bus to allow external listeners to track query execution. Mirror these events in Spark Connect.
Created new events instead of reusing the thrift events to allow them to evolve separately.
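To make the "external listeners track query execution" use case concrete, here is a minimal sketch of such a tracker. It deliberately avoids any Spark dependency: the case classes below are hypothetical stand-ins that borrow only the event names from this PR, each with a single assumed `operationId` field, whereas the real event classes live in the Spark Connect server module and may carry different fields. `OperationTracker` and `OperationTrackerDemo` are likewise illustrative names.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the operation events listed in this PR.
// Only the names come from the PR description; the single field is an assumption.
sealed trait ConnectEvent { def operationId: String }
final case class OperationStarted(operationId: String) extends ConnectEvent
final case class OperationParsed(operationId: String) extends ConnectEvent
final case class OperationCanceled(operationId: String) extends ConnectEvent
final case class OperationFailed(operationId: String) extends ConnectEvent
final case class OperationFinished(operationId: String) extends ConnectEvent
final case class OperationClosed(operationId: String) extends ConnectEvent

// A toy external listener that remembers the latest state seen per operation.
class OperationTracker {
  private val latest = mutable.LinkedHashMap.empty[String, String]

  def onEvent(event: ConnectEvent): Unit = {
    val state = event match {
      case _: OperationStarted  => "started"
      case _: OperationParsed   => "parsed"
      case _: OperationCanceled => "canceled"
      case _: OperationFailed   => "failed"
      case _: OperationFinished => "finished"
      case _: OperationClosed   => "closed"
    }
    latest(event.operationId) = state
  }

  def stateOf(operationId: String): Option[String] = latest.get(operationId)
}

object OperationTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new OperationTracker
    Seq(
      OperationStarted("op-1"),
      OperationParsed("op-1"),
      OperationFinished("op-1"),
      OperationClosed("op-1"),
      OperationStarted("op-2"),
      OperationFailed("op-2")
    ).foreach(tracker.onEvent)

    println(tracker.stateOf("op-1")) // Some(closed)
    println(tracker.stateOf("op-2")) // Some(failed)
  }
}
```

In a real deployment the same pattern match would presumably sit inside a `SparkListener` that overrides `onOtherEvent` and matches on the actual `SparkListenerConnectOperation*` classes, rather than on these stand-ins.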
Does this PR introduce any user-facing change?
How was this patch tested?
Manual + Unit + E2E