[SPARK-51818][CONNECT] Move QueryExecution creation to AnalyzeHandler and don't Execute for AnalyzePlanRequests #50605

peterpashkin · 2025-04-16T10:00:32Z

What changes were proposed in this pull request?

AnalyzePlan requests for a schema should not trigger execution of the logical plan. Currently, when an AnalyzePlanRequest carries a command that is executed eagerly, the Dataset.ofRows(logicalPlan) call executes the underlying command. We do not want this to happen during AnalyzePlan. Instead, we construct the QueryExecution for the logical plan with CommandExecutionMode.SKIP and return the resulting schema from it.
https://issues.apache.org/jira/browse/SPARK-51818
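
The difference between eager execution and skipping can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Mode`, `Plan`, `Catalog` and `analyzeSchema` are hypothetical names invented for illustration, not Spark classes, and `Skip` merely stands in for the role that CommandExecutionMode.SKIP plays in the description above.

```scala
import scala.collection.mutable

// Toy model only: none of these types are Spark's; they mimic the idea in the description.
object ExecutionModeSketch {
  sealed trait Mode
  case object All extends Mode   // commands run as soon as the plan is prepared
  case object Skip extends Mode  // commands are never run (stand-in for CommandExecutionMode.SKIP)

  // A plan is either a plain query or a command with a side effect on the catalog.
  sealed trait Plan { def schema: Seq[String] }
  final case class Query(schema: Seq[String]) extends Plan
  final case class DropTable(name: String) extends Plan {
    // In this toy model a command exposes no output columns.
    val schema: Seq[String] = Seq.empty
  }

  final class Catalog(initial: Set[String]) {
    private val tables = mutable.Set.empty[String] ++= initial
    def exists(name: String): Boolean = tables.contains(name)
    def drop(name: String): Unit = { tables.remove(name); () }
  }

  // Returns the schema of the plan; only the All mode triggers the command's side effect.
  def analyzeSchema(plan: Plan, catalog: Catalog, mode: Mode): Seq[String] = {
    if (mode == All) plan match {
      case DropTable(name) => catalog.drop(name)
      case _: Query        => ()
    }
    plan.schema
  }

  def main(args: Array[String]): Unit = {
    val eager = new Catalog(Set("t"))
    analyzeSchema(DropTable("t"), eager, All)
    println(s"All mode, table still exists: ${eager.exists("t")}")

    val skipped = new Catalog(Set("t"))
    analyzeSchema(DropTable("t"), skipped, Skip)
    println(s"Skip mode, table still exists: ${skipped.exists("t")}")
  }
}
```

In this sketch the schema returned is identical in both modes; the only difference is whether the catalog is mutated as a side effect of asking for it.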

Why are the changes needed?

SQL commands sent via an AnalyzePlanRequest are currently executed eagerly; this PR fixes that.

Does this PR introduce any user-facing change?

When calling .schema on a DataFrame via Spark Connect, the plan stored in the DataFrame is no longer executed, which was the case before. Example: for spark.newDataFrame(plan: proto.Plan).schema, where plan encodes a SQL command that is executed eagerly, such as DROP TABLE, the previous behavior was to execute the SQL command. This no longer happens after this change.

How was this patch tested?

Added a test that sends an AnalyzePlanRequest with DROP TABLE and verifies that the table was not dropped.

Was this patch authored or co-authored using generative AI tooling?

No


HyukjinKwon · 2025-04-16T23:35:16Z

cc @vicennial @hvanhovell FYI

vicennial

Thanks for the changes!
The fact that analysis calls were executing commands looks like an unfortunate bug that slipped through.

Requests:

Please fill in the "User Facing Changes" section. Include a simple example where the behaviour differs (IIUC, spark.sql("<some command>").schema would now have its behaviour corrected).
Add a server-side test that verifies that the DF was, in fact, not executed.

peterpashkin · 2025-04-23T13:56:26Z

Thanks Akhil, I will add the tests and the User Facing Changes section. Actually, spark.sql(...).schema still executes the command, because commands issued through .sql() are executed eagerly. This PR only stops AnalyzePlanRequests, such as session.analyze(plan), from executing the plan.
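
To make the distinction concrete, here is a rough toy sketch of the two client paths. `ToyServer`, `ToySession` and `ToyDataFrame` are hypothetical stand-ins, not Spark Connect classes; the sketch only assumes, as stated above, that .sql() runs commands eagerly while a schema lookup goes through an analyze-style request that no longer executes anything.

```scala
import scala.collection.mutable

// Toy model only: these classes are invented for illustration and are not Spark Connect APIs.
object EagerSqlSketch {
  final class ToyServer {
    val tables: mutable.Set[String] = mutable.Set("t")
    var executeCalls = 0
    var analyzeCalls = 0

    // Stand-in for an execute request: this is where a command takes effect.
    def execute(command: String): Unit = {
      executeCalls += 1
      if (command == "DROP TABLE t") { tables.remove("t"); () }
    }

    // Stand-in for an analyze request after the fix: schema only, no side effects.
    def analyzeSchema(command: String): Seq[String] = {
      analyzeCalls += 1
      Seq.empty
    }
  }

  final class ToyDataFrame(server: ToyServer, command: String) {
    def schema: Seq[String] = server.analyzeSchema(command)
  }

  final class ToySession(server: ToyServer) {
    // Eager path: the command has already run by the time the DataFrame is returned.
    def sql(command: String): ToyDataFrame = {
      server.execute(command)
      new ToyDataFrame(server, command)
    }

    // Lazy path: the DataFrame only wraps the plan; nothing is sent yet.
    def newDataFrame(command: String): ToyDataFrame = new ToyDataFrame(server, command)
  }

  def main(args: Array[String]): Unit = {
    val viaSql = new ToyServer
    new ToySession(viaSql).sql("DROP TABLE t").schema
    println(s"sql(...).schema, table still exists: ${viaSql.tables.contains("t")}")

    val viaPlan = new ToyServer
    new ToySession(viaPlan).newDataFrame("DROP TABLE t").schema
    println(s"newDataFrame(...).schema, table still exists: ${viaPlan.tables.contains("t")}")
  }
}
```

In the first path the table is gone before .schema is ever called, which is why this PR does not change the behaviour of spark.sql(...).schema.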

vicennial

LGTM, thanks for the fix!

.../server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAnalyzeHandler.scala

HyukjinKwon · 2025-05-07T07:12:57Z

Merged to master.

Closes apache#50605 from peterpashkin/peter-pashkin/MoveAnalyzeAndSkipExecution.
Authored-by: Peter Pashkin <peter.pashkin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

github-actions bot added SQL CONNECT labels Apr 16, 2025

peterpashkin changed the title ~~Move QueryExecution creation to AnalyzeHandler and don't Execute for AnalyzePlanRequests~~ [SPARK-51818] Move QueryExecution creation to AnalyzeHandler and don't Execute for AnalyzePlanRequests Apr 16, 2025

peterpashkin changed the title ~~[SPARK-51818] Move QueryExecution creation to AnalyzeHandler and don't Execute for AnalyzePlanRequests~~ [SPARK-51818][CONNECT] Move QueryExecution creation to AnalyzeHandler and don't Execute for AnalyzePlanRequests Apr 16, 2025

analyze handler

45bf4db

empty formatting test if that is good typo fix only explain change isLocal execution okay only Schema just work please do all correct include small fix squash

peterpashkin force-pushed the peter-pashkin/MoveAnalyzeAndSkipExecution branch from 9a092f3 to 45bf4db Compare April 16, 2025 15:25

Peter Pashkin added 2 commits April 16, 2025 16:06

imports

c26db1e

Row encoder

036d247

vicennial reviewed Apr 23, 2025


Add test

71800fb

peterpashkin requested a review from vicennial April 28, 2025 07:52

vicennial approved these changes Apr 29, 2025


hvanhovell reviewed Apr 30, 2025


.../server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAnalyzeHandler.scala

hvanhovell reviewed Apr 30, 2025


.../server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAnalyzeHandler.scala (outdated)

removed one check

ac6d6f5

peterpashkin requested a review from hvanhovell May 2, 2025 15:43

HyukjinKwon approved these changes May 7, 2025


HyukjinKwon closed this in d80e857 May 7, 2025
