[SPARK-43879][CONNECT] Decouple handle command and send response on server side #41464
import org.apache.spark.sql.execution.streaming.StreamingQueryWrapper
import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryProgress, Trigger}

class SparkConnectHandler(session: SparkSession, val streamHandler: SparkConnectStreamHandler)
I would suggest renaming this to `SparkConnectCommandHandler` and moving it to `org.apache.spark.sql.connect.service`.
}
}

def process(
I think we cannot simply remove this method, since `SparkConnectPlanner#process` is exposed to extensions.
Hey @beliefer, can you please help me understand why you want to do this refactoring? What are you aiming to achieve? I understand the coupling in the planners is not optimal, but I would like to avoid changes for the sake of making a change. My second worry is the relationship to the above-mentioned PR, which is not the right approach in general. If you outline more of what you're looking to achieve, it will become easier for me to review.
@grundprinzip Thank you for the reply.
Based on the visibility issue, this is usually not a secure strategy.
Because |
What changes were proposed in this pull request?
`SparkConnectStreamHandler` handles the proto requests from the Connect client and sends the responses back to it. `SparkConnectStreamHandler` holds a `StreamObserver` component for sending responses. Currently, `SparkConnectPlanner` also has access to the `StreamObserver`, which is passed to it through method parameters. This causes some issues.

`SparkConnectStreamHandler` holds the `StreamObserver` and shouldn't expose it. Right now the `StreamObserver` is exposed to `SparkConnectPlanner`, and the latter is public, so any developer can use it. From a visibility standpoint, this is usually not a safe strategy.

`SparkConnectStreamHandler` should be fully responsible for responding to RPCs, which is a sound division of responsibilities. The better the code design, the easier it is to extend.

So I think the `StreamObserver` should be accessible only through `SparkConnectStreamHandler`.

This PR wraps the details of `StreamObserver` inside `SparkConnectStreamHandler`; `SparkConnectPlanner` only needs to call `sendResponse` when a response is ready. The PR aims to decouple command handling from response sending on the server side.

Note: As we discussed in #41379, this PR does not delay sending any response; responses are still sent "right now" and are expected to be returned.
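To make the proposed split concrete, here is a minimal sketch of the idea, not the code in this PR. All names in it (`ResponseObserver`, `Response`, `StreamHandler`, `Planner`, `RecordingObserver`, `Demo`) are hypothetical stand-ins: `ResponseObserver` plays the role of gRPC's `StreamObserver` so the sketch has no dependencies, and only the `sendResponse` name is taken from the description above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for gRPC's StreamObserver, to keep the sketch dependency-free.
trait ResponseObserver[T] {
  def onNext(value: T): Unit
  def onCompleted(): Unit
}

// Hypothetical response type; the real server sends proto messages.
final case class Response(payload: String)

// The handler keeps the observer as a private constructor parameter.
// The only way for other components to reach the client is sendResponse.
class StreamHandler(observer: ResponseObserver[Response]) {
  def sendResponse(response: Response): Unit = observer.onNext(response)
  def complete(): Unit = observer.onCompleted()
}

// The planner receives a send callback and never sees the observer itself.
class Planner(send: Response => Unit) {
  def process(command: String): Unit = {
    // Handle the command, then hand the finished response to the callback.
    send(Response(s"handled: $command"))
  }
}

// Test double that records everything the handler forwards.
class RecordingObserver extends ResponseObserver[Response] {
  val received: ArrayBuffer[Response] = ArrayBuffer.empty[Response]
  var completed: Boolean = false
  override def onNext(value: Response): Unit = { received += value }
  override def onCompleted(): Unit = { completed = true }
}

object Demo {
  // Wires the pieces together and returns the observer for inspection.
  def run(): RecordingObserver = {
    val observer = new RecordingObserver
    val handler = new StreamHandler(observer)
    val planner = new Planner(r => handler.sendResponse(r))
    planner.process("sql")
    handler.complete()
    observer
  }
}
```

In this arrangement the planner can be tested with any `Response => Unit` function, and completing or failing the stream stays the handler's decision.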
Why are the changes needed?
To decouple command handling from response sending on the server side.
Does this PR introduce any user-facing change?
No. This only updates the internal implementation.
How was this patch tested?
Existing test cases.