[SPARK-43879][CONNECT] Decouple handle command and send response on server side #41527
553abb0 to
85b6c31
Compare
|
ping @hvanhovell @grundprinzip |
a2c55a7 to
defbd2f
Compare
|
@grundprinzip Could you take a look? I have rebased many times. |
grundprinzip
left a comment
I think the overall approach is fine. I don't think this is the most burning refactoring that needs to be done, but it's ok.
However, there are a couple of things that we need to make sure of. The APIs of the planner change their visibility, and we need to make sure that this is really necessary.
In addition, there is the question of whether the planner can really exist without a response handler.
should we make this private[connect]?
It's OK for me.
Is there a valid use case where this can be empty? Personally, I don't think that this should ever be None. I understand that for testing purposes one might inject a mock, but otherwise the planner cannot exist without the response handler.
Some existing test cases only need to transform a proto into a LogicalPlan and don't depend on the response handler.
Please refer to:
Line 181 in 1a5b9f2
| val planner = new SparkConnectPlanner(SessionHolder.forTesting(spark)) |
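The trade-off discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Planner`, `ResponseSender`, `Response`, `transform`, and `process` are hypothetical names standing in for the real classes. It assumes the response handler is an optional constructor argument, so transform-only tests can omit it while command handling fails loudly without it.

```scala
// Hypothetical names throughout; this is NOT the real SparkConnectPlanner API.
final case class Response(payload: String)

// Narrow interface through which a planner could send responses.
trait ResponseSender {
  def sendResponse(response: Response): Unit
}

// Assumption: the sender is optional so that transform-only callers can skip it.
final class Planner(sender: Option[ResponseSender] = None) {
  // Pure transformation (proto to plan in the real code); needs no sender.
  def transform(relation: String): String = s"LogicalPlan($relation)"

  // Command handling must send a response, so it fails if no sender was given.
  def process(command: String): Unit = {
    val s = sender.getOrElse(
      throw new IllegalStateException("process requires a ResponseSender"))
    s.sendResponse(Response(s"done: $command"))
  }
}

object OptionalSenderSketch {
  def main(args: Array[String]): Unit = {
    // Test-style usage: no response handler, transformation only.
    val testPlanner = new Planner()
    println(testPlanner.transform("range(10)"))

    // Server-style usage: a sender is injected, so commands can respond.
    val sent = scala.collection.mutable.ArrayBuffer.empty[Response]
    val recording = new ResponseSender {
      def sendResponse(response: Response): Unit = sent += response
    }
    new Planner(Some(recording)).process("CREATE VIEW v")
    println(sent.map(_.payload).mkString(", "))
  }
}
```

The alternative raised in the review would be to make the sender mandatory and have tests pass a mock or no-op implementation instead of `None`.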
why remove the private? This should at least be package private.
|
@grundprinzip Thank you for the review. |
Yes. Because connect exposes the |
Spark Connect has existing test cases that use the planner without a response handler. Could we add a response handler for them? |
3e56f84 to
f66d454
Compare
|
ping @grundprinzip cc @hvanhovell |
|
@beliefer In what I've been working on:
I do plan to have SparkConnectStreamHandler be responsible for sending the RPCs. That's not yet ready in my PR: in the current iteration I had the execution thread send the RPCs, but I need to hand it back over to the SparkConnectStreamHandler thread. I agree with you that it's a good division of responsibilities, especially since my follow-up work is to be able to "reconnect" to a query after the initial ExecutePlan disconnects, so I need another RPC call to reconnect to the stream.
I avoid the need for the refactoring by instead passing my own class implementing the |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
SparkConnectStreamHandler handles the proto requests from the Connect client and sends the responses back to the client. SparkConnectStreamHandler holds a StreamObserver component to send responses. Currently, SparkConnectPlanner also knows about StreamObserver, which is passed to it through method parameters. This behavior introduces some issues.
SparkConnectStreamHandler holds the StreamObserver and shouldn't expose it. Right now we expose StreamObserver to SparkConnectPlanner, and the latter is public, so any developer could use it. Given this visibility issue, this is usually not a secure strategy.
SparkConnectStreamHandler should be fully responsible for responding to the RPC, which is a good division of responsibilities. The better the code design, the easier it is to extend.
So I think StreamObserver should be accessible only within SparkConnectStreamHandler.
This PR wraps the details of StreamObserver into SparkConnectStreamHandler; SparkConnectPlanner only needs to call sendResponse when the response is ready.
This PR aims to decouple command handling from response sending on the server side.
Note: As we discussed in #41379, this PR doesn't delay sending any response "right now" and expects that it will be returned.
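The intended division of responsibilities can be sketched roughly as below. This is a minimal, dependency-free illustration, not the actual patch: `Observer` is a hypothetical stand-in for gRPC's `StreamObserver`, and `StreamHandler`, `Planner`, `ResponseSender`, and `Response` are simplified placeholder names. Only the `sendResponse` method name comes from the description above.

```scala
// Hypothetical stand-in for io.grpc.stub.StreamObserver, so the sketch has no dependencies.
trait Observer[T] {
  def onNext(value: T): Unit
  def onCompleted(): Unit
}

// Hypothetical response type; the real server sends response protos.
final case class Response(payload: String)

// The only capability handed to the planner: send a response that is ready.
trait ResponseSender {
  def sendResponse(response: Response): Unit
}

// The handler keeps the observer private and is the only code that touches it.
final class StreamHandler(private val observer: Observer[Response])
    extends ResponseSender {
  override def sendResponse(response: Response): Unit = observer.onNext(response)

  def handle(command: String): Unit = {
    new Planner(this).process(command)
    observer.onCompleted()
  }
}

// The planner never sees the observer, only the narrow ResponseSender interface.
final class Planner(sender: ResponseSender) {
  def process(command: String): Unit =
    sender.sendResponse(Response(s"result of $command"))
}

object DecouplingSketch {
  def main(args: Array[String]): Unit = {
    // Record everything the observer receives so the flow is visible.
    val seen = scala.collection.mutable.ArrayBuffer.empty[String]
    val observer = new Observer[Response] {
      def onNext(value: Response): Unit = seen += value.payload
      def onCompleted(): Unit = seen += "<completed>"
    }
    new StreamHandler(observer).handle("SHOW TABLES")
    println(seen.mkString(", "))
  }
}
```

In this shape, changing how responses are delivered (for example, handing them to another thread) would only touch the handler, since the planner depends on `sendResponse` alone.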
Why are the changes needed?
To decouple command handling from response sending on the server side.
Does this PR introduce any user-facing change?
No.
This only updates the internal implementation.
How was this patch tested?
Existing test cases.