Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Jun 5, 2023

What changes were proposed in this pull request?

SparkConnectStreamHandler treats the proto requests from connect client and send the responses back to connect client. SparkConnectStreamHandler holds a component StreamObserver to send responses. Currently, SparkConnectPlanner also knows StreamObserver passed by method parameters. So the behavior introduces some issues.

  • Visibility

SparkConnectStreamHandler holds the StreamObserver and shouldn't expose it. Now, we expose StreamObserver to SparkConnectPlanner and the latter is public, so every developers could use it.

  • Safety

Based on the visibility issue, this is usually not a secure strategy.

  • Code Design

Because SparkConnectStreamHandler is fully responsible for responding to RPC, which is a division of responsibilities in good programming. The better code design, the easier to extend function.

So I think we should keep the StreamObserver could be accessed only with SparkConnectStreamHandler.
This PR wraps the detail of StreamObserver into SparkConnectStreamHandler, SparkConnectPlanner only need call sendResponse if the response is ready.

This PR want decouple the process handle commands and the other process send responses on server side.

Note: As we disscussed on #41379, this PR doesn't delay to send any response "right now" and expects that it will be returned.

Why are the changes needed?

Decouple handle command and send response on server side.

Does this PR introduce any user-facing change?

'No'.
Just update the inner implementation.

How was this patch tested?

Exists test cases.

@beliefer
Copy link
Contributor Author

beliefer commented Jun 6, 2023

import org.apache.spark.sql.execution.streaming.StreamingQueryWrapper
import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryProgress, Trigger}

class SparkConnectHandler(session: SparkSession, val streamHandler: SparkConnectStreamHandler)
Copy link
Contributor

@zhengruifeng zhengruifeng Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest rename this as SparkConnectCommandHandler and move it to org.apache.spark.sql.connect.service

}
}

def process(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can not simply remove this method, since SparkConnectPlanner#process is exposed to the extensions

@grundprinzip
Copy link
Contributor

Hey @beliefer Can you please help me understand why you want to do this refactoring? What are you aiming to achieve? I understand the coupling in the planners is not optimal, but I would like to avoid changes for the sake of doing a change.

My second worry is the relationship to the above mentioned PR that is not the right approach in general. If you outline more what you're looking to achieve it will become easier for me to review.

@beliefer
Copy link
Contributor Author

beliefer commented Jun 6, 2023

@grundprinzip Thank you for reply.
The answer for first question.

  • Visibility

SparkConnectStreamHandler holds the StreamObserver and shouldn't expose it. Now, we expose StreamObserver to SparkConnectPlanner and the latter is public, so every developers could use it.

  • Safety

Based on the visibility issue, this is usually not a secure strategy.

  • Code Design

Because SparkConnectStreamHandler is fully responsible for responding to RPC, which is a division of responsibilities in good programming. The better code design, the easier to extend function.

@beliefer beliefer closed this Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants