SEA: Reduce network calls for synchronous commands #633

Open
wants to merge 4 commits into
base: sea-migration
from

Conversation

varun-edachali-dbx
Collaborator

What type of PR is this?

  • Refactor

Description

In execute_command, we first send the execute request to the server. If the request is synchronous, we then poll for its state until it is no longer pending, and afterwards make an additional GET request to the server to fetch the final statement information. There are two areas for improvement:

  • If the request is small and completes execution (i.e., reaches the SUCCEEDED state) within the wait_timeout, we need neither poll for completion nor make another GET request to the server. We can construct our ResultSet immediately from the execute response.
  • While we poll for the request state, a response that reports SUCCEEDED also carries the data we need to construct the ResultSet. We need not make another GET request after polling and can instead use the last poll response.
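
The two points above can be sketched as a single loop. This is a minimal illustration, not the connector's implementation: `execute_sync`, `FakeServer`, the `PENDING_STATES` set, and the payload contents are hypothetical names introduced here, and the fake server only counts requests so the saving is visible.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical state names for this sketch; the real connector has its own enums.
PENDING_STATES = {"PENDING", "RUNNING"}


def execute_sync(
    post_execute: Callable[[], Dict[str, Any]],
    get_statement: Callable[[str], Dict[str, Any]],
    poll_interval: float = 0.0,
) -> Dict[str, Any]:
    """Return the final statement response with as few requests as possible."""
    response = post_execute()  # one POST starts the statement

    # Poll only while the statement is still in flight. Each poll response
    # replaces the previous one, so the last poll doubles as the final
    # response and no extra GET is issued afterwards.
    while response["status"]["state"] in PENDING_STATES:
        time.sleep(poll_interval)
        response = get_statement(response["statement_id"])

    return response


class FakeServer:
    """In-memory stand-in for the server that counts incoming requests."""

    def __init__(self, polls_until_done: int) -> None:
        self.polls_left = polls_until_done
        self.calls = 0

    def _response(self) -> Dict[str, Any]:
        done = self.polls_left <= 0
        body: Dict[str, Any] = {
            "statement_id": "stmt-1",
            "status": {"state": "SUCCEEDED" if done else "RUNNING"},
        }
        if done:
            # Made-up result payload, attached only once the statement is done.
            body["result"] = {"data_array": [["1"]]}
        return body

    def post_execute(self) -> Dict[str, Any]:
        self.calls += 1
        return self._response()

    def get_statement(self, statement_id: str) -> Dict[str, Any]:
        self.calls += 1
        self.polls_left -= 1
        return self._response()


# Statement that finishes within the initial wait: a single request in total.
fast = FakeServer(polls_until_done=0)
fast_result = execute_sync(fast.post_execute, fast.get_statement)

# Statement that needs two polls: one POST plus two GETs, no trailing GET.
slow = FakeServer(polls_until_done=2)
slow_result = execute_sync(slow.post_execute, slow.get_statement)
```

Under these assumptions the fast path costs one request instead of two, and the polled path costs one request fewer than before, because the final poll response is reused.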

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

N/A

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
@varun-edachali-dbx varun-edachali-dbx marked this pull request as ready for review July 10, 2025 02:12
Contributor

@jayantsing-db jayantsing-db left a comment

Some questions inline.

Comment on lines -127 to -146
@dataclass
class GetStatementResponse:
    """Representation of the response from getting information about a statement."""

    statement_id: str
    status: StatementStatus
    manifest: ResultManifest
    result: ResultData

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "GetStatementResponse":
        """Create a GetStatementResponse from a dictionary."""
        return cls(
            statement_id=data.get("statement_id", ""),
            status=_parse_status(data),
            manifest=_parse_manifest(data),
            result=_parse_result(data),
        )


Contributor

I think you would still need this? Probably part of different PRs

"""

# Create and return a SeaResultSet
from databricks.sql.backend.sea.result_set import SeaResultSet
Contributor

Does this lazy import have a perf gain (loading the module lazily, only when it is needed)?
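
For context on the question, a function-level import in Python defers loading the module until the function first runs; the same pattern is also commonly used to avoid circular imports. The sketch below only illustrates the deferral and makes no claim about why the PR uses it. The standard-library module `colorsys` stands in for the real module, and `to_rgb` is a hypothetical helper.

```python
import sys

# Start the demo with the stand-in module unloaded, in case something
# imported it earlier in this process.
sys.modules.pop("colorsys", None)


def to_rgb(h: float, s: float, v: float):
    # Function-level import: the module is loaded on the first call.
    # Later calls find it in sys.modules and skip the load.
    import colorsys

    return colorsys.hsv_to_rgb(h, s, v)


loaded_before_call = "colorsys" in sys.modules
rgb = to_rgb(0.0, 1.0, 1.0)
loaded_after_call = "colorsys" in sys.modules
```

Whether the deferral matters in practice depends on how expensive the module is to import and how often the function is reached.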

@@ -324,7 +323,7 @@ def _extract_description_from_manifest(
         return columns

     def _results_message_to_execute_response(
-        self, response: GetStatementResponse
+        self, response: ExecuteStatementResponse
Contributor

Didn't know that GetStatementResponse and ExecuteStatementResponse have the same fields wrt results (interchangeable).

Collaborator Author

Yes, I did not realise this at first either, but it can be confirmed by comparing the two responses in the REST reference.

This also makes sense to me logically: the purpose of the GET is to fetch the information related to an executed statement.
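
As a rough illustration of why one response type can serve both calls, the sketch below parses an execute-style payload and a get-style payload with the same `from_dict`. `StatementResponse` is a hypothetical stand-in for the connector's model class, and both payloads are made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StatementResponse:
    """Hypothetical shared shape for execute and get responses."""

    statement_id: str
    state: str
    manifest: Optional[Dict[str, Any]]
    result: Optional[Dict[str, Any]]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StatementResponse":
        # The same keys are read regardless of which endpoint produced the body.
        return cls(
            statement_id=data.get("statement_id", ""),
            state=data.get("status", {}).get("state", ""),
            manifest=data.get("manifest"),
            result=data.get("result"),
        )


# Made-up payloads: one as returned by the execute call, one by a later GET.
execute_payload = {
    "statement_id": "stmt-1",
    "status": {"state": "SUCCEEDED"},
    "manifest": {"total_row_count": 1},
    "result": {"data_array": [["1"]]},
}
get_payload = dict(execute_payload)

from_execute = StatementResponse.from_dict(execute_payload)
from_get = StatementResponse.from_dict(get_payload)
```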

@@ -378,7 +399,7 @@ def _check_command_not_in_failed_or_closed_state(

     def _wait_until_command_done(
         self, response: ExecuteStatementResponse
-    ) -> CommandState:
+    ) -> ExecuteStatementResponse:
Contributor

It seems odd to me that this method does the polling and still ends up returning an ExecuteStatementResponse. Semantically, that is the response to an execute request.

@@ -574,9 +591,25 @@ def get_query_state(self, command_id: CommandId) -> CommandState:
path=self.STATEMENT_PATH_WITH_ID.format(sea_statement_id),
data=request.to_dict(),
)
response = ExecuteStatementResponse.from_dict(response_data)
Contributor

Is it okay to return ExecuteResponse as a result of Polling?
