Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implementation for batch dml in dbapi #1055

Merged
merged 4 commits into from
Dec 14, 2023
Merged

Conversation

ankiaga
Copy link
Contributor

@ankiaga ankiaga commented Dec 12, 2023

No description provided.

@ankiaga ankiaga requested review from a team as code owners December 12, 2023 03:40
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: spanner Issues related to the googleapis/python-spanner API. labels Dec 12, 2023
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2023
and parsed_statement.statement_type != StatementType.INSERT
):
raise ProgrammingError(
"Only DML statements are allowed in batch " "DML mode."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this be just one string instead of two concatenated strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

connection._transaction = None
raise Aborted(status.message)
elif status.code != OK:
raise OperationalError(status.message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should (could) this also include the status code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take it in a follow up PR

@@ -196,6 +203,24 @@ def read_only(self, value):
)
self._read_only = value

@property
def batch_mode(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, giving these names that do not start with an underscore will make them part of the public API. In that case, we should document them and also add validations to verify that they are only called with valid arguments. (But probably we should make them internal instead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the property

Args:
value (BatchMode)
"""
self._batch_mode = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an external uses calls this function when the connection is already in the middle of a different type of batch (e.g. it is now in a DML batch, and it is called to set it to DDL batch)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this property

self._cursor.execute("start batch dml")
self._insert_row(7)
self._insert_row(8)
self._cursor.execute("run batch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably difficult, but: Do we have any way that we could verify that the run batch call really sends an ExecuteBatchDml request, and does not execute multiple ExecuteSql requests sequentially?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean if we could assert something in the test for that?

Manually I have verified it by debugging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant automated. That would then also guard us against any regressions, for example if something changes in the underlying client library, which cause this to use multiple ExecuteSql RPCs instead of one ExecuteBatchDml RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some investigation, so will take it in a follow up PR

VALUES ({i}, 'first-name-{i}', 'last-name-{i}', 'test.email@domen.ru')
"""
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have tests for the unhappy path of using batch DML:

  1. What happens if the batch contains an invalid statement as the first statement?
  2. What happens if the batch contains an invalid statement in the middle?
  3. What happens if the batch contains an invalid statement at the end?
  4. Can we test and verify that the retry logic for Batch DML works as expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test for invalid statements.

There are issues with the retry logic which I am fixing in the next PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 14, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 14, 2023
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 14, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 14, 2023
Copy link
Contributor

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good to me, but with one remaining question/nit on the exception that is raised when a batch fails. The exception should also contain the update counts of the statements that did succeed. Are we sure that it does?

self._cursor.execute("start batch dml")
self._insert_row(7)
self._insert_row(8)
self._cursor.execute("run batch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant automated. That would then also guard us against any regressions, for example if something changes in the underlying client library, which cause this to use multiple ExecuteSql RPCs instead of one ExecuteBatchDml RPC.

"""
)
with pytest.raises(OperationalError):
self._cursor.execute("run batch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected this to return a specific batch error that:

  1. Contains the error code, message etc.
  2. And contains the update counts of the statements that did succeed.

See for example this for the JDBC driver: https://github.com/googleapis/java-spanner-jdbc/blob/1f89f78c37b9e118e2c0cbc7f56d3eb1d5745863/src/test/java/com/google/cloud/spanner/jdbc/it/ITJdbcPreparedStatementTest.java#L797

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed we need to create a custom exception class, so will take it in follow up PR

):
cursor.connection._database = mock_db = mock.MagicMock()
mock_db.run_in_transaction = mock_run_in = mock.MagicMock()
cursor.execute(sql="sql")
mock_run_in.assert_called_once_with(
cursor._do_execute_update_in_autocommit, "sql WHERE 1=1", None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change in this PR?

Copy link
Contributor Author

@ankiaga ankiaga Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to add WHERE clause has moved to parse_utils.classify_statement now which we are mocking here so its same as what is returned as per line 255

VALUES ({i}, 'first-name-{i}', 'last-name-{i}', 'test.email@domen.ru')
"""
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@ankiaga ankiaga enabled auto-merge (squash) December 14, 2023 17:17
@ankiaga ankiaga merged commit 7a92315 into googleapis:main Dec 14, 2023
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants