test: fix flaky test for blocking pull shutdown #378

plamut · 2021-04-14T13:12:48Z

Fixes #372.

If a test is run in a suite with other system tests, the messages are not always published in batch sizes as desired, which can affect how ACKs are handled on the backend (the server requires all messages published in a single batch to be ACK-ed in order
to accept the ACKs).

If a publisher client instance is shared between the tests, the batching can apparently be affected, thus we create a new client
instance before each test. Since these tests are slow system tests, the overhead should not be significant.

Steps to verify

Apply the following patch to the code, disable capturing test output (-s option to the py.test command).

diff --git google/cloud/pubsub_v1/publisher/_batch/thread.py google/cloud/pubsub_v1/publisher/_batch/thread.py
index 36dd3b9..c3aacdf 100644
--- google/cloud/pubsub_v1/publisher/_batch/thread.py
+++ google/cloud/pubsub_v1/publisher/_batch/thread.py
@@ -211,6 +211,7 @@ class Batch(base.Batch):
         commit_thread = threading.Thread(
             name="Thread-CommitBatchPublisher", target=self._commit
         )
+        print(f"batch starting commit for {len(self._messages)} message(s)")
         commit_thread.start()

     def _commit(self):

Run system tests in --verbose mode and check the output.

Actual result (before the PR fix)
The system test test_streaming_pull_blocking_shutdown fails with a high probability.

Expected result

The system test test_streaming_pull_blocking_shutdown consistently passes.

In addition, wherever the _publish_messages(..., batch_sizes=[...]) helper is used in the tests, the output size output should match the values passed through batch_sizes. For example, in test_streaming_pull_max_messages(), batch_sizes is set to (7, 4, 8, 2, 10, 1, 3, 8, 6, 1), and the test output matches that:

...
tests/system.py::TestStreamingPull::test_streaming_pull_max_messages batch starting commit for 7 message(s)
batch starting commit for 4 message(s)
batch starting commit for 8 message(s)
batch starting commit for 2 message(s)
batch starting commit for 10 message(s)
batch starting commit for 1 message(s)
batch starting commit for 3 message(s)
batch starting commit for 8 message(s)
batch starting commit for 6 message(s)
batch starting commit for 1 message(s)
...

PR checklist:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

If a test is run in a suite with other system tests, the messages are not always published in batch sizes as desired, which can affect how ACKs are handled on the backend (the server requires all messages published in a single batch to be ACK-ed in order to accept the ACKs). If a publisher client instance is shared between the tests, the batching can apparently be affected, thus we create a new client instance before each test. Since these tests are slow system tests, the overhead should not be significant.

plamut · 2021-04-14T13:32:51Z

@jimfulton If you have time and because you are now somewhat familiar with these tests, you too can check this.

Although you said that this particular test was not failing for you locally?

jimfulton · 2021-04-15T14:13:33Z

Right, I haven't seen this failure.

jimfulton

I guess the rationale for the fix is that subscriber client state was leaking between tests and causing spurious failures.

plamut requested a review from a team as a code owner April 14, 2021 13:12

google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Apr 14, 2021

product-auto-label bot added the api: pubsub Issues related to the googleapis/python-pubsub API. label Apr 14, 2021

plamut requested review from pradn and jimfulton April 14, 2021 13:31

jimfulton approved these changes Apr 15, 2021

View reviewed changes

plamut merged commit 13a6686 into googleapis:master Apr 15, 2021

plamut deleted the iss-372 branch April 15, 2021 14:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: fix flaky test for blocking pull shutdown #378

test: fix flaky test for blocking pull shutdown #378

plamut commented Apr 14, 2021 •

edited

Loading

plamut commented Apr 14, 2021

jimfulton commented Apr 15, 2021

jimfulton left a comment

test: fix flaky test for blocking pull shutdown #378

test: fix flaky test for blocking pull shutdown #378

Conversation

plamut commented Apr 14, 2021 • edited Loading

Steps to verify

plamut commented Apr 14, 2021

jimfulton commented Apr 15, 2021

jimfulton left a comment

Choose a reason for hiding this comment

plamut commented Apr 14, 2021 •

edited

Loading