Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pubsub system test: test_fetch_delete_subscription_w_deleted_topic does not see orphan w/o topic #2080

Closed
tseaver opened this issue Aug 9, 2016 · 22 comments
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. testing

Comments

@tseaver
Copy link
Contributor

tseaver commented Aug 9, 2016

From: https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/151011241#L822

======================================================================
FAIL: test_fetch_delete_subscription_w_deleted_topic (pubsub.TestPubsub)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/system_tests/pubsub.py", line 273, in test_fetch_delete_subscription_w_deleted_topic
    self.assertTrue(orphaned.topic is None)
AssertionError: False is not true
----------------------------------------------------------------------
@tseaver tseaver added testing api: pubsub Issues related to the Pub/Sub API. labels Aug 9, 2016
@dhermes dhermes added the flaky label Aug 9, 2016
@dhermes
Copy link
Contributor

dhermes commented Aug 9, 2016

@tseaver I just created the flaky label

@tseaver
Copy link
Contributor Author

tseaver commented Aug 11, 2016

Again yesterday.

@tseaver
Copy link
Contributor Author

tseaver commented Aug 16, 2016

Still failing on one of two runs after the merge of #2086.

@tseaver
Copy link
Contributor Author

tseaver commented Aug 17, 2016

@tseaver tseaver closed this as completed Aug 17, 2016
@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2016

w00t!

@tseaver
Copy link
Contributor Author

tseaver commented Aug 28, 2016

Ugh

@tseaver tseaver reopened this Aug 28, 2016
@tseaver
Copy link
Contributor Author

tseaver commented Aug 28, 2016

@tmatsuo Has there been some recently-deployed change to the Pubsub API which would cause this failure to go from "flaky" to "fails routinely"? Our current backoff ends up polling for four minutes, trying to get the deleted-topic response after deleting our "orphan" subscription's topic.

@dhermes, @daspecster If @tmatsuo doesn't know / suspect any back-end change which would explain the failures, I propose we drop this particular testcase: we have unittests in place which cover the oddball case, and we need to have our build stay green.

@dhermes
Copy link
Contributor

dhermes commented Aug 29, 2016

I propose we drop this particular testcase

I'm unclear on how important this test case is, though hope we can sort out the backend's issues without having to ditch the system test (the more the merrier).

@dhermes
Copy link
Contributor

dhermes commented Sep 2, 2016

@tseaver shouldn't we leave this open?

@dhermes
Copy link
Contributor

dhermes commented Sep 2, 2016

@tseaver As of the latest build the error has stopped occurring, which has led to "unexpected success". To make this more hilarious, in Python 2 vs. 3, that means a different status code with unittest.

I've tried to repro the FAILED (unexpected successes=1) vs. OK (unexpected successes=1) but can't

import unittest


class TestNothing(unittest.TestCase):

    @unittest.expectedFailure
    def test_it(self):
        self.assertTrue(False)


def main():
    suite = unittest.TestSuite()
    tests = unittest.defaultTestLoader.loadTestsFromTestCase(
        TestNothing)
    suite.addTest(tests)
    test_result = unittest.TextTestRunner(verbosity=2).run(suite)
    print('test_result.wasSuccessful():')
    print(test_result.wasSuccessful())


if __name__ == '__main__':
    main()

@dhermes
Copy link
Contributor

dhermes commented Sep 3, 2016

Same fail but not really fail again: https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/157238601

@dhermes dhermes reopened this Sep 3, 2016
@tseaver
Copy link
Contributor Author

tseaver commented Sep 6, 2016

I'm in favor of just dropping the test altogether (as #2215 did originally).

@tmatsuo
Copy link
Contributor

tmatsuo commented Sep 6, 2016

Sorry for the late reply. What does the test do? Maybe I can ask a core engineer if something has changed.

@tseaver
Copy link
Contributor Author

tseaver commented Sep 6, 2016

@tmatsuo it creates a topic, and a subscription to it, then deletes the topic. It then tries to verify that the same subscription (when returned from Client.list_subscriptions) has None as its topic, i.e., that resource as listed contained the special _deleted-topic_ value for its topic.

What we see is that the orphaned subscription is not showing that special value, even after an extremely long retry/backoff cycle (up to four minutes or so).

@tmatsuo
Copy link
Contributor

tmatsuo commented Sep 6, 2016

Does it always fail now, or it is still flaky (flakier than before)?

@dhermes
Copy link
Contributor

dhermes commented Sep 6, 2016

It failed for like 10 days straight and now has succeeded for about 5 days straight

@tmatsuo
Copy link
Contributor

tmatsuo commented Sep 6, 2016

Thanks, asked the engineering team about it

@tseaver
Copy link
Contributor Author

tseaver commented Sep 7, 2016

If we merge #2248, we can still have the back-end guys experiment with the test by setting the appropriate environment variable. E.g.:

$ git clone git@github.com:GoogleCloudPlatform/google-cloud-python
$ cd google-cloud-python
$ GOOGLE_CLOUD_RUN_FLAKY_TESTS=1 tox -e system-tests

@dhermes
Copy link
Contributor

dhermes commented Sep 11, 2016

I did some sleuthing and found out why the unexpected success causes a failure in Python 3 but not Python 2.

def wasSuccessful(self):
    return len(self.failures) == len(self.errors) == 0

became

def wasSuccessful(self):
    return ((len(self.failures) == len(self.errors) == 0) and
            (not hasattr(self, 'unexpectedSuccesses') or
             len(self.unexpectedSuccesses) == 0))

@dhermes
Copy link
Contributor

dhermes commented Sep 11, 2016

Potential "fix"

diff --git a/system_tests/run_system_test.py b/system_tests/run_system_test.py
index 4ee50f9..4bb752c 100644
--- a/system_tests/run_system_test.py
+++ b/system_tests/run_system_test.py
@@ -70,6 +70,8 @@ def run_module_tests(module_name, ignore_requirements=False):

     # Run tests.
     test_result = unittest.TextTestRunner(verbosity=2).run(suite)
+    # NOTE: Erase the unexpected successes.
+    test_result.unexpectedSuccesses[:] = []
     # Exit if not successful.
     if not test_result.wasSuccessful():
         raise FailedSystemTestModule(module_name)

Though it seems like the test succeeds to frequently now we should just remove @unittest.expectedFailure.

@dhermes
Copy link
Contributor

dhermes commented Sep 12, 2016

When running .tox/system-tests/bin/nosetests system_tests/pubsub.py I was confused by

/usr/lib/python2.7/unittest/case.py:340: RuntimeWarning: TestResult has no addExpectedFailure method, reporting as passes
  RuntimeWarning)

since unittest.TestResults implements addExpectedFailure. This was due to nose-devs/nose#33, so I feel somewhat happy with the move to py.test, where issues won't live on unfixed for 5 years. (Thanks @jonparrott for pointing out the stagnancy of nose.)

@tseaver
Copy link
Contributor Author

tseaver commented Sep 16, 2016

PR #2248 deleted the flaky test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. testing
Projects
None yet
Development

No branches or pull requests

3 participants