Suppress test failures due to logs after tests complete #6067
I am not fond of this, but some of our largest tests are quite flaky due to these logs, and that really isn't helping anything.
unfortunately not sure which test is producing these, AFAICT they're not testlogger-using, so I've just replaced everything I can find:
If this commit doesn't work, I'll have to see if I screwed up the implementation. Gonna rerun a few times and check the output: I believe I should see "bad" logs there, and if I don't, it's probably just lacking evidence that it fixed anything.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files: see 13 files with indirect coverage changes.
.... I think I'm going to sit on this for a while. I tried to construct a test that would show if I implemented this correctly, and it pretty randomly fails or succeeds in super duper confusing ways. Not sure if my Go is screwing up or blending caches or if this log-detecting is just fundamentally unreliable for some reason.

edit: well. it's fundamentally unreliable. golang/go#67701
…flow#6067)

"Fixes" test failures like this, by sweeping them under the rug:

```
2024/05/28 21:39:29 ----- Done -----
2024/05/28 21:39:29 Schema setup complete
PASS
coverage: 27.3% of statements in github.com/uber/cadence/client/..., github.com/uber/cadence/common/..., github.com/uber/cadence/host/..., github.com/uber/cadence/service/..., github.com/uber/cadence/tools/...
panic: Log in goroutine after TestIntegrationSuite has completed: 2024-05-28T21:39:39.501Z DEBUG Selected default store shard for tasklist {"store-shard": "NonShardedStore", "wf-task-list-name": "9985f719-4b6a-4f0a-97c7-41a9e00d2414", "logging-call-at": "sharding_policy.go:100"}

goroutine 72245 [running]:
testing.(*common).logDepth(0xc0016ee680, {0xc00695ae00, 0xd5}, 0x3)
	/usr/local/go/src/testing/testing.go:1028 +0x6d4
testing.(*common).log(...)
	/usr/local/go/src/testing/testing.go:1010
testing.(*common).Logf(0xc0016ee680, {0x4a884b4, 0x2}, {0xc0021324b0, 0x1, 0x1})
	/usr/local/go/src/testing/testing.go:1061 +0xa5
go.uber.org/zap/zaptest.testingWriter.Write({{0x662be10?, 0xc0016ee680?}, 0xa0?}, {0xc003ecf400, 0xd6, 0x400})
	/go/pkg/mod/go.uber.org/zap@v1.13.0/zaptest/logger.go:130 +0x11e
go.uber.org/zap/zapcore.(*ioCore).Write(0xc0030ad920, {0xff, {0xc18db1a6dddeef12, 0xa3b5c069bd, 0x8268f40}, {0x0, 0x0}, {0x4af6670, 0x29}, {0x0, ...}, ...}, ...)
...
```

When the test completes, it will simply log to stderr rather than the test logger:

```
❯ go test -count 1 -v ./...
=== RUN TestLoggerShouldNotFailIfLoggedLate
--- PASS: TestLoggerShouldNotFailIfLoggedLate (0.00s)
PASS
2024-05-29T16:20:50.742-0500 INFO COULD FAIL TEST "TestLoggerShouldNotFailIfLoggedLate", logged too late: too late, orig{"logging-call-at": "testlogger_test.go:41", "log_stack": "github.com/uber/cadence/common/log/testlogger.(*fallbackTestCore).Write\n\t/User...
2024-05-29T16:20:50.742-0500 INFO COULD FAIL TEST "TestLoggerShouldNotFailIfLoggedLate", logged too late: too late, with{"actor-id": "testing", "logging-call-at": "testlogger_test.go:42", "log_stack": "github.com/uber/cadence/common/log/testlogger.(*fallbackTestCore).Write\n\t/User...
ok github.com/uber/cadence/common/log/testlogger 0.586s
```

Ignoring the correctness part of the problem, this gives us the best of both worlds:

- most logs are grouped by the test that produced them
- logs produced due to incomplete shutdowns are still produced (as long as `-v` or some other test fails), but will not directly fail the test(s)

I am not overly fond of this, but some of our largest tests are quite flaky due to these logs, and that really isn't helping anything.

Seems like a reasonable tradeoff, and we can move to an opt-in model eventually instead of applying it everywhere like this PR does.
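To make the fallback idea concrete, here is a minimal stdlib-only sketch of "log through the test until it completes, then write somewhere else". It is not the PR's actual `fallbackTestCore` (which wraps a zap core); the names `fallbackLogger`, `newFallbackLogger`, `fakeT`, and `demo` are hypothetical, and marking completion from `Cleanup` is an assumption about how the switch could be triggered. As noted earlier in the thread, detecting completion this way may not be fully reliable in real test runs.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// testingT is the subset of *testing.T this sketch relies on.
type testingT interface {
	Logf(format string, args ...any)
	Cleanup(func())
	Name() string
}

// fallbackLogger (hypothetical) logs through t until t's cleanup has run,
// and writes to fallback afterwards instead of calling t.Logf.
type fallbackLogger struct {
	mu        sync.RWMutex
	t         testingT
	completed bool
	fallback  io.Writer
}

func newFallbackLogger(t testingT, fallback io.Writer) *fallbackLogger {
	l := &fallbackLogger{t: t, fallback: fallback}
	// Assumption: cleanup is used as the "test is finishing" signal.
	t.Cleanup(func() {
		l.mu.Lock()
		defer l.mu.Unlock()
		l.completed = true
	})
	return l
}

func (l *fallbackLogger) Logf(format string, args ...any) {
	// Hold the read lock for the whole call so the cleanup cannot flip
	// the flag while a t.Logf call is in flight.
	l.mu.RLock()
	defer l.mu.RUnlock()
	if l.completed {
		fmt.Fprintf(l.fallback, "COULD FAIL TEST %q, logged too late: %s\n",
			l.t.Name(), fmt.Sprintf(format, args...))
		return
	}
	l.t.Logf(format, args...)
}

// fakeT stands in for *testing.T so the sketch runs outside `go test`.
type fakeT struct {
	name     string
	logs     []string
	cleanups []func()
}

func (f *fakeT) Logf(format string, args ...any) {
	f.logs = append(f.logs, fmt.Sprintf(format, args...))
}
func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }
func (f *fakeT) Name() string      { return f.name }

// finish runs cleanups last-registered-first, as the testing package does.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

// demo logs once during the "test" and once after it has finished.
func demo() (inTest []string, late string) {
	t := &fakeT{name: "TestExample"}
	var stderr bytes.Buffer // stands in for os.Stderr
	l := newFallbackLogger(t, &stderr)
	l.Logf("during test %d", 1)
	t.finish()
	l.Logf("too late")
	return t.logs, stderr.String()
}

func main() {
	inTest, late := demo()
	fmt.Println(inTest)
	fmt.Print(late)
}
```

In a real test the fallback writer would be `os.Stderr`, which is why the late lines show up outside any test's grouped output.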
Fixing the flakiness that led to #1375. The races in these tests were due to `t.Log` calls occurring after the test finishes, because the workflow (and test suite and tests and...) does not wait for goroutines to shut down. It's an annoying enough issue that I tackled it with gusto in cadence-workflow/cadence#6067 and it's probably worth porting over here too. Though the underlying "shut down and do not wait" behavior is still extremely dangerous and needs to be fixed some day.
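For contrast with suppressing late logs, the "wait for goroutines" fix mentioned above could look roughly like the following sketch. It is an illustration only, not code from either repository: `worker`, `startWorker`, `Stop`, and `demo` are hypothetical names, and the ticker loop is a stand-in for whatever background work actually logs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker (hypothetical) stands in for a component that logs from a
// background goroutine.
type worker struct {
	wg     sync.WaitGroup
	cancel context.CancelFunc
	logf   func(format string, args ...any)
}

func startWorker(logf func(format string, args ...any)) *worker {
	ctx, cancel := context.WithCancel(context.Background())
	w := &worker{cancel: cancel, logf: logf}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				w.logf("worker stopping")
				return
			case <-ticker.C:
				w.logf("tick")
			}
		}
	}()
	return w
}

// Stop cancels the goroutine and blocks until it has exited, so no log
// call from this worker can happen after Stop returns.
func (w *worker) Stop() {
	w.cancel()
	w.wg.Wait()
}

// demo counts log lines right after Stop and again a little later.
func demo() (before, after int) {
	var mu sync.Mutex
	var lines []string
	w := startWorker(func(format string, args ...any) {
		mu.Lock()
		defer mu.Unlock()
		lines = append(lines, fmt.Sprintf(format, args...))
	})
	time.Sleep(5 * time.Millisecond)
	w.Stop()

	mu.Lock()
	before = len(lines)
	mu.Unlock()
	time.Sleep(5 * time.Millisecond)
	mu.Lock()
	after = len(lines)
	mu.Unlock()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before == after) // no lines were logged after Stop returned
}
```

In a test, registering `t.Cleanup(w.Stop)` right after starting the worker would make the test wait for the goroutine before it is marked complete, so `t.Log` is never called too late.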