Description of the change:
This fixes two flaky unit tests, TestResolver and TestUpdates.
First, TestResolver, specifically the FailForwardEnabled/3EntryReplacementChain/ReplacementChainBroken/NotSatisfiable test case. This test was changed in #2788 and currently references one of the three failed CSVs in the test namespace.
However, when you run this test multiple times, the resolver cache is sometimes handing back a different CSV (the following was a purposefully broken match in order to always print the test result):
Note how the resolver sometimes returns catsrc-namespace/a.v1 and sometimes catsrc-namespace/a.v2. As I understand this test, the specific CSV doesn't matter for the error, as all three are in the CSVPhaseFailed state. Therefore, my proposed fix removes the CSV name from the error match, so that it will match on any of the three CSVs. I believe this was the original intent of the test, looking at what it was prior to #2788 (noting that the error matching was split in two, based on the same omission of the CSV name).
Second, TestUpdates. In this piece of code:
operator-lifecycle-manager/pkg/controller/operators/olm/operator_test.go
Lines 3925 to 3929 in 6ffec4d
The for loop will keep hammering the fake operator with op.syncClusterServiceVersion(csv) and Get calls as fast as it possibly can. However, this creates a race condition in the *RaceFreeFakeWatcher that is watching the fake OperatorGroup. Basically, the watch channel fills up more quickly than the operator can drain it (the default watch channel length is 100 events, and if it fills up and closes, the goroutine panics).
The proposed fix adds a 1ms sleep between iterations of this for loop. It slows the test down only negligibly, but it's enough to let the watch channel drain faster than it fills. In a real-world Kubernetes API server, responses aren't going to be that fast anyway.
Motivation for the change:
Fix flaky unit tests, because they are the worst.
Architectural changes:
Testing remarks:
With fix, ran test 100 times to verify flake is gone:
This compares to the current HEAD, where this test fails somewhere between 10% and 20% of the time.