[internal/k8sconfig] Configure k8s library to not crash #9332
This is not necessarily a root cause fix, but it may be sufficient to address crashes observed in test cases.
Resolves #6986
Resolves #9002
Resolves #9326
Explanation of changes
- The crashes affect `k8sclusterreceiver` and `k8seventsreceiver`.
- The code they share is `internal/k8sconfig`, which provides the client used by both receivers.
- Each receiver runs a `Controller` which, when run, creates a `Reflector` that ultimately runs a very complicated and fragile-looking function called `ListAndWatch`, which I believe is panicking in recoverable situations.
- `Controller` defers a `HandleCrash` function, which is configurable using global settings. One is a `ReallyCrash` boolean, which determines whether to crash or continue. The other is a slice of `PanicHandlers`, which by default contains one function that logs the panic stack trace.
- A `PanicHandler` that manages component status and, if necessary, resolves internal component state.

Also of interest: the stop channel provided to `Controller.Run` apparently does not work as intended.

The error is reproducible locally but does not occur frequently. On my machine, I see it in fewer than 1 in 1000 test runs, but almost always within 5000, so the following reproduces the issue consistently for me:
(cd receiver/k8seventsreceiver && go test -race -v -timeout 300s -count 5000 --tags="" ./...)
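
To make the crash-handling behavior described under "Explanation of changes" concrete, here is a minimal, standard-library-only sketch of the pattern. It is not the Kubernetes library code: `reallyCrash`, `panicHandlers`, `handleCrash`, `runWorker`, and `survives` are hypothetical stand-ins that only mirror the roles of `ReallyCrash`, `PanicHandlers`, and `HandleCrash`, and the real implementation may differ in its details.

```go
package main

import "fmt"

// Hypothetical stand-ins for the library's global settings.
var (
	// reallyCrash decides whether a recovered panic is re-raised.
	reallyCrash = true
	// panicHandlers run on every recovered panic; by default one handler logs it.
	panicHandlers = []func(interface{}){logPanic}
)

// logPanic is the default handler in this sketch. It only reports the panic value.
func logPanic(r interface{}) {
	fmt.Printf("observed a panic: %v\n", r)
}

// handleCrash is meant to be deferred directly. It recovers a panic, runs all
// handlers, and re-panics only when reallyCrash is true.
func handleCrash() {
	if r := recover(); r != nil {
		for _, h := range panicHandlers {
			h(r)
		}
		if reallyCrash {
			panic(r)
		}
	}
}

// runWorker simulates a controller loop that panics partway through its work.
func runWorker() {
	defer handleCrash()
	panic("simulated failure in list-and-watch")
}

// survives reports whether runWorker returned without the panic propagating.
func survives() (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	runWorker()
	return true
}

func main() {
	reallyCrash = true
	fmt.Println("survived with reallyCrash=true:", survives())

	reallyCrash = false
	fmt.Println("survived with reallyCrash=false:", survives())
}
```

In this sketch the handlers run in both cases, so the panic is still logged. The flag only controls whether the process goes down afterwards, which is why turning it off can stop the crashes without addressing the underlying cause.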