Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal/k8sconfig] Configure k8s library to not crash #9332

Merged
merged 1 commit into from
Apr 18, 2022

Conversation

djaglowski
Copy link
Member

This is not necessarily a root cause fix, but it may be sufficient to address crashes observed in test cases.

Resolves #6986
Resolves #9002
Resolves #9326


Explanation of changes

  1. The same error is presenting in various tests across both k8sclusterreceiver and k8seventreceiver.
  2. Both of these packages rely on internal/k8sconfig, which provides the client used by both receivers.
  3. The client creates a Controller which, when run, creates a Reflector that ultimately runs a very complicated and fragile looking function called ListAndWatch, which I believe is throwing panics in recoverable situations.
  4. The Controller defers a HandleCrash function, which is configurable using global settings. One is a ReallyCrash boolean which determines whether or not to crash or continue. The other is a slice of PanicHandlers, which by default contains one function that logs the panic stack trace.
  5. The changes in this PR tell the library not to not crash and not to log panics. These are not ideal settings, but I think they may be a starting point for more robust handling in these components. Perhaps a future PR will implement a PanicHandler that manages component status and if necessary resolves internal component state.

Also of interest, is that the stop channel provided to Controller.Run apparently does not work as intended.


The error is reproducible locally but does not occur frequently. On my machine, I see it less than 1/1000 test runs, but almost always within 5000, so the following reproduces the issue consistently for me: (cd receiver/k8seventsreceiver && go test -race -v -timeout 300s -count 5000 --tags="" ./...)

@djaglowski djaglowski added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Apr 15, 2022
@djaglowski djaglowski marked this pull request as ready for review April 15, 2022 20:51
@djaglowski djaglowski requested a review from a team April 15, 2022 20:51
@djaglowski djaglowski requested a review from dmitryax as a code owner April 15, 2022 20:51
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can give it a try :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
3 participants