-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop eating panics #28800
Stop eating panics #28800
Conversation
looks like some tests need attention. |
lgtm other than fixing the unit/integration tests. |
Hm, I'm not sure if we want to turn these off for kubelet. @yujuhong, would anything bad happen if kubelet started actually crashing when it panics? |
Uh...you can say that in most of the cluster setup, there will be some babysitter daemon restarting kubelet. I am not sure if it's true for all setup though. On the other hand, I've seen kubelet panic'd and then the panic got handled by the util, leading to starting multiple instances of the same components. It may be safer/easier to debug to let kubelet crash. |
OK, this should make the tests pass. |
Thanks, @yujuhong -- I think I'll just disable it everywhere like I originally planned then. fyi @smarterclayton in case this affects OpenShift |
/cc @kubernetes/sig-node After this PR is merged, if kubelet panics, it will crash. |
@@ -42,6 +50,8 @@ func HandleCrash(additionalHandlers ...func(interface{})) { | |||
for _, fn := range additionalHandlers { | |||
fn(r) | |||
} | |||
// Actually proceed to panic. | |||
panic(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's terrifying.
Somewhat of a big change. Doesn't this mean any panic someone can reproduce can be a denial of service attack against any component of the infra? I do like making these more visible, but for API servers exposed to multi-tenant users this is riskier. Can we flip the default so that someone can disable this (get the old behavior) if they disagree? |
8d249c5
to
c8724f5
Compare
Updated. ReallyCrash boolean works again. |
Thanks, looks good to me aside from whatever the failures are in mesos tests. |
I'm pretty sure the remaining mesos test error is actually a legit test error that was being papered over by us eating this panic. @kubernetes/kube-mesos @karlkfi Shall I disable the test until you have time to look at it? Or can you suggest a fix? |
Thanks @jdef. Test disabled. |
Removing label |
GCE e2e build/test passed for commit 78c02cd. |
all the tests pass, I'm going to interpret Robert & Clayton's comments as a collective LGTM. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Yes, it was LGTM On Wed, Jul 13, 2016 at 4:29 PM, k8s-merge-robot notifications@github.com
|
GCE e2e build/test passed for commit 78c02cd. |
Automatic merge from submit-queue |
Did you want to offer a cherry-pick for next cut? |
@lavalamp Ping. Could you create a cherrypick PR? |
I'm... concerned about pushing this into prod, since apparently controller manager deletes and recreates a bunch of things it ought not when it restarts. |
I'm ok waiting for this to land in the 1.3 branch after 1.3.3 to give it a longer bake time (both at head and in the release branch). |
Since 1.4 is close to release, I guess we won't be taking this on the 1.3 branch, so removing the cherrypick labels. |
@lavalamp @roberthbailey this PR has a release-note-action-required label on it, but I can't tell that the action is something an end-user of kubernetes can do, outside of kubelet's can you clarify whether this is worth calling out as an action required in the 1.4.0 release notes? |
@spiffxp the change message says:
This is the action that is needed. |
The action is targeted at cluster admins. Most action-required items are, I think. End users of kubernetes should not need to take many actions. |
@lavalamp agree on clusters ops audience (that's what I meant when I said end-user, doh), hence my confusion, skipped over that first line, thanks for pointing me in the right direction |
Fixes #28365
This change is