Recreate pod sandbox when the sandbox does not have an IP address. #48970
/sig network
I've run a few tests with this, but haven't yet been able to reproduce the issue this was intended to fix (which was occurring pretty reliably on v1.7.0). Still want to do some more runs to see if I can repro.
/test pull-kubernetes-kubemark-e2e-gce
Huh, kubemark tests seem to be failing due to OAuth issues:
/retest
@Random-Liu any chance you could take a look and see if you think this approach makes any sense?
@kubernetes/sig-network-pr-reviews ^
/retest
/test pull-kubernetes-bazel-test
@caseydavenport I want to get to a place where the runtime tracks whether the container was fully set up or not (including networking), and I guess they'd have to checkpoint that state somewhere too and validate on GetPodStatus(). I feel like your solution here is the best thing we can do now, but perhaps could we add a FIXME that it should be pushed to the runtimes to correctly report whether the pod was fully set up or not?
/lgtm
Force-pushed from fded16d to 152c974
/lgtm
/retest
The fix LGTM. @caseydavenport Could you fix the unit tests?
Force-pushed from 152c974 to e435bc3
/retest
I believe I've fixed the UTs (plus added a new one for this fix). The last round of failures appeared to be unrelated at a glance.
/retest
Force-pushed from e435bc3 to 94bf2b0
/lgtm
+1. This fixes a v1.8 issue. Ping @Random-Liu for approval.
/kind bug
@jdumars @kubernetes/sig-release-members Can you please add the 1.8 milestone for this bug?
@@ -396,6 +396,12 @@ func (m *kubeGenericRuntimeManager) podSandboxChanged(pod *v1.Pod, podStatus *ku
		return true, sandboxStatus.Metadata.Attempt + 1, ""
	}

	// Needs to create a new sandbox when the sandbox does not have an IP address.
	if !kubecontainer.IsHostNetworkPod(pod) && sandboxStatus.Network.Ip == "" {
Do we know why the IP address is not allocated for the pod? My concern is that we are papering over the real issue here. But I guess this is the best solution we have come up with so far, so I am OK with it as a temporary workaround.
We know of at least one scenario, which this PR is meant to fix - it's a condition in which the kubelet gets restarted during CNI execution and so the sandbox exists but doesn't yet have an IP address.
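For readers following the thread, the check under review can be sketched in isolation. This is a minimal illustration and not the kubelet code: `Pod`, `SandboxStatus`, and `sandboxNeedsRecreate` are hypothetical stand-ins for `*v1.Pod`, the CRI sandbox status, and the branch added to `podSandboxChanged`, and the returned attempt number is an assumption modeled on the preceding branch shown in the hunk.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for *v1.Pod.
type Pod struct {
	HostNetwork bool
}

// SandboxStatus is a hypothetical, simplified stand-in for the CRI sandbox status.
type SandboxStatus struct {
	Attempt uint32
	IP      string
}

// sandboxNeedsRecreate mirrors the condition in the diff: a pod that is not
// on the host network and whose sandbox reports no IP address gets a new
// sandbox. The second return value is the attempt number to use; bumping it
// by one is an assumption based on the earlier branch in the same function.
func sandboxNeedsRecreate(pod Pod, status SandboxStatus) (bool, uint32) {
	if !pod.HostNetwork && status.IP == "" {
		return true, status.Attempt + 1
	}
	return false, status.Attempt
}

func main() {
	// Scenario from the discussion: the kubelet restarted during CNI
	// execution, so the sandbox exists but has no IP address yet.
	interrupted := SandboxStatus{Attempt: 0, IP: ""}
	recreate, attempt := sandboxNeedsRecreate(Pod{HostNetwork: false}, interrupted)
	fmt.Println(recreate, attempt)

	// Host-network pods are skipped by the check, so an empty IP is not
	// treated as a failure for them.
	recreate, attempt = sandboxNeedsRecreate(Pod{HostNetwork: true}, interrupted)
	fmt.Println(recreate, attempt)
}
```

The host-network exclusion matters because the new condition only applies when `kubecontainer.IsHostNetworkPod(pod)` is false; without it, every host-network pod with an empty sandbox IP would be recreated.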
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: caseydavenport, dcbw, dchen1107
Associated issue: 48510
The full list of commands accepted by this bot can be found here.
/test all [submit-queue is verifying that this PR is safe to merge]
/retest Review the full test history for this PR.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 48970, 52497, 51367, 52549, 52541). If you want to cherry-pick this change to another branch, please follow the instructions here.
Automatic merge from submit-queue (batch tested with PRs 16889, 16865).
UPSTREAM: 53857: kubelet sync pod throws more detailed events
Also includes the following upstream dependant PRs:
UPSTREAM: 50350: Wait for container cleanup before deletion
UPSTREAM: 48970: Recreate pod sandbox when the sandbox does not have an IP address.
UPSTREAM: 48589: When faild create pod sandbox record event.
UPSTREAM: 48584: Move event type
UPSTREAM: 47599: Rerun init containers when the pod needs to be restarted
xrefs: kubernetes/kubernetes#53857 kubernetes/kubernetes#50350 kubernetes/kubernetes#48970 kubernetes/kubernetes#48589 kubernetes/kubernetes#48584 kubernetes/kubernetes#47599
What this PR does / why we need it:
Attempts to fix a bug where Pods do not receive networking when the kubelet restarts during pod creation.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #48510
Release note: