
resolve handling of RootFsInfo() error on kubelet start #19948

Closed
liggitt opened this issue Jun 8, 2018 · 8 comments
Labels: kind/post-rebase, lifecycle/rotten

Comments

liggitt (Contributor) commented Jun 8, 2018

Follow-up from #19137.

Commit b61c00c made a RootFsInfo() lookup failure on kubelet startup non-fatal because it was blocking CI (I think the lookup fails when run on tmpfs or bind mounts).

We need to determine the impact of letting the kubelet continue without the EphemeralStorageCapacityFromFsInfo results.

sjenning (Contributor) commented:

@liggitt since b61c00c is already in 3.11 and the upstream PR kubernetes/kubernetes#65595 does roughly the same thing, do we want to:

  1. revert the drop commit and pick the upstream one for 3.11, or
  2. keep it for 3.11 and get the upstream fix in the 3.12 rebase?

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 29, 2018
Automatic merge from submit-queue (batch tested with PRs 60150, 65467, 65487, 65595, 65374). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubelet: feature gate LSI capacity calculation

Currently, if `cm.cadvisorInterface.RootFsInfo()` fails, the whole kubelet bails. This can happen if `/var/lib/kubelet` is on a tmpfs or bind mount (this is the case for some of our CI environments, openshift/origin#19948).

In the short term, we could work around this by disabling the LSI feature gate if the capacity calculation were protected by the gate, but currently it isn't.

This PR adds the gate check around setting the ephemeral storage capacity.

@liggitt @derekwaynecarr @dashpole 

Whether or not this should be fatal might be a separate discussion. If it isn't fatal, it seems that it would just prevent pods that have an ephemeral storage request from being scheduled.

/sig node
liggitt (Contributor, Author) commented Jun 29, 2018

If we don't plan to enable the feature in 3.11, it's largely moot. If we do, then we should revert/pick that PR, right?

sjenning (Contributor) commented:

Agreed. I'll keep this in mind if we decide to go forward with LSI in 3.11 (which would be a stretch, I think).

openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Sep 27, 2018

openshift-merge-robot (Contributor) commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 27, 2018

openshift-bot (Contributor) commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented:

@openshift-bot: Closing this issue.




5 participants