Recover from previous incomplete cluster creation (id_rsa: no such file or directory) #8824

Open
tstromberg opened this issue Jul 24, 2020 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@tstromberg
Contributor

If container creation fails, as in #8814, the attempt to recreate the container to recover does not succeed: minikube sees that the container already exists, then fails because the local SSH keys are missing:

! StartHost failed, but will try again: provision: Error getting config for native Go SSH: open /Users/tstromberg/.minikube/machines/stress8d5b4/id_rsa: no such file or directory

NOTE: --delete-on-failure does recover from this, but this is not the default.

Here's what my suggestion boils down to, roughly:

  • When starting up a cluster, store a signal that the cluster is in the initializing stage. This may go along with #8730 (Add transient states: "stopping", "starting").

  • During subsequent startups, check for the signal that was stored. If it doesn't reflect a cluster that survived initialization, delete the cluster by default. The existence of SSH keys (for non-none drivers) could serve as an initial or additional signal for this.

@tstromberg
Contributor Author

BTW, this particular issue was seen twice: once in #8821 and once more. Logs from the second occurrence are attached:

recover.txt

@tstromberg tstromberg added july-chill kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed july-chill labels Jul 27, 2020
@tstromberg tstromberg added this to the v1.14.0-candidate milestone Jul 27, 2020
@medyagh
Member

medyagh commented Sep 16, 2020

This probably has to move to important long term.

@medyagh medyagh added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Sep 16, 2020
@medyagh medyagh modified the milestones: v1.14.0, v1.15.0-candidate Oct 12, 2020
@priyawadhwa priyawadhwa removed this from the v1.15.0 milestone Oct 19, 2020
@priyawadhwa priyawadhwa removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 28, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 28, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 27, 2021
@spowelljr spowelljr added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 28, 2021
@sharifelgamal sharifelgamal removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 5, 2021
@spowelljr spowelljr added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Sep 15, 2021