kvm: recommend solution & auto-cleanup minikube-net network leftover failures #9666

Closed
prezha opened this issue Nov 11, 2020 · 2 comments · Fixed by #9726
Labels: co/kvm2-driver (KVM2 driver related issues), kind/cleanup (cleaning up code, process, or technical debt), priority/important-soon (must be staffed and worked on either currently, or very soon, ideally in time for the next release)

Comments

prezha commented Nov 11, 2020

this is a follow-up to #9641, with additional improvements suggested by @priyawadhwa & @medyagh:

  1. If we detect that we can't activate the network because of user permissions, print out a smart error and the commands a user would need to run to fix the issue (the commands from #2513 (comment), kvm2: minikube start fails after host reboot: "network 'minikube-net' is not active", worked for me)
  2. Add a force cleanup of minikube-net to our jenkins script at the start of every integration test, which basically just runs those commands, so that we don't have to constantly manually fix the VM. Should go in this file: https://github.com/kubernetes/minikube/blob/master/hack/jenkins/linux_integration_tests_kvm.sh

ref: #9641 (comment)
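
to make the 2nd point a bit more concrete, here is a rough sketch of what such a force cleanup step could look like. it is only an illustration: the function name cleanup_minikube_net is hypothetical, the exact commands from the linked comment are not reproduced here, and undefining the network (rather than just stopping it) is an assumption about what "cleanup" should mean for the test host.

```shell
#!/usr/bin/env bash
# sketch only: cleanup_minikube_net is a hypothetical helper name, not an
# existing function in the minikube jenkins scripts.

cleanup_minikube_net() {
  local net="minikube-net"

  # nothing to do if libvirt does not know about the network at all
  if ! virsh -c qemu:///system net-info "$net" >/dev/null 2>&1; then
    return 0
  fi

  # stop the network if it is active; ignore the error if it is already inactive
  virsh -c qemu:///system net-destroy "$net" >/dev/null 2>&1 || true

  # assumption: drop the persistent definition too, so the next run recreates it
  virsh -c qemu:///system net-undefine "$net" >/dev/null 2>&1 || true
}
```

calling cleanup_minikube_net at the top of the integration test script would then be a no-op on a clean host; it assumes the jenkins user is allowed to manage qemu:///system networks.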

prezha commented Nov 11, 2020

/assign

prezha commented Nov 17, 2020

re: 2nd point - in pr #9726, i've added a cleanup of kvm leftovers, in addition to the full minikube and docker cleanup, to get a more complete overall cleanup of the linux test host

re: 1st point - smart error and showing advice to a user in case we can't activate the network (ie, the Network is already in use by interface error):

the interesting thing about br-f912846614e5 specifically is that it was mentioned in the Network is already in use by interface error on our test server:

  1. on 29th Oct: Figure out KVM failures in jenkins on Debian 10 #8952 (comment)
  2. on 05th Nov: Reuse minikube-net network if it already exists for kvm driver #9610 (comment)
  3. on 09th Nov: kvm: recover from minikube-net network left over failures #9641 (comment)
    here, the cleanup introduced in pr #9726 (amend linux cleanup script with full minikube, docker and kvm cleanup) should automatically keep such a leftover from lingering for more than an hour (instead of weeks)

we also have another example, #9049 (comment), which mentions a different interface, br-487e1305fa05

from what i saw while playing with debian9 (which we use on the test servers), i think that, oddly enough, the br-xxx network bridges (and their veth* 'slave' interfaces) are likely created by docker and not by libvirt (which uses virbr* and vnet* naming for its interfaces)

therefore:

  • atm, i'm a bit reluctant to suggest that a user destroy an interface that could belong to docker in order to fix an issue with kvm(?!)
    • if we wanted to do that anyway, we could extract the 'offending' iface from the libvirt error, and here i'd propose:
      • not advising ifconfig, as it is mostly deprecated and no longer installed by default; similarly, brctl is not installed by default
      • advising a single sudo ip link delete <iface>, as: a) the ip tools are installed by default and b) ip does not care whether the link-to-delete is up or down,
        followed by sudo virsh net-start minikube-net
  • instead, since this issue happens rarely in general (mostly reported on our test servers anyway, where pr #9726, amend linux cleanup script with full minikube, docker and kvm cleanup, should help) and i could not replicate it, i would propose adding more logging around this issue and then trying to better understand why and what exactly is happening
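
for reference, if we did go the "extract the iface and advise" route, it could look roughly like the sketch below. it only prints the two commands and never runs them, given the concern about touching docker's interfaces. the function names extract_offending_iface and suggest_fix are hypothetical, and the pattern assumes the libvirt message contains the text "Network is already in use by interface <iface>" as seen on our test server.

```shell
#!/usr/bin/env bash
# sketch only: both function names are hypothetical, not existing minikube code.

# print the interface named in a libvirt
# "Network is already in use by interface <iface>" message, or nothing at all
extract_offending_iface() {
  printf '%s\n' "$1" |
    sed -n 's/.*Network is already in use by interface \([[:alnum:]_-][[:alnum:]_-]*\).*/\1/p' |
    head -n 1
}

# print (do NOT run) the advice for the user; returns 1 if no iface was found
suggest_fix() {
  local iface
  iface="$(extract_offending_iface "$1")"
  if [ -z "$iface" ]; then
    return 1
  fi
  printf 'sudo ip link delete %s\n' "$iface"
  printf 'sudo virsh net-start minikube-net\n'
}
```

for example, suggest_fix "Network is already in use by interface br-f912846614e5" would print the ip link delete line for br-f912846614e5, followed by the virsh net-start line.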

please share your thoughts

sharifelgamal added the co/kvm2-driver, kind/cleanup, and priority/important-soon labels on Nov 17, 2020