amend linux cleanup script with full minikube, docker and kvm cleanup #9726

Merged
merged 4 commits into from
Dec 11, 2020

@prezha prezha commented Nov 17, 2020

fixes: #9666

this pr adds kvm leftovers cleanup to hack/jenkins/cron/cleanup_and_reboot_Linux.sh - specifically: kvm domains, pools, networks (except 'default'), and finally host's kvm-related network interfaces
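A minimal sketch of what such a kvm cleanup could look like, assuming `virsh` is installed and recent enough to support `--name` on `list`, `pool-list` and `net-list`. The function names `networks_to_remove` and `kvm_cleanup` are hypothetical and this is not the code merged in this PR; removing the host's kvm-related network interfaces is left out here.

```shell
#!/bin/bash
# Hypothetical sketch, not the script from this PR.

# Read network names from stdin and print all of them except blank lines
# and the one network that must be kept (first argument, 'default' if omitted).
networks_to_remove() {
  local keep="${1:-default}"
  grep -v -x -e "${keep}" -e '' || true
}

# Remove kvm domains, pools and networks (except 'default').
kvm_cleanup() {
  local name

  # domains: force-stop if running, then drop the definition
  for name in $(virsh list --all --name); do
    virsh destroy "${name}" >/dev/null 2>&1 || true
    virsh undefine "${name}" || true
  done

  # storage pools: deactivate, then drop the definition
  for name in $(virsh pool-list --all --name); do
    virsh pool-destroy "${name}" >/dev/null 2>&1 || true
    virsh pool-undefine "${name}" || true
  done

  # networks: keep 'default', remove everything else
  for name in $(virsh net-list --all --name | networks_to_remove default); do
    virsh net-destroy "${name}" >/dev/null 2>&1 || true
    virsh net-undefine "${name}" || true
  done
}
```

Nothing runs until `kvm_cleanup` is called. Every step is followed by `|| true` so that a leftover which is already gone does not abort the rest of the cleanup.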


note

with such "thorough" hourly minikube/kvm/docker cleanups (more details below), we probably no longer need to also reboot the host every hour (which leaves jenkins unavailable until the host is back up), as we do now:

sudo install cron/cleanup_and_reboot_Linux.sh /etc/cron.hourly/cleanup_and_reboot || echo "FAILED TO INSTALL CLEANUP"

apt update -y && apt upgrade -y && reboot

and so, instead of hourly, a daily or even a weekly update + reboot would probably be ok (effectively extracting that into a separate cron job) - this is just a proposal, i haven't made these changes here
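The proposed split could be sketched as a separate script that carries only the update + reboot step. This is an illustration of the idea, not something this PR implements: the function names `run` and `update_and_reboot` and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of a separate update + reboot job, not part of this PR.

# Run a command, or only print it when DRY_RUN=1 (handy for testing the job).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same command chain the hourly script uses today; reboot happens only
# if both apt steps succeed.
update_and_reboot() {
  run apt update -y && run apt upgrade -y && run reboot
}
```

A script that ends with a call to `update_and_reboot` could then be installed under `/etc/cron.daily/` or `/etc/cron.weekly/`, the same way the hourly cleanup is installed today, while the hourly job keeps only the cleanup part.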


as any existing minikube cluster would become unusable after the kvm cleanup, i've also added a full minikube cleanup (for each user account) that runs before the kvm one. we could also only benefit from a full docker cleanup, so i've amended that one as well
ie, on our jenkins test servers, we are currently using (partial) docker cleanups like these:

# clean docker left overs
docker rm -f -v $(docker ps -aq) >/dev/null 2>&1 || true
docker volume prune -f || true
docker volume ls || true
docker system df || true

docker rm -f -v $(docker ps -aq) >/dev/null 2>&1 || true
docker volume prune -f || true
docker system df || true

docker kill $(docker ps -q) || true
docker rm $(docker ps -aq) || true

# removing possible left over docker containers from previous runs
docker rm -f -v $(docker ps -aq) >/dev/null 2>&1 || true

# kubeadm reset may not stop pods immediately
docker rm -f $(docker ps -aq) >/dev/null 2>&1 || true

here i've amended hack/jenkins/cron/cleanup_and_reboot_Linux.sh to use:

docker kill $(docker ps -aq) >/dev/null 2>&1
docker system prune --all --volumes --force

this would kill all running containers and then remove:

  - all stopped containers
  - all networks not used by at least one container
  - all volumes not used by at least one container
  - all images without at least one container associated to them
  - all build cache
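The two steps above could be wrapped as follows. This is a sketch with one deliberate variation, not the PR's code: it lists only running containers with `docker ps -q`, because `docker kill` reports an error for containers that are not running (the PR's version passes `-aq` and discards those errors). The function name `docker_cleanup` is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch, a variation on the two commands used in this PR.

docker_cleanup() {
  local running

  # -q without -a prints the IDs of running containers only
  running=$(docker ps -q)

  if [ -n "${running}" ]; then
    # word splitting is intended here: one container ID per argument
    # shellcheck disable=SC2086
    docker kill ${running} >/dev/null 2>&1 || true
  fi

  # remove stopped containers, unused networks, unused volumes,
  # images without containers, and the build cache
  docker system prune --all --volumes --force || true
}
```

Skipping `docker kill` when nothing is running avoids calling it with no arguments, which would otherwise print a usage error.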

example output:

$ minikube profile list
|----------|-----------|---------|----------------|------|---------|---------|
| Profile  | VM Driver | Runtime |       IP       | Port | Version | Status  |
|----------|-----------|---------|----------------|------|---------|---------|
| minikube | docker    | docker  | 192.168.49.2   | 8443 | v1.19.4 | Running |
| mkc2-kvm | kvm2      | docker  | 192.168.39.223 | 8443 | v1.19.4 | Running |
| mkc3     | docker    | docker  | 192.168.59.2   | 8443 | v1.19.4 | Running |
| mkc4-kvm | kvm2      | docker  | 192.168.39.39  | 8443 | v1.19.4 | Running |
|----------|-----------|---------|----------------|------|---------|---------|
$ hack/jenkins/cron/cleanup_and_reboot_Linux.sh 
                                                                               
Broadcast message from admin@ip-172-31-23-85 (pts/0) (Tue Nov 17 14:50:12 2020)
                                                                               
cleanup_and_reboot running - may shutdown in 60 seconds                        
                                                                               
java: no process found

cleanup minikube...
successfully cleaned up minikube for admin user

cleanup docker...
Deleted Images:
untagged: gcr.io/k8s-minikube/kicbase:v0.0.14
untagged: gcr.io/k8s-minikube/kicbase@sha256:2bd97b482faf5b6a403ac39dd5e7c6fe2006425c6663a12f94f64f5f81a7787e
deleted: sha256:7ed8827b36a5092c654640afc56410f6f25f6dca005a46d458ed9949dce0ab88
deleted: sha256:76edfcc88c70fd0959089a1a56621676ce0ffdce23ae5f2ccbb156036a78ce7c
deleted: sha256:e8fe8f6adc70a5915fb3963efe934976273f4454e99689908a074c71f28388af
deleted: sha256:e443ae0b86ad56288bf66c8f42dfcaa71ed9488f2ecb4f27a294a57b32867269
deleted: sha256:abbfbd58d04aacf312c8f8c2fd6659a776d8d75c869b626cc7fbc5bb6d9b43a3
deleted: sha256:91ba347e0b6b72c205eff689d9bc451f0054d31da12ba8f45d53f98f4838beb5
deleted: sha256:13dac4fa1833e3496daa0b8086b679f759f3172065b8d1fc80f46accf35e9951
deleted: sha256:fec5aae2747de2ae6e977f4378909fa6cf291984d7119c78517150625b0df7a4
deleted: sha256:3b6794511231f829644dfc0c4fd501a759971710f7323dd5c666056b780466ed
deleted: sha256:34d685de15e59d8e7b60db03c612c2382af8e0519c3126f7ad559d361d7ca224
deleted: sha256:0576187f88ba3423ed65f02e0cb11607045aa3396fd2390e143ec178a66aa491
deleted: sha256:d31c6e8f441b14f26eeb2566a6ff64f4fac95777679f050cdf4c37c345ab2c63
deleted: sha256:1c361b8963e9f4d9d95b88c9141b8f4fe3eb3530df0dd5abda19023266cb2485
deleted: sha256:e317576cf770e23cfb8ca78fd53c2a1ab7d1a1266d8590f73f00a6a73619387b
deleted: sha256:852913342e6bca67305f62cb879abe63a28c65d1d7ff76f969678ca0d92bb0af
deleted: sha256:fadfd8005e865be563ea9b8f0250a1e49a89c6fb37f716af22cc7cdc40755b91
deleted: sha256:62b3c2a26918496d9a08b4216a61c38491df16e02e1cc8506c692e27eab2cd22
deleted: sha256:fca53b0998bce2282d795dabe1f97a45793ffbabf41cd49ffa500ef0209b0fc5
deleted: sha256:f0c68caced3e899feab55f752cb7640440654c8669494965c91d51e355cbef57
deleted: sha256:8b2dd06f3f68e9b5134352b6002a7964bbe6cdb99b9fe1926549ef511d9f89d7
deleted: sha256:58cd4dde00d4dfe056a99d6d3775795704e0a2b29bbf324f62d4970e49370860
deleted: sha256:6c96639aeb5b9f81527aedaff6619daa3d82972230dd4bf5a4074a52247e67ee
deleted: sha256:ffb0fc0bfaee5b0e7e118694ee790a4264349547772cf2362f8ca299b2f98eb3
deleted: sha256:46062b9cae8daf4570ed5193acb3d8c0f923276c7a2580b49e4dfa1f479dd6e9
deleted: sha256:3fd3f25243ab56e6ad5da0c43715780e0ec7b80d0975a787d0359cfa0c9521bb
deleted: sha256:b7d8707d770f12bc1367ed04a0d922ce7a403914354682c808cff174b00aa4d6
deleted: sha256:69ea0ba6086b4837bc259353a9dec7e6f7bcc9b8297b0f722387a114697e5691
deleted: sha256:923b52e8276c042a8602849149a284ae77cccf4c688cc4284bf01ec9669a6e6c
deleted: sha256:d42a4fdf4b2ae8662ff2ca1b695eae571c652a62973c1beb81a296a4f4263d92

Total reclaimed space: 876.3MB

cleanup kvm...

before the cleanup:

 - KVM domains:
 Id    Name                           State
----------------------------------------------------

 - KVM pools:
 Name                 State      Autostart
-------------------------------------------

 - KVM networks:
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

 - host networks:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:e3:0e:b7:06:f7 brd ff:ff:ff:ff:ff:ff
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:47:c5:b3 brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:47:c5:b3 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:d4:2b:ac:6b brd ff:ff:ff:ff:ff:ff
bridge connected to the 'default' KVM network to leave alone: virbr0

after the cleanup:

 - KVM domains:
 Id    Name                           State
----------------------------------------------------

 - KVM pools:
 Name                 State      Autostart
-------------------------------------------

 - KVM networks:
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

 - host networks:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:e3:0e:b7:06:f7 brd ff:ff:ff:ff:ff:ff
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:47:c5:b3 brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:47:c5:b3 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:d4:2b:ac:6b brd ff:ff:ff:ff:ff:ff

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 17, 2020
@k8s-ci-robot

Hi @prezha. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 17, 2020
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: prezha
To complete the pull request process, please assign priyawadhwa after the PR has been reviewed.
You can assign the PR to them by writing /assign @priyawadhwa in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 17, 2020
@minikube-bot

Can one of the admins verify this patch?

@prezha prezha requested a review from medyagh November 17, 2020 23:53
@priyawadhwa

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 2, 2020
@@ -36,11 +36,83 @@ logger "cleanup_and_reboot is happening!"
# kill jenkins to avoid an incoming request
killall java

# clean minikube left overs
echo -e "\ncleanup minikube..."
for user in $(lslogins --user-accs --noheadings --output=USER); do

We can assume only jenkins and root (for the none driver) as users here, instead of using the for loop across all users

Contributor Author

ok, i've hardcoded only the jenkins and root users as targets for the cleanup
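A per-user cleanup restricted to a fixed list of users might look like the sketch below. It assumes `sudo -H -u <user>` is an acceptable way to run minikube as each user so that the user's own `~/.minikube` is targeted; the function name `minikube_cleanup` is hypothetical and the merged script may do this differently.

```shell
#!/bin/bash
# Hypothetical sketch, not the code merged in this PR.

# Delete all minikube profiles and purge the .minikube directory
# for each user given as an argument.
minikube_cleanup() {
  local user
  for user in "$@"; do
    # -H sets HOME to the target user's home directory
    if sudo -H -u "${user}" minikube delete --all --purge >/dev/null 2>&1; then
      echo "successfully cleaned up minikube for ${user} user"
    else
      echo "failed to clean up minikube for ${user} user" >&2
    fi
  done
}
```

Calling it as `minikube_cleanup jenkins root` covers the two users named in the review. A failure for one user is reported on stderr and does not stop the loop.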

@minikube-pr-bot

kvm2 Driver
error collecting results for kvm2 driver: timing run 0 with Minikube (PR 9726): timing cmd: [/home/performance-monitor/.minikube/minikube-binaries/9726/minikube start --driver=kvm2]: starting cmd: fork/exec /home/performance-monitor/.minikube/minikube-binaries/9726/minikube: exec format error
docker Driver
error collecting results for docker driver: timing run 0 with Minikube (PR 9726): timing cmd: [/home/performance-monitor/.minikube/minikube-binaries/9726/minikube start --driver=docker]: starting cmd: fork/exec /home/performance-monitor/.minikube/minikube-binaries/9726/minikube: exec format error

@prezha prezha requested a review from priyawadhwa December 4, 2020 15:20
@minikube-pr-bot

kvm2 Driver
Times for minikube: 74.7s 69.0s 72.0s
Average time for minikube: 71.9s

Times for Minikube (PR 9726): 67.3s 67.7s 68.5s
Average time for Minikube (PR 9726): 67.8s

Averages Time Per Log

+--------------------------------+----------+--------------------+
|              LOG               | MINIKUBE | MINIKUBE (PR 9726) |
+--------------------------------+----------+--------------------+
| * minikube v1.15.1 on Debian   | 0.1s     | 0.0s               |
|                           9.11 |          |                    |
| * Using the kvm2 driver based  | 0.0s     | 0.0s               |
| on user configuration          |          |                    |
| * Starting control plane node  | 0.0s     | 0.0s               |
| minikube in cluster minikube   |          |                    |
| * Creating kvm2 VM (CPUs=2,    | 42.4s    | 38.9s              |
| Memory=3700MB, Disk=20000MB)   |          |                    |
| ...                            |          |                    |
| * Preparing Kubernetes v1.19.4 | 26.7s    | 26.1s              |
| on Docker 19.03.13 ...         |          |                    |
| * Verifying Kubernetes         | 2.3s     | 2.5s               |
| components...                  |          |                    |
| * Enabled addons:              | 0.5s     | 0.2s               |
| storage-provisioner,           |          |                    |
| default-storageclass           |          |                    |
|                                | 0.0s     | 0.0s               |
|   - Want kubectl v1.19.4? Try  |          |                    |
| 'minikube kubectl -- get pods  |          |                    |
| -A'                            |          |                    |
| * Done! kubectl is now         |          |                    |
| configured to use "minikube"   |          |                    |
| cluster and "default"          |          |                    |
| namespace by default           |          |                    |
+--------------------------------+----------+--------------------+

docker Driver
Times for minikube: 32.7s 31.9s 31.6s
Average time for minikube: 32.1s

Times for Minikube (PR 9726): 30.2s 30.9s 29.2s
Average time for Minikube (PR 9726): 30.1s

Averages Time Per Log

+--------------------------------+----------+--------------------+
|              LOG               | MINIKUBE | MINIKUBE (PR 9726) |
+--------------------------------+----------+--------------------+
| * minikube v1.15.1 on Debian   | 0.2s     | 0.2s               |
|                           9.11 |          |                    |
| * Using the docker driver      | 0.1s     | 0.1s               |
| based on user configuration    |          |                    |
| * Starting control plane node  | 0.1s     | 0.1s               |
| minikube in cluster minikube   |          |                    |
| * Creating docker container    | 9.7s     | 9.5s               |
| (CPUs=2, Memory=3700MB) ...    |          |                    |
| * Preparing Kubernetes v1.19.4 | 20.5s    | 18.6s              |
| on Docker 19.03.13 ...         |          |                    |
| * Verifying Kubernetes         | 1.3s     | 1.4s               |
| components...                  |          |                    |
| * Enabled addons:              | 0.1s     | 0.3s               |
| storage-provisioner,           |          |                    |
| default-storageclass           |          |                    |
|                                | 0.0s     | 0.0s               |
|   - Want kubectl v1.19.4? Try  |          |                    |
| 'minikube kubectl -- get pods  |          |                    |
| -A'                            |          |                    |
| * Done! kubectl is now         |          |                    |
| configured to use "minikube"   |          |                    |
| cluster and "default"          |          |                    |
| namespace by default           |          |                    |
+--------------------------------+----------+--------------------+

@sharifelgamal

Any change that results in not needing to restart our jenkins machines is a welcome one to me.

@medyagh medyagh merged commit b84f249 into kubernetes:master Dec 11, 2020
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Successfully merging this pull request may close these issues.

kvm: recommend solution & auto-cleanup minikube-net network leftover failures
7 participants