add new extra component to --wait=all to validate a healthy cluster #10424
Conversation
Hi @prezha. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Can one of the admins verify this patch?
this is good work, and I agree. one thing to consider is:
we should ensure our current "minikube start" won't add any additional wait for components,
but we also want "minikube start --wait=all" to wait for every single thing possible (as proposed in this PR, we should not only rely on the pod phase status but also wait for the pods to be actually running)
so if possible please post in the PR description the time metrics Before and After this PR for a "normal start", and then, if it adds any new time, we can introduce a new component to the wait flag
currently the available options for --wait are
- Default (apiserver,system_pods)
- False (no wait)
- All (everything)
- or a combination of the available options "apiserver,system_pods,default_sa,apps_running,node_ready,kubelet"
we can introduce a new option (strict_check), therefore --wait=all would be "apiserver,system_pods,default_sa,apps_running,node_ready,kubelet,strict_check" (a sketch follows this comment)
note that I don't actually propose the name "strict_check"; any other name you come up with would be good.
this will allow the user to choose the level of wait they want to control (so if they'd rather have a fast start and don't care much about the strict check, they could pass only the parameters they want)
but --wait=all should always give you ALL possible waits to verify everything
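A minimal sketch of how that could look on the kverify side; the key names are the ones listed above, while the variable names and the "extra" key are assumptions rather than the PR's final code:

```go
// sketch only: where a new wait component could slot into the existing kverify lists
package kverify

const (
	APIServerWaitKey  = "apiserver"
	SystemPodsWaitKey = "system_pods"
	DefaultSAWaitKey  = "default_sa"
	AppsRunningKey    = "apps_running"
	NodeReadyKey      = "node_ready"
	KubeletKey        = "kubelet"
	// ExtraKey is the hypothetical new key for the strict checks discussed above
	// ("strict_check" is only a placeholder name)
	ExtraKey = "extra"
)

var (
	// DefaultComponents is what a plain `minikube start` keeps waiting for
	DefaultComponents = []string{APIServerWaitKey, SystemPodsWaitKey}
	// AllComponents is what `--wait=all` expands to, now including the new key
	AllComponents = []string{APIServerWaitKey, SystemPodsWaitKey, DefaultSAWaitKey,
		AppsRunningKey, NodeReadyKey, KubeletKey, ExtraKey}
)
```

This keeps --wait=all a strict superset of every other option while leaving the default start untouched.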
@prezha Just curious how you found this issue - did you get the clue from any log? I also debugged this before but had no luck. If you could kindly share the debugging experience, that would be a very good learning lesson for all!
thank you @medyagh, i've added time metrics to the pr description and commented
in general: looked at plenty of logs :) - mostly those from failed ci/cd tasks here, but also ran locally and then examined the run logs and the containers/services themselves, tried to figure out the 'normal'/expected behaviour, then backtracked through the issue and found where it could be rooted... here is an example of a failed job: https://github.com/kubernetes/minikube/pull/10378/checks?check_run_id=1842596262 and here is the command i've run locally:
you can then ssh into individual containers to see beyond the default last 25 lines of logs, or you can use something like:
here specifically the test failed on services that were not operational, so i've then backtracked where and when they've been started (i hope this was not too confusing and somewhat helpful :)
Yeah, thanks @prezha for the detailed explanation, it's really helpful and much appreciated!
/ok-to-test
kvm2 Driver Times for Minikube (PR 10424): 160.1s 135.9s 143.9s
docker Driver Times for Minikube (PR 10424): 110.9s 95.2s 101.6s
force-pushed from 947cd64 to 93e98de
after consultation with @medya, i've updated the pr so that these strict checks (ie, waiting for pods in CorePodsList to be Ready) are only triggered when explicitly requested via the wait flag
kvm2 Driver Times for Minikube (PR 10424): 156.6s 158.6s 155.0s
docker Driver Times for Minikube (PR 10424): 114.2s 109.1s 96.7s

kvm2 Driver Times for Minikube (PR 10424): 71.2s 65.7s 68.3s
docker Driver Times for Minikube (PR 10424): 26.5s 24.3s 24.9s
it is possible that we do not respect the --wait flag if it is passed for an existing cluster (on a second start)
maybe the code needs to be fixed in updateExistingConfigFromFlags
// updateExistingConfigFromFlags will update the existing config from the flags - used on a second start
// skipping updating existing docker env , docker opt, InsecureRegistry, registryMirror, extra-config, apiserver-ips
func updateExistingConfigFromFlags(cmd *cobra.Command, existing *config.ClusterConfig) config.ClusterConfig { //nolint to suppress cyclomatic complexity 45 of func `updateExistingConfigFromFlags` is high (> 30)
validateFlags(cmd, existing.Driver)
cc := *existing
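A sketch of the kind of change being suggested, to be placed right after the `cc := *existing` line above; the `waitComponents` flag name and `interpretWaitFlag` helper are assumed from start_flags.go rather than verified here:

```go
	// sketch only: if --wait was explicitly passed on this (second) start,
	// honour it instead of silently keeping the existing cluster's recorded value
	if cmd.Flags().Changed(waitComponents) {
		cc.VerifyComponents = interpretWaitFlag(*cmd)
	}
```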
test is still failing
I think what is missing is that we need to add the new wait component to other places, such as the bootstrapper:
func (k *Bootstrapper) WaitForNode(
... add new component here
and also maybe we need to ensure the other route of waiting (when the cluster already exists) respects the --wait flags,
or other places in the code where we do waiting after the cluster is already running
@medya do you mean, on subsequent starts, to completely replace minikube/cmd/minikube/cmd/start_flags.go lines 655 to 657 (at 27d86a4)
with cc.VerifyComponents = kverify.NoComponents?
thanks @medyagh, i think i've already added that in the pr:
i think a good place to intervene is minikube/cmd/minikube/cmd/start.go line 178 (at 6c2280c)
here, on subsequent starts, we can simply turn off whichever checks we want... what about restarts of components during the first run - should we still respect the --wait flag?
@prezha but if no flag is set and the user only does "minikube start", we should not add any new things to wait on.
kvm2 Driver Times for Minikube (PR 10424): 66.4s 68.6s 66.5s
docker Driver Times for Minikube (PR 10424): 27.4s 26.7s 26.1s
// it appears to be immediately Ready as are all kube-system pods
// then (after ~10sec) it realises it has some changes to apply, implying also pods restarts
// so we wait for kubelet to initialise itself...
time.Sleep(10 * time.Second)
I don't think sleeping for 10 seconds arbitrarily is the right thing here - is there any other way to detect that the kubelet is trying to initialize itself?
you are right @medyagh, and thank you for proposing to use retry.Expo() instead
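A minimal sketch of that suggestion, assuming the retry.Expo(callback, initInterval, maxTime) helper from minikube's pkg/util/retry; the kubelet probe here is a hypothetical stand-in, not the PR's actual check:

```go
// Poll a readiness probe with exponential backoff instead of a fixed 10s sleep.
package main

import (
	"fmt"
	"time"

	"k8s.io/minikube/pkg/util/retry"
)

func main() {
	start := time.Now()
	kubeletSettled := func() error {
		// stand-in probe: pretend the kubelet needs ~10s before it is really ready
		if time.Since(start) < 10*time.Second {
			return fmt.Errorf("kubelet still initialising")
		}
		return nil
	}
	// retry kubeletSettled with exponential backoff, starting at 500ms,
	// giving up after 2 minutes in total
	if err := retry.Expo(kubeletSettled, 500*time.Millisecond, 2*time.Minute); err != nil {
		fmt.Println("kubelet did not become ready:", err)
		return
	}
	fmt.Printf("kubelet ready after %s\n", time.Since(start).Round(time.Second))
}
```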
@@ -37,16 +37,18 @@ const (
NodeReadyKey = "node_ready"
// KubeletKey is the name used in the flags for waiting for the kubelet status to be ready
KubeletKey = "kubelet"
// OperationalKey is the name used for waiting for pods in CorePodsList to be Ready
OperationalKey = "operational"
how about we name this something like "extra"?
sounds good - renamed
kvm2 Driver Times for Minikube (PR 10424): 64.6s 68.6s 68.7s
docker Driver Times for Minikube (PR 10424): 26.2s 26.1s 28.2s
// WaitForPodReadyByLabel waits for pod with label ([key:]val) in a namespace to be in Ready condition.
// If namespace is not provided, it defaults to "kube-system".
// If label key is not provided, it will try with "component" and "k8s-app".
func WaitForPodReadyByLabel(cs *kubernetes.Clientset, label, namespace string, timeout time.Duration) error {
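A simplified client-go sketch of the behaviour the doc comment above describes (not the PR's exact implementation): the label is treated as a plain selector and the "component"/"k8s-app" key fallback is omitted:

```go
package kverify

import (
	"context"
	"fmt"
	"time"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// WaitForPodReadyByLabel polls pods matching label in namespace until one reports
// the Ready condition as True, or the timeout expires.
func WaitForPodReadyByLabel(cs *kubernetes.Clientset, label, namespace string, timeout time.Duration) error {
	if namespace == "" {
		namespace = "kube-system" // default per the doc comment above
	}
	ready := func() (bool, error) {
		pods, err := cs.CoreV1().Pods(namespace).List(context.Background(), meta.ListOptions{LabelSelector: label})
		if err != nil || len(pods.Items) == 0 {
			return false, nil // not listable / not created yet: keep polling
		}
		for _, pod := range pods.Items {
			for _, c := range pod.Status.Conditions {
				if c.Type == core.PodReady && c.Status == core.ConditionTrue {
					return true, nil
				}
			}
		}
		return false, nil // pod exists but is not Ready yet
	}
	if err := wait.PollImmediate(500*time.Millisecond, timeout, ready); err != nil {
		return fmt.Errorf("pod %q in %q did not become Ready: %v", label, namespace, err)
	}
	return nil
}
```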
this func is not used outside, make private.
done!
kvm2 Driver Times for Minikube (PR 10424): 69.3s 67.6s 70.5s
docker Driver Times for Minikube (PR 10424): 27.9s 26.5s 26.4s
thank you very much for fixing this annoying test flake @prezha, I really appreciate your patience in getting this fixed the right way
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: medyagh, prezha. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
fixes #9936
fixes #10130
i've slightly improved the validateComponentHealth functional test, but in this pr i'm also proposing to change how we check whether a pod is actually available - currently, we rely on the pod's Running status, and according to Pod phase (ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) that is a prerequisite but not sufficient to consider the pod operational - we should use the Ready Pod condition instead (ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).

i've changed accordingly how we wait for the api server in kubeadm.WaitForNode as well as in kverify.WaitForSystemPods - these changes should improve the odds of the validateComponentHealth functional test not failing, as well as overall stability.

three additional notes:
- we might want to clean up kverify.SystemPodsList (eg, remove the currently redundant 'kube-apiserver' from the list -or- amend the kverify.*Components and remove APIServerWaitKey if SystemPodsWaitKey is defined, as it's using the SystemPodsList)
- we might want to consolidate similar funcs (eg, cmd.waitForAPIServerProcess in docker-env, and kverify.waitForAPIServerProcess in apiserver, used in kubeadm.restartControlPlane)
- we currently rely on the healthz endpoint, which is deprecated (ref: https://kubernetes.io/docs/reference/using-api/health-checks/#api-endpoints-for-health) - a sketch of probing its replacement follows below
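As referenced in the last note, a sketch (not part of this PR) of probing the newer endpoint through the raw REST client that the existing *kubernetes.Clientset already exposes; the function name and placement are assumptions:

```go
package kverify

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
)

// apiServerReadyz checks the /readyz endpoint, which (together with /livez)
// supersedes the deprecated /healthz per the linked health-check docs.
func apiServerReadyz(ctx context.Context, cs *kubernetes.Clientset) error {
	body, err := cs.Discovery().RESTClient().Get().AbsPath("/readyz").DoRaw(ctx)
	if err != nil {
		return fmt.Errorf("apiserver /readyz: %v", err)
	}
	if string(body) != "ok" {
		return fmt.Errorf("apiserver not ready yet: %s", body)
	}
	return nil
}
```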
time metrics (all completed successfully - here are just the resulting values, for brevity)

before

after

expectedly, the time increased here for a 'plain' minikube start - explanation: the default value of the wait flag is kverify.DefaultWaitList: APIServerWaitKey, SystemPodsWaitKey, so in kubeadm.WaitForNode it will trigger WaitForPodReadyByLabel for all system components (incl. apiserver) => currently, minikube start equals minikube start --wait=all

on the other hand:
- --wait=all will force waiting for all system components to become fully operational
- with --wait=none, the wait does not happen (no additional wait time is added)

should we change the default value for --wait to none then, or should we trigger this strict_check only if explicitly asked for (and also with --all)?

one thing may be worth mentioning: just the --wait flag, w/o specifying =xxx, does not work as (i've) expected - ie, it does not take the default value (kverify.DefaultWaitList), but is 'greedy' instead (and that could be explained by the fact that = is optional anyway...) - a small illustration follows below
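A small self-contained illustration of that 'greedy' parsing; the flag definition below is a stand-in for minikube's actual --wait flag, which is assumed to be a pflag string slice without a NoOptDefVal:

```go
// Without a NoOptDefVal, a bare --wait consumes the next token as its value
// instead of falling back to the default component list.
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	fs := pflag.NewFlagSet("start", pflag.ContinueOnError)
	wait := fs.StringSlice("wait", []string{"apiserver", "system_pods"}, "components to wait for")

	// `minikube start --wait --alsologtostderr` style input:
	_ = fs.Parse([]string{"--wait", "--alsologtostderr"})
	fmt.Println(*wait) // prints [--alsologtostderr], not the default list
}
```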