kicbase builds fail TestStartStop/group/crio/serial/UserAppExistsAfterStop #8890

Closed

tstromberg opened this issue Jul 30, 2020 · 3 comments

Labels: july-chill, priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)

tstromberg commented Jul 30, 2020

@medyagh mentioned that he observed failures with TestStartStop/group/crio on the latest kicbase.

My first attempt to duplicate them corroborates this:

  • docker build -t kicbase:experiment deploy/kicbase
  • env TEST_ARGS="-minikube-start-args='--vm-driver=docker --base-image=kicbase:experiment' -test.run=TestStartStop/group/crio --cleanup=false" make integration

Before I hit Ctrl-C, the test result showed:

=== RUN   TestStartStop/group/crio/serial/UserAppExistsAfterStop
    TestStartStop/group/crio/serial/UserAppExistsAfterStop: start_stop_delete_test.go:208: (dbg) TestStartStop/group/crio/serial/UserAppExistsAfterStop: waiting 9m0s for pods matching "k8s-app=kubernetes-dashboard" in namespace "kubernetes-dashboard" ...
    TestStartStop/group/crio/serial/UserAppExistsAfterStop: helpers_test.go:317: TestStartStop/group/crio/serial/UserAppExistsAfterStop: WARNING: pod list for "kubernetes-dashboard" "k8s-app=kubernetes-dashboard" returned: Get "https://127.0.0.1:32792/api/v1/namespaces/kubernetes-dashboard/pods?labelSelector=k8s-app%3Dkubernetes-dashboard": dial tcp 127.0.0.1:32792: connect: connection refused
    ... (the same connection-refused warning repeats)
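
(For reference, the wait above is roughly equivalent to the following plain kubectl invocation; this is an approximation for manual debugging, not what the test actually runs:

kubectl wait pod --namespace kubernetes-dashboard \
  --selector=k8s-app=kubernetes-dashboard \
  --for=condition=Ready --timeout=9m0s

Note that every poll fails at the TCP level, so the apiserver itself is unreachable, not just the dashboard pod.)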

minikube logs showed that zero pods were running after the restart:

==> CRI-O <==
-- Logs begin at Thu 2020-07-30 15:22:55 UTC, end at Thu 2020-07-30 15:36:43 UTC. --
Jul 30 15:23:01 crio-20200730081648-5550 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
Jul 30 15:23:01 crio-20200730081648-5550 systemd[1]: Started Container Runtime Interface for OCI (CRI-O).

==> container status <==
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID

==> describe nodes <==
E0730 08:36:43.923734    7320 logs.go:178] command /bin/bash -c "sudo /var/lib/minikube/binaries/v1.15.7/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig" failed with error: /bin/bash -c "sudo /var/lib/minikube/binaries/v1.15.7/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig": Process exited with status 1
stdout:

stderr:
The connection to the server localhost:8443 was refused - did you specify the right host or port?
 output: "\n** stderr ** \nThe connection to the server localhost:8443 was refused - did you specify the right host or port?\n\n** /stderr **"

==> dmesg <==

==> kernel <==
 15:36:44 up 29 min,  0 users,  load average: 4.57, 4.81, 5.82
Linux crio-20200730081648-5550 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 19.10"

==> kubelet <==
-- Logs begin at Thu 2020-07-30 15:22:55 UTC, end at Thu 2020-07-30 15:36:44 UTC. --
Jul 30 15:36:42 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:42.982681   55612 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
Jul 30 15:36:42 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:42.982940   55612 kubelet.go:282] Adding pod path: /etc/kubernetes/manifests
Jul 30 15:36:42 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:42.983040   55612 kubelet.go:307] Watching apiserver
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.011311   55612 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://control-plane.minikube.internal:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dcrio-20200730081648-5550&limit=500&resourceVersion=0: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.011428   55612 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:445: Failed to list *v1.Service: Get https://control-plane.minikube.internal:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: W0730 15:36:43.011814   55612 util_unix.go:103] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.032621   55612 remote_runtime.go:59] parsed scheme: ""
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.033306   55612 remote_runtime.go:59] scheme "" not registered, fallback to default scheme
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: W0730 15:36:43.033540   55612 util_unix.go:103] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.035735   55612 remote_image.go:50] parsed scheme: ""
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.035824   55612 remote_image.go:50] scheme "" not registered, fallback to default scheme
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.036302   55612 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{/var/run/crio/crio.sock 0  <nil>}]
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.036409   55612 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.045258   55612 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:454: Failed to list *v1.Node: Get https://control-plane.minikube.internal:8443/api/v1/nodes?fieldSelector=metadata.name%3Dcrio-20200730081648-5550&limit=500&resourceVersion=0: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.045321   55612 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{/var/run/crio/crio.sock 0  <nil>}]
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.045335   55612 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.045366   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0002a3130, CONNECTING
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.048415   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0002a3130, READY
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.050421   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000b1bd00, CONNECTING
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.050663   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000b1bd00, READY
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.381316   55612 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]:         For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.385170   55612 kuberuntime_manager.go:205] Container runtime cri-o initialized, version: 1.17.4, apiVersion: v1alpha1
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.386455   55612 server.go:1081] Started kubelet
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.387421   55612 server.go:144] Starting to listen on 0.0.0.0:10250
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.389816   55612 event.go:249] Unable to write event: 'Post https://control-plane.minikube.internal:8443/api/v1/namespaces/default/events: dial tcp 172.17.0.2:8443: connect: connection refused' (may retry after sleeping)
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.395128   55612 server.go:350] Adding debug handlers to kubelet server.
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.410431   55612 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.416052   55612 status_manager.go:152] Starting to sync pod status with apiserver
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.416101   55612 kubelet.go:1809] Starting kubelet main sync loop.
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.416114   55612 kubelet.go:1826] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.416608   55612 volume_manager.go:243] Starting Kubelet Volume Manager
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.423408   55612 kubelet.go:2173] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.423741   55612 clientconn.go:440] parsed scheme: "unix"
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.424104   55612 clientconn.go:440] scheme "unix" not registered, fallback to default scheme
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.424347   55612 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.424516   55612 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.424688   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000720ea0, CONNECTING
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.425145   55612 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000720ea0, READY
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.427496   55612 controller.go:125] failed to ensure node lease exists, will retry in 200ms, error: Get https://control-plane.minikube.internal:8443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/crio-20200730081648-5550?timeout=10s: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.429442   55612 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://control-plane.minikube.internal:8443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourceVersion=0: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.435232   55612 desired_state_of_world_populator.go:130] Desired state populator starts to run
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.435441   55612 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://control-plane.minikube.internal:8443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.522586   55612 kubelet.go:1826] skipping pod synchronization - container runtime status check may not have completed yet
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.532532   55612 kubelet.go:2252] node "crio-20200730081648-5550" not found
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.534989   55612 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.629799   55612 controller.go:125] failed to ensure node lease exists, will retry in 400ms, error: Get https://control-plane.minikube.internal:8443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/crio-20200730081648-5550?timeout=10s: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.633086   55612 kubelet_node_status.go:72] Attempting to register node crio-20200730081648-5550
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.633903   55612 kubelet_node_status.go:94] Unable to register node "crio-20200730081648-5550" with API server: Post https://control-plane.minikube.internal:8443/api/v1/nodes: dial tcp 172.17.0.2:8443: connect: connection refused
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.650747   55612 kubelet.go:2252] node "crio-20200730081648-5550" not found
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.708276   55612 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.723402   55612 kubelet.go:1826] skipping pod synchronization - container runtime status check may not have completed yet
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: E0730 15:36:43.750995   55612 kubelet.go:2252] node "crio-20200730081648-5550" not found
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.755287   55612 cpu_manager.go:155] [cpumanager] starting with none policy
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.755302   55612 cpu_manager.go:156] [cpumanager] reconciling every 10s
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.755316   55612 policy_none.go:42] [cpumanager] none policy: Start
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: I0730 15:36:43.839044   55612 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Jul 30 15:36:43 crio-20200730081648-5550 kubelet[55612]: F0730 15:36:43.848410   55612 kubelet.go:1372] Failed to start ContainerManager failed to initialize top level QOS containers: root container [kubepods] doesn't exist
Jul 30 15:36:43 crio-20200730081648-5550 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 30 15:36:43 crio-20200730081648-5550 systemd[1]: kubelet.service: Failed with result 'exit-code'.
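
The fatal line above ("Failed to start ContainerManager failed to initialize top level QOS containers: root container [kubepods] doesn't exist") kills kubelet; systemd then restarts it, the cycle repeats, and that is why the apiserver never comes back and the container list stays empty. A quick way to confirm the crash loop from the host (a sketch; the container name is taken from the log prefix above, substitute your own profile's node container):

# systemd is PID 1 inside the kicbase container, so systemctl works via docker exec
docker exec crio-20200730081648-5550 systemctl status kubelet
# rough restart count (assumes the unit's journal start message contains "Started kubelet")
docker exec crio-20200730081648-5550 journalctl -u kubelet --no-pager | grep -c 'Started kubelet'
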
tstromberg added the july-chill and priority/important-soon labels on Jul 30, 2020
tstromberg commented Jul 30, 2020

It's worth noting that this test works fine at master with the currently released kicbase:

--- PASS: TestStartStop (307.87s)
    --- PASS: TestStartStop/group (0.00s)
        --- PASS: TestStartStop/group/crio (307.87s)
            --- PASS: TestStartStop/group/crio/serial (307.87s)
                --- PASS: TestStartStop/group/crio/serial/FirstStart (176.20s)
                --- PASS: TestStartStop/group/crio/serial/DeployApp (13.09s)
                --- PASS: TestStartStop/group/crio/serial/Stop (20.70s)
                --- PASS: TestStartStop/group/crio/serial/EnableAddonAfterStop (0.19s)
                --- PASS: TestStartStop/group/crio/serial/SecondStart (72.72s)
                --- PASS: TestStartStop/group/crio/serial/UserAppExistsAfterStop (16.03s)
                --- PASS: TestStartStop/group/crio/serial/AddonExistsAfterStop (5.02s)
                --- PASS: TestStartStop/group/crio/serial/VerifyKubernetesImages (0.40s)
                --- PASS: TestStartStop/group/crio/serial/Pause (3.52s)
PASS
Tests completed in 5m7.869411469s (result code 0)
ok  	k8s.io/minikube/test/integration	308.764s
The working cluster reports:

crio version 1.17.3
commit: "3473adcfe09a306a2ac15773fc7c3f06dc5ff9db-dirty"

tstromberg self-assigned this on Jul 30, 2020

tstromberg commented Jul 30, 2020

The new image upgrades crio to v1.17.4, but consistently fails TestStartStop/group/crio/serial/UserAppExistsAfterStop. Here is an easy way to replicate the issue locally:

docker build -t kicbase:experiment deploy/kicbase

./out/minikube delete \
  && ./out/minikube start --vm-driver=docker --base-image=kicbase:experiment --container-runtime=crio \
  && kubectl apply -f ./test/integration/testdata/netcat-deployment.yaml \
  && sleep 60 \
  && ./out/minikube stop \
  && ./out/minikube start --vm-driver=docker --base-image=kicbase:experiment --container-runtime=crio

It will fail with:

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

minikube logs will show:

Jul 30 16:40:26 minikube kubelet[13192]: E0730 16:40:26.487040   13192 kubelet.go:2267] node "minikube" not found
Jul 30 16:40:26 minikube kubelet[13192]: F0730 16:40:26.522095   13192 kubelet.go:1383] Failed to start ContainerManager failed to initialize top level QOS containers: root container [kubepods] doesn't exist
Jul 30 16:40:26 minikube systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 30 16:40:26 minikube systemd[1]: kubelet.service: Failed with result 'exit-code'.
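
To inspect the failed node afterwards, these stock commands (quoted so the pipe runs inside the node) should show kubelet crash-looping on the same missing-kubepods error:

minikube ssh "sudo systemctl status kubelet"
minikube ssh "sudo journalctl -u kubelet --no-pager | tail -n 50"
# the "root container" kubelet complains about; with a cgroupfs cgroup driver
# (an assumption about this image) it would live at e.g.:
minikube ssh "ls /sys/fs/cgroup/memory/kubepods"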

tstromberg changed the title from "(release blocker) verify TestStartStop/group/crio on latest kicbase" to "(release blocker) Unable to build kicbase due to crio breakage" on Jul 30, 2020
tstromberg changed the title from "(release blocker) Unable to build kicbase due to crio breakage" to "Latest kicbase builds fail TestStartStop/group/crio/serial/UserAppExistsAfterStop" on Jul 30, 2020
tstromberg changed the title from "Latest kicbase builds fail TestStartStop/group/crio/serial/UserAppExistsAfterStop" to "kicbase builds fail TestStartStop/group/crio/serial/UserAppExistsAfterStop" on Jul 30, 2020

tstromberg commented:

Marking as dupe of #8861
