Describe the bug
The firecracker-containerd deployment option is not working.
This manifests as Calico pods stuck in a non-ready state (0/1 ready) with failing readiness probes. Some other pods are also not ready (the default-domain job has not succeeded, and the registry is in a constant CrashLoopBackOff).
To Reproduce
Follow the quickstart guide with the firecracker deployment on a CloudLab xl170 or d430 2-node cluster.
Commands:
# both nodes
git clone --depth=1 https://github.com/vhive-serverless/vhive.git
cd vhive
mkdir -p /tmp/vhive-logs
./scripts/cloudlab/setup_node.sh > >(tee -a /tmp/vhive-logs/setup_node.stdout) 2> >(tee -a /tmp/vhive-logs/setup_node.stderr >&2)
# for worker
./scripts/cluster/setup_worker_kubelet.sh > >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stdout) 2> >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stderr >&2)
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
sudo PATH=$PATH screen -dmS firecracker bash -c "/usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml > >(tee -a /tmp/vhive-logs/firecracker.stdout) 2> >(tee -a /tmp/vhive-logs/firecracker.stderr >&2)"
source /etc/profile && go build
sudo screen -dmS vhive bash -c "./vhive > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
# for master
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
./scripts/cluster/create_multinode_cluster.sh > >(tee -a /tmp/vhive-logs/create_multinode_cluster.stdout) 2> >(tee -a /tmp/vhive-logs/create_multinode_cluster.stderr >&2)
# Join the cluster from the worker, then answer 'y' on the master node
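The join command is printed by create_multinode_cluster.sh on the master and has the standard kubeadm form (the placeholders below are not actual values from this run):
# run on the worker, with values taken from the master's output
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>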
Expected behavior
A working setup: all pods are ready.
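A quick way to check readiness (standard kubectl; pod names differ per cluster):
# all pods across namespaces should eventually report Ready
kubectl get pods --all-namespaces -o wide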
Logs
kubectl describe pod calico-node-fd2vw -n kube-system | tail -40
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned kube-system/calico-node-fd2vw to node-1.vhive-test.ntu-cloud.emulab.net
Normal Pulling 19m kubelet Pulling image "docker.io/calico/cni:v3.25.1"
Normal Pulled 19m kubelet Successfully pulled image "docker.io/calico/cni:v3.25.1" in 6.609745176s
Normal Created 19m kubelet Created container upgrade-ipam
Normal Started 19m kubelet Started container upgrade-ipam
Normal Pulled 19m kubelet Container image "docker.io/calico/cni:v3.25.1" already present on machine
Normal Created 19m kubelet Created container install-cni
Normal Started 19m kubelet Started container install-cni
Normal Pulling 19m kubelet Pulling image "docker.io/calico/node:v3.25.1"
Normal Pulled 19m kubelet Successfully pulled image "docker.io/calico/node:v3.25.1" in 8.338422707s
Normal Created 19m kubelet Created container mount-bpffs
Normal Started 19m kubelet Started container mount-bpffs
Normal Pulled 19m kubelet Container image "docker.io/calico/node:v3.25.1" already present on machine
Normal Created 19m kubelet Created container calico-node
Normal Started 19m kubelet Started container calico-node
Warning Unhealthy 19m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning Unhealthy 19m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 19m kubelet Readiness probe failed: 2023-08-01 06:55:59.478 [INFO][363] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:09.489 [INFO][431] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:19.522 [INFO][491] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:29.495 [INFO][541] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:39.527 [INFO][609] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:49.503 [INFO][656] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 18m kubelet Readiness probe failed: 2023-08-01 06:56:59.506 [INFO][746] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
Warning Unhealthy 4m32s (x93 over 17m) kubelet (combined from similar events): Readiness probe failed: 2023-08-01 07:10:29.486 [INFO][6011] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.1.1
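The repeated BGP failures above suggest calico-node auto-detected the wrong interface. One way to check what it picked (a sketch using standard kubectl; the pod name is taken from the events above):
# the autodetection result is logged at calico-node startup
kubectl -n kube-system logs calico-node-fd2vw -c calico-node | grep -i autodetect
# compare against the node's actual addresses and the unreachable peer (10.0.1.1)
kubectl get nodes -o wide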
create_multinode_cluster.stderr:
W0801 00:54:16.997669 30229 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
I0801 00:54:17.209749 30229 version.go:256] remote version is much newer: v1.27.4; falling back to: stable-1.25
All nodes need to be joined in the cluster. Have you joined all nodes? (y/n): All nodes need to be joined in the cluster. Have you joined all nodes? (y/n): Warning: resource configmaps/kube-proxy is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Error from server (InternalError): error when creating "/users/lkondras/vhive/configs/metallb/metallb-ipaddresspool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "/users/lkondras/vhive/configs/metallb/metallb-l2advertisement.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   101  100   101    0     0    848      0 --:--:-- --:--:-- --:--:--   848
100  4899  100  4899    0     0  12036      0 --:--:-- --:--:-- --:--:-- 12036
! values.global.jwtPolicy is deprecated; use Values.global.jwtPolicy=third-party-jwt. See https://istio.io/latest/docs/ops/best-practices/security/#configure-third-party-service-account-tokens for more information instead
- Processing resources for Istio core.
✔ Istio core installed
- Processing resources for Istiod.
- Processing resources for Istiod. Waiting for Deployment/istio-system/istiod
✔ Istiod installed
- Processing resources for Ingress gateways.
- Processing resources for Ingress gateways. Waiting for Deployment/istio-system/cluster-local-gateway, Deployment/istio-system/istio-ingressgateway
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/cluster-local-gateway (containers with unready status: [istio-proxy])
Deployment/istio-system/istio-ingressgateway (containers with unready status: [istio-proxy])
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation
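Both the MetalLB webhook timeouts and the unready istio-proxy containers are consistent with the pod network being down. A way to confirm (standard kubectl, a sketch):
# the webhook service has no ready endpoints if the MetalLB controller never became Ready
kubectl -n metallb-system get pods
# the gateway pods should show istio-proxy stuck in a non-ready state
kubectl -n istio-system get pods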
Notes
The setup works for the pure containerd deployment (stock-only passed to the setup scripts).
For xl170 nodes, the problem was fixed with sed -i '4548i\ - name: IP_AUTODETECTION_METHOD\n value: "interface=ens1f1"' configs/calico/canal.yaml before running ./scripts/cluster/create_multinode_cluster.sh on the master node. So the problem is in Calico's choice of network interface.
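For reference, the sed command above splices the following env entry into the calico-node container spec in configs/calico/canal.yaml (line 4548 is specific to this version of the file, and the interface name is xl170-specific):
# added under the calico-node container's env list
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens1f1"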