
Some pods not working properly with containerd runtime and CNI plugin. #911

Closed
spothound opened this issue May 3, 2022 · 4 comments

@spothound

spothound commented May 3, 2022

What happened:
I am updating a development environment to use containerd runtime as a test before updating my production environments to this runtime. Sadly, although the cluster update works and I can run some pods with containerd... there are some core pods that are not working properly after the update. For example:

  1. Cert-manager service has the following error:
error retrieving resource lock kube-system/cert-manager-controller: Get "https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller": dial tcp 172.20.0.1:443: i/o timeout

  2. kubed service has the following error:
Error: Get "https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 172.20.0.1:443: i/o timeout

  3. nginx ingress controller has the following error:
W0503 11:08:48.909243       7 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0503 11:08:48.909416       7 main.go:221] "Creating API client" host="https://172.20.0.1:443"

So... seems like there is an internal networking problem in the cluster when running the containerd runtime.
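For reference, a quick way to confirm whether a pod can reach the in-cluster API service VIP at all is to curl it from a throwaway pod (a minimal sketch; the curlimages/curl image and the 5-second timeout are arbitrary choices on my side):

kubectl run api-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sk -m 5 https://172.20.0.1:443/version

A timeout here reproduces the problem at the network level, whereas an HTTP 401/403 response would mean connectivity to the API server is fine and the problem is elsewhere.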

What you expected to happen:

The mentioned services should work with containerd as they did with the docker runtime... and the mentioned timeout errors should not happen.

How to reproduce it (as minimally and precisely as possible):
Create an EKS cluster with the docker runtime and the CNI plugin and try to run some of the mentioned services, then update the runtime to containerd by adding the --container-runtime containerd flag to the cluster bootstrap command.
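Concretely, the only change to the node user data is the extra flag passed to the EKS AMI bootstrap script, roughly like this (sketch; "my-cluster" is a placeholder for the real cluster name):

#!/bin/bash
# Same bootstrap call as before, but opting into containerd instead of dockerd
/etc/eks/bootstrap.sh my-cluster --container-runtime containerd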

Anything else we need to know?:

Environment:

  • AWS Region: eu-west-1
  • Instance Type(s): m5a.xlarge
  • EKS Platform version: eks.5
  • Kubernetes version: 1.21
  • AMI Version: amazon-eks-ami Release v20220429
  • Kernel: 5.4.188-104.359.amzn2.x86_64
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-040678b7c3f67e60e"
BUILD_TIME="Fri Apr 29 00:19:03 UTC 2022"
BUILD_KERNEL="5.4.188-104.359.amzn2.x86_64"
ARCH="x86_64

I have seen some closed issues on this topic:

But the info extracted from those didn't help. I have checked the nodes and they have the symlink:

lrwxrwxrwx 1 root root 31 May  3 08:42 /run/dockershim.sock -> /run/containerd/containerd.sock

So it shouldn't be the same problem as in those issues... but maybe it's something related.
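For reference, the kubelet flags and containerd itself can be checked directly on a node with something like this (sketch, assuming the default socket path on this AMI):

# Confirm the kubelet process was started with the containerd runtime endpoint
ps aux | grep [k]ubelet
# List running containers straight from containerd over CRI
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps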

Any idea, or suggestion on how to debug the issue?

Thanks!

@chroche

chroche commented Mar 3, 2023

Hi @spothound,

Can you please comment on how you got this fixed?

Thanks

@yongzhang

Same issue, how did you fix this?

@harshitp1987

Same issue when adopting EKS 1.25 with a custom AMI, how was this resolved?

@Th0masL

Th0masL commented Jul 14, 2023

This might not be relevant to your problem, but I was running into a more or less similar network connectivity issue for pods, and it was due to IP forwarding being disabled.

See my post in this issue for more details.
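For anyone checking the same thing, verifying and enabling IP forwarding on a node looks roughly like this (sketch; persisting the setting under /etc/sysctl.d is also needed for it to survive reboots):

# Check whether IPv4 forwarding is enabled (1 = enabled, 0 = disabled)
sysctl net.ipv4.ip_forward
# Enable it at runtime
sudo sysctl -w net.ipv4.ip_forward=1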
