bootstrap.sh can fail in cluster launch when waiting on iptables lock #400

Closed
jpvowen opened this issue Jan 21, 2020 · 0 comments
jpvowen commented Jan 21, 2020

What happened:
We use unmanaged EKS node groups created with CloudFormation, with some customizations to the template for logging and alerting from outside Kubernetes. We've found that bootstrap.sh can fail because of an iptables lock when other processes are also modifying the iptables configuration.

Specifically, when bootstrap.sh calls systemctl start kubelet.service, we hit this line in kubelet.service:

ExecStartPre=/sbin/iptables -P FORWARD ACCEPT

This command fails if something else has the iptables lock, and we can see the following in journalctl:

Jan 21 22:55:50 ip-10-5-20-65.ec2.internal systemd[1]: Starting Kubernetes Kubelet...
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal iptables[4335]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal systemd[1]: kubelet.service: control process exited, code=exited status=4
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal cloud-init[3651]: + true
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
Jan 21 22:55:50 ip-10-5-20-65.ec2.internal systemd[1]: kubelet.service failed.
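
To confirm that a failed node hit this particular error, the journal can be filtered for the xtables lock message. This is a minimal sketch; the function name has_xtables_lock_error is made up for illustration and is not part of bootstrap.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bootstrap.sh): exits 0 if the journal text
# on stdin contains the xtables lock error, non-zero otherwise.
has_xtables_lock_error() {
  grep -q 'holding the xtables lock'
}

# Example usage on a node (requires permission to read the journal):
#   journalctl -u kubelet.service --no-pager | has_xtables_lock_error \
#     && echo "kubelet start failed on iptables lock contention"
```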

What you expected to happen:
kubelet.service should use iptables' -w flag to wait for the iptables lock instead of failing immediately.
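
Until the unit file itself changes, one possible workaround is a systemd drop-in that replaces the ExecStartPre line with a -w variant. The sketch below is an assumption-laden illustration, not the project's fix: the function name and the drop-in file name 10-iptables-wait.conf are hypothetical, and it assumes the iptables call is the only ExecStartPre in kubelet.service, because the empty assignment clears all of them.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in so kubelet's ExecStartPre waits for the
# xtables lock. Function name and drop-in file name are hypothetical.
write_kubelet_iptables_dropin() {
  local dropin_dir="${1:-/etc/systemd/system/kubelet.service.d}"
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/10-iptables-wait.conf" <<'EOF'
[Service]
# An empty assignment clears every ExecStartPre= inherited from the unit file.
ExecStartPre=
ExecStartPre=/sbin/iptables -w -P FORWARD ACCEPT
EOF
}

# On a node, before bootstrap.sh starts kubelet (needs root):
#   write_kubelet_iptables_dropin && systemctl daemon-reload
```

With a bare -w, iptables waits until the lock is free; iptables versions that accept a number of seconds after -w can bound that wait instead.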

How to reproduce it (as minimally and precisely as possible):
I've attached amazon-eks-nodegroup-modified.txt, which modifies the us-east-1 EKS stack template slightly:

  • It runs a background process to hammer the iptables lock before running bootstrap.sh
  • It causes the CF stack to fail if any of the workers send a failure signal.

You should be able to reproduce the failure by setting up an EKS nodegroup using the template.
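
For readers without the attachment, a background process that hammers the lock could look roughly like the following. This is not the contents of the attached template. It is a sketch that assumes iptables serializes on a flock of /run/xtables.lock and that flock(1) from util-linux is installed. The function name and default values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a lock "hammer": repeatedly take an exclusive flock on the lock
# file and hold it briefly. Assumes iptables uses /run/xtables.lock; the
# function name and defaults are illustrative only.
hammer_xtables_lock() {
  local lock_file="${1:-/run/xtables.lock}"
  local rounds="${2:-600}"
  local hold_seconds="${3:-0.5}"
  local i
  for ((i = 0; i < rounds; i++)); do
    # flock holds the lock for as long as the sleep command runs.
    flock "$lock_file" sleep "$hold_seconds"
  done
}

# In instance user data, before calling bootstrap.sh (needs root):
#   hammer_xtables_lock &
```

While the loop holds the lock, an iptables call without -w should fail with the "Another app is currently holding the xtables lock" message shown in the journal output.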

Anything else we need to know?:

Environment:

  • AWS Region: us-east-1
  • Instance Type(s): any
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion):
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version):
  • AMI Version:
  • Kernel (e.g. uname -a):
  • Release information (run cat /etc/eks/release on a node):