Latest AMI version lowers ulimit and breaks Elasticsearch #193

Closed
jacobwgillespie opened this issue Feb 20, 2019 · 10 comments


@jacobwgillespie

What happened:

Upgrading to the latest AMI reduced the open-file ulimit (nofile) to 8192, which broke Elasticsearch deployments. Elasticsearch requires a ulimit of 65536.

What you expected to happen:

Upgrading AMIs shouldn't reduce the ulimit (which is the opposite of the effect PR #186 implies).

How to reproduce it (as minimally and precisely as possible):

Use the latest AMI amazon-eks-node-1.11-v20190220 - containers will have a max ulimit of 8192, not unlimited or 65536.
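
To see the limit a container actually gets, one quick check (the busybox image and the pod name here are arbitrary choices for illustration, not from the original report) is:

# run on an affected node
docker run --rm busybox sh -c 'ulimit -n'
# or from the cluster, as a throwaway pod
kubectl run ulimit-check --rm -it --restart=Never --image=busybox -- sh -c 'ulimit -n'

On an affected node this reports 8192 rather than 65536 or unlimited.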

Anything else we need to know?:

This was broken by #186 - that PR reduced the default limit, rather than increasing it.

Environment:

  • AWS Region: us-east-1
  • Instance Type(s): m5.xlarge
  • AMI Version: amazon-eks-node-1.11-v20190220
kekoav added a commit to kekoav/amazon-eks-ami that referenced this issue Feb 25, 2019
This fixes awslabs#193. The change retains the explicit limits via the Docker daemon config file, but increases them to 65536.
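
For reference, a default-ulimits stanza in the Docker daemon config (/etc/docker/daemon.json) along these lines raises the container default - this is a sketch of the approach, not necessarily the exact file from that commit:

{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 65536,
      "Hard": 65536
    }
  }
}
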
@ryan-a-baker

I did some extensive testing, and it looks like the Docker version packaged with the previous AMI didn't actually apply the configuration from /etc/sysconfig/docker. If you look at the dockerd process on the previous AMI, the default limits are not applied:

root 3931 1 1 Feb16 ? 02:32:14 /usr/bin/dockerd

Here it is on the new AMI:

root 3786 1 3 15:41 ? 00:00:56 /usr/bin/dockerd --default-ulimit nofile=1024:4096

I think the change assumed that the configuration was being applied, which, if true, would have increased the limit. However, since it wasn't, the change actually reduced it.
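
A quick way to confirm which limits are actually in effect on a node (standard commands; the busybox image is an arbitrary choice):

ps -ef | grep dockerd                        # shows whether --default-ulimit was passed on the command line
cat /proc/$(pidof dockerd)/limits            # limits applied to the dockerd process itself
docker run --rm busybox sh -c 'ulimit -n'    # limit a container actually receives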

@jgoeres

jgoeres commented Mar 12, 2019

Ran into the same issue with the amazon-eks-node-1.11-v20190220 image in eu-central-1. Switching back to amazon-eks-node-1.11-v20190211 seems to fix this.

@morganchristiansson

morganchristiansson commented Mar 21, 2019

Ran into the same issue and tried using v20190211 per @jgoeres, but nodes didn't join and were getting authentication errors from Kubernetes, whereas the latest AMI works.

I realised that I don't need to run Elasticsearch on EKS yet, so I will try again later; hopefully the EKS AMI will be fixed by then.

@gnydick

gnydick commented Mar 28, 2019

This is broken again - what was the last version that worked?

@whereisaaron

@gnydick there are now k8s 1.11.9 and 1.12.7 worker AMIs available; you could test them and see whether they include the ulimit revert patch.
https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html

@gnydick

gnydick commented Mar 29, 2019 via email

@whereisaaron

The problem that originally reduced ulimits was introduced by #186 on 14 Feb and reverted by #206 on 26 Feb. If you are having problems with builds before/after that date range, it is probably a different issue.

The patch to reverse the ulimit reduction was merged well before the 1.11.9 and 1.12.7 worker images were built, and @tabern commented in #206 that the revert would be in these new 1.11 images. You can also inspect the docker-daemon.json file to confirm that.
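
For example, on a worker node built from one of those images (the path below is where the EKS AMI installs the daemon config; verify it on your AMI version):

cat /etc/docker/daemon.json    # should show a "default-ulimits" nofile entry of 65536 rather than the old 1024:4096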

You might need to start a new issue and investigate the cause if you still have problems, @gnydick.

@whereisaaron

@gnydick #234 looks to be the issue now, and explains why the simple revert of #186 didn't work. The bootstrap script attempts to delete the default options but apparently fails. That could be fixed, or there is #205, which would override the default options.
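
Until that lands, a rough user-data workaround is to strip the flag before the kubelet bootstrap runs - this is only a sketch: the sed pattern matches the flag visible in the dockerd command line above, and <cluster-name> is a placeholder.

#!/bin/bash
# remove the low default ulimit that ships in /etc/sysconfig/docker, then restart docker
sed -i 's/--default-ulimit nofile=1024:4096//' /etc/sysconfig/docker
systemctl restart docker
/etc/eks/bootstrap.sh <cluster-name>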

@miyoda

miyoda commented Apr 1, 2019

Solved with the latest AMI from: https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html

@arielb135

arielb135 commented Jun 4, 2019

Solved with the latest AMI from: https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html

I've upgraded to ami-0f6f3929a9d7a418e (I have Kubernetes 1.10) - the issue still exists. Any help?
