This repository has been archived by the owner on Feb 9, 2022. It is now read-only.

Workaround for crashing Elasticsearch 6.x under EKS. #430
Merged: 1 commit, merged Mar 14, 2019

Conversation

@falfaro (Contributor) commented Mar 12, 2019

No description provided.

@falfaro falfaro added the bug Something isn't working label Mar 12, 2019
@falfaro falfaro self-assigned this Mar 12, 2019
@falfaro falfaro requested review from wojciechka and arapulido March 12, 2019 15:25
@wojciechka left a review comment on docs/quickstart-eks.md (outdated, resolved):

I was thinking of also doing something like --node-ami ${AMI_ID} to avoid someone copy-pasting the wrong AMI ID, but I am not insisting on it.
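
For illustration, the flag could be wired into the quickstart roughly like this (a minimal sketch; the cluster name and AMI ID below are placeholders, not values from this PR):

AMI_ID=ami-0123456789abcdef0   # placeholder: pin the known-good EKS-optimized AMI here
eksctl create cluster \
  --name my-cluster \
  --node-ami "${AMI_ID}"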

@arapulido (Contributor) left a comment:

LGTM

@falfaro (Contributor, Author) commented Mar 12, 2019

bors r+

bors bot commented Mar 12, 2019

👎 Rejected by PR status

@sameersbn (Contributor) commented Mar 13, 2019

IMO we should instead ask users to SSH into their EKS cluster nodes and remove the ulimits configuration of the docker daemon. This can be achieved with (tested):

sudo sed -i '/"nofile": {/,/}/d' /etc/docker/daemon.json
sudo systemctl restart docker

This workaround can go in the troubleshooting doc instead of the quickstart guide.

edit:

The above sed command turns

{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 2048,
      "Hard": 8192
    }
  }
}

to

{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
  }
}

This is essentially the change in awslabs/amazon-eks-ami#206.
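
A quick, hedged way to confirm the fix took effect after restarting Docker (the pod name and image below are illustrative, not part of the original workaround):

# Run a throwaway pod and print the soft open-files limit it sees; after the
# fix it should no longer report the 2048 soft limit from the old daemon.json.
kubectl run ulimit-check --rm -it --restart=Never --image=busybox -- sh -c 'ulimit -n'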

@falfaro (Contributor, Author) commented Mar 13, 2019

@sameersbn in my opinion, we should make things as easy as possible for users, and adding --node-ami to the eksctl command line is way easier than SSH-ing into N machines, changing the Docker configuration, etc.

Also, if the change is going to be reverted soon, what is the point of adding this documentation to the troubleshooting section? Instead, we can keep it in the Quickstart and, once Amazon fixes the issue, remove this block of documentation.

@arapulido what do you think?

@sameersbn (Contributor):

This workaround only works for newly created clusters. Users with an existing EKS cluster will not be able to take advantage of it.

@wojciechka:

> This workaround only works for newly created clusters. Users with an existing EKS cluster will not be able to take advantage of it.

That's true. What I managed to do was create a new nodegroup with the proper AMI, kubectl drain all my old nodes, and then delete the old nodegroup.

I think eksctl even helps with that - i.e. eksctl-io/eksctl#592

It was a painful process, and the MongoDB instance I had installed for my Kubeapps deployment could not be moved for some reason, but I just uninstalled and reinstalled Kubeapps.
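
A rough sketch of that migration, assuming eksctl (the cluster name, nodegroup names, node name, and AMI ID below are placeholders):

# Create a replacement nodegroup pinned to a known-good AMI.
eksctl create nodegroup --cluster my-cluster --name workers-v2 --node-ami ami-0123456789abcdef0

# Drain each node in the old nodegroup so workloads reschedule onto the new one.
kubectl drain <old-node-name> --ignore-daemonsets --delete-local-data

# Once the old nodes are empty, delete the old nodegroup.
eksctl delete nodegroup --cluster my-cluster --name workers-v1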

@arapulido (Contributor):

I think we should have both: pointing to a previous AMI in the quickstart (and reverting this once it is fixed), and a troubleshooting section on manually changing the ulimits that can stay (for people who already have an EKS cluster).

@falfaro (Contributor, Author) commented Mar 14, 2019

@sameersbn please take another look; I've also added a section for this issue to the troubleshooting doc.


## Troubleshooting Elasticsearch

### Elasticsearch crashloop under EKS
@sameersbn (Contributor) commented on this hunk, Mar 14, 2019:

We should be following the pattern used in the rest of the document, where we first state the issue that's observed (i.e. Elasticsearch enters a crashloop) and then include a Troubleshooting section that walks the user through confirming that the issue is due to the ulimits (i.e. inspecting the logs) and then suggests using the sed command to resolve it.
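
For example, the confirmation step could look like this (the pod name is a placeholder; the quoted log line paraphrases Elasticsearch's standard bootstrap-check message):

# Look for the file-descriptor bootstrap check in the Elasticsearch pod logs.
kubectl logs <elasticsearch-pod-name> | grep "max file descriptors"
# Expected symptom, roughly:
#   max file descriptors [2048] for elasticsearch process is too low, increase to at least [65536]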

@falfaro (Contributor, Author) replied:

What about now?

Two further review comments on docs/troubleshooting.md (outdated, resolved).
@sameersbn (Contributor):

lgtm! minor typos to be resolved

@falfaro (Contributor, Author) commented Mar 14, 2019

bors r+

bors bot added a commit that referenced this pull request Mar 14, 2019
430: Workaround for crashing Elasticsearch 6.x under EKS. r=falfaro a=falfaro



Co-authored-by: Felipe Alfaro Solana <felipe.alfaro@gmail.com>
bors bot commented Mar 14, 2019

Build succeeded
