Workaround for crashing Elasticsearch 6.x under EKS. #430
Conversation
I was thinking of also doing something like `--node-ami ${AMI_ID}` to avoid someone copy-pasting the wrong AMI ID, but I am not insisting on it.
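For illustration only, a minimal sketch of what that could look like with eksctl; the cluster name and AMI ID below are placeholders, not values from this PR:

```bash
# Pin the worker nodes to an explicit AMI instead of relying on the default.
# ami-0123456789abcdef0 and kubeapps-demo are placeholders; substitute the
# AMI ID recommended in the quickstart.
AMI_ID=ami-0123456789abcdef0

eksctl create cluster \
  --name kubeapps-demo \
  --node-ami "${AMI_ID}"
```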
LGTM
bors r+
👎 Rejected by PR status
IMO we should instead ask users to SSH into their EKS cluster nodes and remove the ulimits configuration of the docker daemon. This can be achieved with (tested):

```bash
sudo sed -i '/"nofile": {/,/}/d' /etc/docker/daemon.json
sudo systemctl restart docker
```

This workaround can go in the troubleshooting doc instead of the quickstart guide.

edit: The above sed command turns

```json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 2048,
      "Hard": 8192
    }
  }
}
```

into

```json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
  }
}
```

which is essentially the change in awslabs/amazon-eks-ami#206.
@sameersbn in my opinion, we should make things as easy as possible for users. Also, if the change is going to be reverted soon, what is the point of adding this documentation to the troubleshooting section? Instead, we can keep it in the Quickstart and, once Amazon fixes the issue, remove this block of documentation. @arapulido what do you think?
This workaround only works for newly created clusters. Users with an existing EKS cluster will not be able to follow this path.
That's true. What I managed to do was create a new nodegroup with the proper AMI and then move my workloads over to it. It was a painful process, and the mongodb I had installed for my kubeapps could not be moved for some reason - but I just uninstalled and reinstalled kubeapps.
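For reference, a rough sketch of that kind of migration with eksctl; the cluster name, nodegroup names, node name, and AMI ID are placeholders:

```bash
# Create a replacement nodegroup pinned to a known-good AMI
# (the names and ami-0123456789abcdef0 are placeholders).
eksctl create nodegroup \
  --cluster my-cluster \
  --name workers-fixed \
  --node-ami ami-0123456789abcdef0

# After the new nodes are Ready, cordon/drain the old nodes so workloads
# reschedule, then remove the old nodegroup.
kubectl drain <old-node-name> --ignore-daemonsets --delete-local-data
eksctl delete nodegroup --cluster my-cluster --name workers-old
```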
I think we should have both: pointing to a previous AMI in the quickstart (and reverting this once it is fixed), and a troubleshooting section to manually change the ulimits that can stay (for people who already have an EKS cluster).
@sameersbn please take another look, as I've also added a section for this issue in the troubleshooting doc.
docs/troubleshooting.md (Outdated)

## Troubleshooting Elasticsearch

### Elasticsearch crashloop under EKS
We should follow the pattern used in the rest of the document, where we first state the issue that's observed (i.e. Elasticsearch enters a crashloop) and then include a Troubleshooting section that walks the user through confirming that the issue is due to the ulimits (i.e. by inspecting the logs) and then suggests using the `sed` command to resolve the issue.
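As a hedged illustration of that confirmation step (the pod name, namespace, and exact wording of the log line are assumptions, not taken from this PR):

```bash
# Inspect the Elasticsearch pod logs for the file-descriptor error that
# indicates the docker ulimits are too low; pod name and namespace are
# placeholders for whatever the chart actually deploys.
kubectl logs elasticsearch-logging-0 -n kubeapps | grep -i "max file descriptors"
# Typical symptom: "max file descriptors [...] for elasticsearch process is
# too low, increase to at least [65536]"
```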
What about now?
lgtm! minor typos to be resolved
bors r+
bors r+
Build succeeded
No description provided.