stack troubleshooter should do a better job #491

Closed
errordeveloper opened this issue Jan 30, 2019 · 9 comments
Labels
area/general-cli help wanted Extra attention is needed kind/feature New feature or request stale

Comments

@errordeveloper
Contributor

errordeveloper commented Jan 30, 2019

Based on what @kelseyhightower pointed out:

  • it could hide many of the non-critical messages (users can still view those with eksctl utils describe-stacks if they wish)
  • common errors can be recognised and summarised in a simpler way, e.g. limits, validation errors, etc.
@errordeveloper
Contributor Author

errordeveloper commented Apr 2, 2019

Common examples:

  • 2019-04-02T05:26:51Z [✖] AWS::EC2::EIP/NATIP: CREATE_FAILED – "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: 6db57162-60f3-45cc-aad6-d0ea321116e1)"
  • number of VPCs

So in the case of EIP, we see this:

2019-04-02T05:26:51Z [✖]  AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "Resource creation cancelled"
2019-04-02T05:26:51Z [✖]  AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
2019-04-02T05:26:51Z [✖]  AWS::EC2::VPC/VPC: CREATE_FAILED – "Resource creation cancelled"
2019-04-02T05:26:51Z [✖]  AWS::EC2::EIP/NATIP: CREATE_FAILED – "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: 6db57162-60f3-45cc-aad6-d0ea321116e1)"

We can look for the CREATE_FAILED status. The "Resource creation cancelled" reasons are easy to recognise, so they can be collapsed and combined with the more specific reason reported for AWS::EC2::EIP/NATIP.

I believe AWS::EC2::VPC/VPC failure due to overall VPC limit can be detected fairly easily too.
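
As a rough illustration of the filtering idea above, here is a minimal Go sketch. It is not eksctl code: the `StackEvent` type and the `rootCauses` function are hypothetical names, and the struct is a simplified stand-in for whatever the troubleshooter actually receives from CloudFormation. It keeps only the CREATE_FAILED events whose reason is something other than the generic cancellation.

```go
package main

import "fmt"

// StackEvent is a hypothetical, simplified view of a CloudFormation stack
// event; it is an assumption for this sketch, not an eksctl type.
type StackEvent struct {
	ResourceType string // e.g. "AWS::EC2::EIP"
	LogicalID    string // e.g. "NATIP"
	Status       string // e.g. "CREATE_FAILED"
	Reason       string // status reason as printed in the log
}

// cancelledReason is the generic reason seen on resources that were only
// cancelled because some other resource failed first.
const cancelledReason = "Resource creation cancelled"

// rootCauses returns the CREATE_FAILED events that carry a specific reason,
// dropping events that merely report a cancellation.
func rootCauses(events []StackEvent) []StackEvent {
	var causes []StackEvent
	for _, e := range events {
		if e.Status != "CREATE_FAILED" || e.Reason == cancelledReason {
			continue
		}
		causes = append(causes, e)
	}
	return causes
}

func main() {
	// Events taken from the EIP example in this thread.
	events := []StackEvent{
		{"AWS::EC2::InternetGateway", "InternetGateway", "CREATE_FAILED", cancelledReason},
		{"AWS::IAM::Role", "ServiceRole", "CREATE_FAILED", cancelledReason},
		{"AWS::EC2::VPC", "VPC", "CREATE_FAILED", cancelledReason},
		{"AWS::EC2::EIP", "NATIP", "CREATE_FAILED", "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: 6db57162-60f3-45cc-aad6-d0ea321116e1)"},
	}
	for _, e := range rootCauses(events) {
		fmt.Printf("%s/%s: %s\n", e.ResourceType, e.LogicalID, e.Reason)
	}
}
```

With the four events from the EIP case, only the AWS::EC2::EIP/NATIP event should remain, which is the one worth showing to the user.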

@errordeveloper
Contributor Author

errordeveloper commented May 13, 2019

See also: #560, #792, #716.

@errordeveloper
Contributor Author

@mhausenblas would you be interested to have a go at addressing some of these cases, e.g. VPC limits and instance type cases?

@murali-reddy

@errordeveloper I am not quite clear on what the suggested improvement is.

common errors can be recognised and summarised in a simpler way, e.g. limits, validation errors, etc.

Are you suggesting that eksctl should give a more user-friendly (for those who do not know CloudFormation etc.) and more exact error message, without the user having to go through eksctl utils describe-stacks?

@errordeveloper
Contributor Author

errordeveloper commented Jun 25, 2019

@murali-reddy yes, exactly.

So, for example, if we are not in verbose mode, and we have detected the following event:

[✖]  AWS::EC2::EIP/NATIP: CREATE_FAILED – "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: 6db57162-60f3-45cc-aad6-d0ea321116e1)"

We should just print something like this instead:

[✖] Unable to create cluster "<name>" in "<region>" as the limit of Elastic IPs (EIPs) was reached, at least one EIP is required for NAT gateway

We could also suggest disabling the NAT gateway (which is now possible), but let's start simple for now, as that would need a link to docs that haven't landed yet.

To begin with, I think we should cover the following:

  • EKS cluster limit
  • VPC limit
  • NAT Gateway limit
  • EIP limit

Feel free to pick any one of these for the first PR :)
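
To make the proposal above more concrete, here is a minimal Go sketch of how a reason string could be turned into the short message. Everything in it is an assumption for illustration: `errorCode`, `describeFailure` and `limitSummaries` are hypothetical names, not eksctl APIs. It only covers error codes that appear verbatim in this thread (EIP, VPC and internet gateway); the wording for the VPC and internet gateway entries is made up, and the EKS cluster and NAT gateway limits are left out because their error codes are not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// errorCodeRE extracts the value of "Error Code: ..." from a status reason.
var errorCodeRE = regexp.MustCompile(`Error Code: ([A-Za-z0-9.]+)`)

// limitSummaries maps error codes seen in this thread to short explanations.
// The EIP wording follows the message proposed above; the other two entries
// are illustrative wording only.
var limitSummaries = map[string]string{
	"AddressLimitExceeded":         "the limit of Elastic IPs (EIPs) was reached, at least one EIP is required for NAT gateway",
	"VpcLimitExceeded":             "the limit of VPCs was reached",
	"InternetGatewayLimitExceeded": "the limit of internet gateways was reached",
}

// errorCode returns the error code embedded in reason, or "" if none is found.
func errorCode(reason string) string {
	m := errorCodeRE.FindStringSubmatch(reason)
	if m == nil {
		return ""
	}
	return m[1]
}

// describeFailure returns the raw reason in verbose mode or when the error
// code is not recognised, and a short summary otherwise.
func describeFailure(verbose bool, cluster, region, reason string) string {
	summary, known := limitSummaries[errorCode(reason)]
	if verbose || !known {
		return reason
	}
	return fmt.Sprintf("Unable to create cluster %q in %q as %s", cluster, region, summary)
}

func main() {
	reason := "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: 6db57162-60f3-45cc-aad6-d0ea321116e1)"
	// The cluster name and region are placeholder values.
	fmt.Println("[✖]", describeFailure(false, "demo", "us-west-2", reason))
}
```

Falling back to the raw reason for unknown codes means an unrecognised failure is never hidden; it just doesn't get the nicer wording.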

@errordeveloper
Contributor Author

I just noticed this:

[✖]  AWS::EC2::VPC/VPC: CREATE_FAILED – "The maximum number of VPCs has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: VpcLimitExceeded; Request ID: e0ad4fcd-d892-49ec-b074-9bb7f111439f)"
[✖]  AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "The maximum number of internet gateways has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: InternetGatewayLimitExceeded; Request ID: 4438845d-4764-41f9-941a-f85a13cee4c0)"

So it's possible to hit several of these limits at the same time; I thought you'd only get to know about whichever you hit first. Something to keep in mind: we certainly want to tell users about all the limits they hit.
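
A minimal Go sketch of reporting every limit rather than just the first one could look like the following. `limitsHit` is a hypothetical helper, not eksctl code, and it assumes that limit-related error codes end in "LimitExceeded", which holds for the codes seen in this thread but may not hold for every AWS service.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// limitCodeRE matches error codes ending in "LimitExceeded". That suffix is
// an assumption based on the codes seen in this thread.
var limitCodeRE = regexp.MustCompile(`Error Code: (\w+LimitExceeded)`)

// limitsHit returns every distinct limit error code found in reasons, in the
// order in which each code first appears.
func limitsHit(reasons []string) []string {
	seen := map[string]bool{}
	var codes []string
	for _, r := range reasons {
		m := limitCodeRE.FindStringSubmatch(r)
		if m == nil || seen[m[1]] {
			continue
		}
		seen[m[1]] = true
		codes = append(codes, m[1])
	}
	return codes
}

func main() {
	// Reasons taken from the two events quoted above.
	reasons := []string{
		"The maximum number of VPCs has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: VpcLimitExceeded; Request ID: e0ad4fcd-d892-49ec-b074-9bb7f111439f)",
		"The maximum number of internet gateways has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: InternetGatewayLimitExceeded; Request ID: 4438845d-4764-41f9-941a-f85a13cee4c0)",
	}
	fmt.Println("limits hit:", strings.Join(limitsHit(reasons), ", "))
}
```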

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Feb 27, 2021
@github-actions
Contributor

github-actions bot commented Mar 4, 2021

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as completed Mar 4, 2021
torredil pushed a commit to torredil/eksctl that referenced this issue May 20, 2022
…-rbac

Improve csi-snapshotter ClusterRole