
[aws-eks] Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) #6279

Closed
stefanolczak opened this issue Feb 14, 2020 · 7 comments
Assignees
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. p1

Comments

stefanolczak commented Feb 14, 2020

Even after #5540 is merged, creating an EKS cluster with added Kubernetes resources can still fail. It doesn't happen every time. The problem is in the kubectl-handler Python Lambdas, which run "aws eks update-kubeconfig" as a way to "log in" to the Kubernetes cluster. That command can fail with the output "Cluster status not active"; the failure is not retried and causes the whole stack creation to fail.
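
A minimal sketch of the kind of retry the handler is missing (this is not the actual kubectl-handler code; the run_with_retry helper, attempt count, and backoff values are hypothetical):

```python
import subprocess
import time

def run_with_retry(cmd, attempts=5, delay=2.0):
    """Run a command, retrying on non-zero exit with exponential backoff.

    Hypothetical helper: the real handler calls subprocess.check_call()
    once, so a transient "Cluster status not active" exit fails the stack.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.check_call(cmd)
            return
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # out of attempts, surface the original error
            # Back off before retrying, e.g. the update-kubeconfig call.
            time.sleep(delay * (2 ** (attempt - 1)))
```

A blanket retry masks permanent failures for several attempts, which is why checking the cluster status first (see below in the thread) is the cleaner fix.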

Reproduction Steps

Create an EKS stack and add Kubernetes resources to it using methods like addResource()

Error Log

Cluster status not active
[ERROR] CalledProcessError: Command '['aws', 'eks', 'update-kubeconfig', '--role-arn', 'arn:aws:iam::XXXXXXX:role/sandbox-eks-cluster-SandboxEksClusterCreationRole4-M2X105VQRJ8B', '--name', 'SandboxEksCluster', '--kubeconfig', '/tmp/kubeconfig']' returned non-zero exit status 255.
Traceback (most recent call last):
  File "/var/task/index.py", line 13, in handler
    return apply_handler(event, context)
  File "/var/task/apply/__init__.py", line 30, in apply_handler
    '--kubeconfig', kubeconfig
  File "/var/lang/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)

Environment

  • CLI Version : 1.22
  • Framework Version:
  • OS :
  • Language :

Other


This is a 🐛 Bug Report

@stefanolczak stefanolczak added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 14, 2020
@SomayaB SomayaB added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Feb 14, 2020
@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Feb 14, 2020

eladb commented Feb 16, 2020

Thanks for reporting


eladb commented Mar 10, 2020

Any chance you can grab CloudWatch logs from the resource provider so I can have a bit more visibility into the issue?

stefanolczak (Author) commented:

> Any chance you can grab CloudWatch logs from the resource provider so I can have a bit more visibility into the issue?

@eladb Do you mean this?

2020-02-14T11:12:45.968Z	INFO	[provider-framework] user function response: {
  "StatusCode": 200,
  "FunctionError": "Unhandled",
  "ExecutedVersion": "$LATEST",
  "Payload": "{\"errorMessage\": \"Command '['aws', 'eks', 'update-kubeconfig', '--role-arn', 'arn:aws:iam::XXXX:role/sandbox-eks-cluster-SandboxEksClusterCreationRole4-M2X105VQRJ8B', '--name', 'SandboxEksCluster', '--kubeconfig', '/tmp/kubeconfig']' returned non-zero exit status 255.\", \"errorType\": \"CalledProcessError\", \"stackTrace\": [\"  File \\\"/var/task/index.py\\\", line 13, in handler\\n    return apply_handler(event, context)\\n\", \"  File \\\"/var/task/apply/__init__.py\\\", line 30, in apply_handler\\n    '--kubeconfig', kubeconfig\\n\", \"  File \\\"/var/lang/lib/python3.7/subprocess.py\\\", line 363, in check_call\\n    raise CalledProcessError(retcode, cmd)\\n\"]}"
} object
2020-02-14T11:12:45.968Z	INFO	[provider-framework] user function threw an error: Unhandled
2020-02-14T11:12:45.968Z	INFO	[provider-framework] CREATE failed, responding with a marker physical resource id so that the subsequent DELETE will be ignored
2020-02-14T11:12:45.983Z	INFO	[provider-framework] submit response to cloudformation {
  "Status": "FAILED",
  "Reason": "Error: Command '['aws', 'eks', 'update-kubeconfig', '--role-arn', 'arn:aws:iam::XXXX:role/sandbox-eks-cluster-SandboxEksClusterCreationRole4-M2X105VQRJ8B', '--name', 'SandboxEksCluster', '--kubeconfig', '/tmp/kubeconfig']' returned non-zero exit status 255.\n    at invokeUserFunction (/var/task/framework.js:85:19)\n    at process._tickCallback (internal/process/next_tick.js:68:7)",
  "StackId": "arn:aws:cloudformation:us-east-1:XXXX:stack/sandbox-eks-cluster/887e2800-4f18-11ea-9548-127e9735111e",
  "RequestId": "ZZZZ",
  "PhysicalResourceId": "AWSCDK::CustomResourceProviderFramework::CREATE_FAILED",
  "LogicalResourceId": "SandboxEksClustermanifestKibanaIngress07330142"
}


eladb commented Mar 10, 2020

Thanks, seems like this is related: aws/aws-cli#3914

The AWS CLI's update-kubeconfig command will fail if the cluster status is UPDATING rather than ACTIVE.
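
A workaround sketch for that behavior (assumed, not taken from the CDK code): poll the cluster status and only run update-kubeconfig once the cluster is ACTIVE. The get_status callable and timing values here are hypothetical; in the handler it would wrap boto3's eks describe_cluster call:

```python
import time

def wait_for_active(get_status, timeout=900, interval=10):
    """Poll get_status() until it returns 'ACTIVE'.

    get_status is a hypothetical callable, e.g. one returning
    boto3's eks.describe_cluster(name=...)['cluster']['status'].
    Raises on terminal statuses or when the timeout expires.
    """
    deadline = time.time() + timeout
    while True:
        status = get_status()
        if status == "ACTIVE":
            return
        if status in ("FAILED", "DELETING"):
            raise RuntimeError(f"cluster entered terminal status {status}")
        if time.time() >= deadline:
            raise TimeoutError(f"cluster still {status} after {timeout}s")
        time.sleep(interval)
```

Unlike a blind retry, this distinguishes a transient UPDATING status from terminal states, so a genuinely broken cluster fails fast.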

stefanolczak (Author) commented:

@eladb Do you plan to release a fix for it?


eladb commented Jun 24, 2020

This is fixed by this commit and released in AWS CLI 1.18.70.

@pahud I can see that the latest version of the aws-eks kubectl layer (2.0.0-beta3) currently uses AWS CLI 1.18.37. What would it take to release a new version with the latest CLI and update the aws-eks module to use it?

@eladb eladb changed the title EKS fails to create with added kubernetes resources Occasional failures when creating k8s resources during cluster updates: requires AWS CLI upgrade Jun 24, 2020
@eladb eladb changed the title Occasional failures when creating k8s resources during cluster updates: requires AWS CLI upgrade Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) Jun 24, 2020
@eladb eladb added this to the EKS Developer Preview milestone Jun 24, 2020
@eladb eladb changed the title Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) [EKS Bug] Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) Jun 24, 2020

eladb commented Jun 24, 2020

Should be resolved by #7216

@eladb eladb closed this as completed Jun 24, 2020
@iliapolo iliapolo modified the milestones: EKS Developer Preview, EKS Dev Preview Aug 10, 2020
@iliapolo iliapolo changed the title [EKS Bug] Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) [aws-eks] Occasional failures when creating k8s resources during cluster update (needs AWS CLI upgrade) Aug 16, 2020
5 participants