Issues arising from InsufficientInstanceCapacity on aws region eu-west-1 #814

Closed
ivyleavedtoadflax opened this issue Nov 18, 2021 · 4 comments · Fixed by #831
Labels
cloud-aws Amazon Web Services cml-runner Subcommand

Comments

@ivyleavedtoadflax

ivyleavedtoadflax commented Nov 18, 2021

These aren't CML issues per se, but it would be good to surface a discussion about them.
Recently, when I try to spawn a GPU instance via CML in AWS region eu-west-1, I get the following error:

EC2: RunInstances, exceeded maximum number of attempts, 3, https response error StatusCode: 
500, RequestID: 87b3947e-25a6-4d7a-8e3d-ab8246ca0e3d, api error InsufficientInstanceCapacity: 
We currently do not have sufficient g4dn.xlarge capacity in the Availability Zone you requested (eu-west-1b). 
Our system will be working on provisioning additional capacity. 
You can currently get g4dn.xlarge capacity by not specifying an Availability Zone in your request or choosing eu-west-1a, eu-west-1c

OK, fair enough: this has nothing to do with CML in particular. But is it possible to follow the advice ☝️ and set the availability zone manually? (AFAIK it's only possible to set the region at the moment.)
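
For anyone who wants to act on that advice from a script, here is a minimal Python sketch that pulls the zone names out of the error text. It assumes the message keeps the wording shown above, with the requested zone named first and the suggested alternatives after it. `parse_capacity_error` is a hypothetical helper, not part of CML or TPI.

```python
import re

# Matches AWS Availability Zone names such as "eu-west-1b".
AZ_PATTERN = re.compile(r"\b[a-z]{2}(?:-[a-z]+)+-\d[a-z]\b")


def parse_capacity_error(message: str) -> dict:
    """Split the zones named in an InsufficientInstanceCapacity message
    into the requested zone and the suggested alternatives.

    Assumption: the first zone mentioned is the one that was requested.
    """
    zones = AZ_PATTERN.findall(message)
    requested = zones[0] if zones else None
    alternatives = [zone for zone in zones[1:] if zone != requested]
    return {"requested": requested, "alternatives": alternatives}


message = (
    "We currently do not have sufficient g4dn.xlarge capacity in the "
    "Availability Zone you requested (eu-west-1b). Our system will be "
    "working on provisioning additional capacity. You can currently get "
    "g4dn.xlarge capacity by not specifying an Availability Zone in your "
    "request or choosing eu-west-1a, eu-west-1c"
)
print(parse_capacity_error(message))
```

This only reads the message; whether the zone can then be passed to cml-runner is exactly the open question above.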

The second issue is that when I hit this capacity problem while requesting a spot instance, I get the following error:

CancelSpotInstanceRequests, https response error StatusCode: 403, RequestID: 94aa8f89-628c-4d79-98aa-3c009154f184, api error UnauthorizedOperation: You are not authorized to perform this operation.
Run cml-runner \
  cml-runner \
  --cloud aws \
  --cloud-region eu-west \
  --cloud-type=g4dn.xlarge \
  --cloud-spot true \
  --labels=cml-gpu \
  --idle-timeout 60
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    REPO_TOKEN: ***
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
{"level":"warn","message":"ignoring RUNNER_NAME environment variable, use CML_RUNNER_NAME or --name instead"}
{"level":"info","message":"Preparing workdir /home/runner/.cml/cml-qkl1k90164..."}
{"level":"info","message":"Deploying cloud runner plan..."}
{"level":"info","message":"Terraform apply..."}
{"level":"error","message":"terraform -chdir='/home/runner/.cml/cml-qkl1k90164' apply -auto-approve\n\t\nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n  \u001b[32m+\u001b[0m create\n\u001b[0m\nTerraform will perform the following actions:\n\n\u001b[1m  # iterative_cml_runner.runner\u001b[0m will be created\u001b[0m\u001b[0m\n\u001b[0m  \u001b[32m+\u001b[0m\u001b[0m resource \"iterative_cml_runner\" \"runner\" {\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mcloud\u001b[0m\u001b[0m                = \"aws\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mdriver\u001b[0m\u001b[0m               = \"github\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mid\u001b[0m\u001b[0m                   = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0midle_timeout\u001b[0m\u001b[0m         = 60\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mimage\u001b[0m\u001b[0m                = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_hdd_size\u001b[0m\u001b[0m    = 35\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_ip\u001b[0m\u001b[0m          = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_launch_time\u001b[0m\u001b[0m = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_type\u001b[0m\u001b[0m        = \"g4dn.xlarge\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mlabels\u001b[0m\u001b[0m               = \"cml-gpu\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mname\u001b[0m\u001b[0m                 = \"cml-qkl1k90164\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mregion\u001b[0m\u001b[0m               = \"eu-west\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mrepo\u001b[0m\u001b[0m                 = \"<redacted>"\n      \u001b[32m+\u001b[0m 
\u001b[0m\u001b[1m\u001b[0msingle\u001b[0m\u001b[0m               = false\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mspot\u001b[0m\u001b[0m                 = true\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mspot_price\u001b[0m\u001b[0m           = -1\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mssh_public\u001b[0m\u001b[0m           = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mtoken\u001b[0m\u001b[0m                = (sensitive value)\n    }\n\n\u001b[0m\u001b[1mPlan:\u001b[0m 1 to add, 0 to change, 0 to destroy.\n\u001b[0m\u001b[0m\u001b[1miterative_cml_runner.runner: Creating...\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[2m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[5m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[8m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m50s elapsed]\u001b[0m\u001b[0m\n\n\t\u001b[31m╷\u001b[0m\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\u001b[1m\u001b[31mError: \u001b[0m\u001b[0m\u001b[1mFailed creating the machine: operation error EC2: CancelSpotInstanceRequests, https response error StatusCode: 403, RequestID: 94aa8f89-628c-4d79-98aa-3c009154f184, api error UnauthorizedOperation: You are not authorized to perform this operation. Authorization failure message: 
'{\"allowed\":false,\"explicitDeny\":false,\"matchedStatements\":{\"items\":[]},\"failures\":{\"items\":[]},\"context\":{\"principal\":{\"id\":\"AIDA32BN7M437QCNQ3PLB\",\"name\":\"cml.user\",\"arn\":\"arn:aws:iam::811845248823:user/cml.user\"},\"action\":\"ec2:CancelSpotInstanceRequests\",\"resource\":\"arn:aws:ec2:eu-west-1:811845248823:spot-instances-request/sir-e9kyabbg\",\"conditions\":{\"items\":[{\"key\":\"811845248823:Name\",\"values\":{\"items\":[{\"value\":\"cml-qkl1k90164\"}]}},{\"key\":\"aws:Resource\",\"values\":{\"items\":[{\"value\":\"spot-instances-request/sir-e9kyabbg\"}]}},{\"key\":\"aws:Account\",\"values\":{\"items\":[{\"value\":\"811845248823\"}]}},{\"key\":\"ec2:ResourceTag/Name\",\"values\":{\"items\":[{\"value\":\"cml-qkl1k90164\"}]}},{\"key\":\"aws:Region\",\"values\":{\"items\":[{\"value\":\"eu-west-1\"}]}},{\"key\":\"aws:ID\",\"values\":{\"items\":[{\"value\":\"sir-e9kyabbg\"}]}},{\"key\":\"aws:Service\",\"values\":{\"items\":[{\"value\":\"ec2\"}]}},{\"key\":\"ec2:ResourceTag/Id\",\"values\":{\"items\":[{\"value\":\"iterative-1ggntxww7ltvr\"}]}},{\"key\":\"aws:Type\",\"values\":{\"items\":[{\"value\":\"spot-instances-request\"}]}},{\"key\":\"811845248823:Id\",\"values\":{\"items\":[{\"value\":\"iterative-1ggntxww7ltvr\"}]}},{\"key\":\"ec2:Region\",\"values\":{\"items\":[{\"value\":\"eu-west-1\"}]}},{\"key\":\"aws:ARN\",\"values\":{\"items\":[{\"value\":\"arn:aws:ec2:eu-west-1:811845248823:spot-instances-request/sir-e9kyabbg\"}]}}]}}}'\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\u001b[0m  with iterative_cml_runner.runner,\n\u001b[31m│\u001b[0m \u001b[0m  on main.tf line 14, in resource \"iterative_cml_runner\" \"runner\":\n\u001b[31m│\u001b[0m \u001b[0m  14: resource \"iterative_cml_runner\" \"runner\" \u001b[4m{\u001b[0m\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\n\u001b[31m╵\u001b[0m\u001b[0m\n","stack":"Error: terraform -chdir='/home/runner/.cml/cml-qkl1k90164' apply -auto-approve\n\t\nTerraform used the 
selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n  \u001b[32m+\u001b[0m create\n\u001b[0m\nTerraform will perform the following actions:\n\n\u001b[1m  # iterative_cml_runner.runner\u001b[0m will be created\u001b[0m\u001b[0m\n\u001b[0m  \u001b[32m+\u001b[0m\u001b[0m resource \"iterative_cml_runner\" \"runner\" {\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mcloud\u001b[0m\u001b[0m                = \"aws\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mdriver\u001b[0m\u001b[0m               = \"github\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mid\u001b[0m\u001b[0m                   = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0midle_timeout\u001b[0m\u001b[0m         = 60\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mimage\u001b[0m\u001b[0m                = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_hdd_size\u001b[0m\u001b[0m    = 35\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_ip\u001b[0m\u001b[0m          = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_launch_time\u001b[0m\u001b[0m = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0minstance_type\u001b[0m\u001b[0m        = \"g4dn.xlarge\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mlabels\u001b[0m\u001b[0m               = \"cml-gpu\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mname\u001b[0m\u001b[0m                 = \"cml-qkl1k90164\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mregion\u001b[0m\u001b[0m               = \"eu-west\"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mrepo\u001b[0m\u001b[0m                 = \"<redacted>"\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0msingle\u001b[0m\u001b[0m               = false\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mspot\u001b[0m\u001b[0m         
        = true\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mspot_price\u001b[0m\u001b[0m           = -1\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mssh_public\u001b[0m\u001b[0m           = (known after apply)\n      \u001b[32m+\u001b[0m \u001b[0m\u001b[1m\u001b[0mtoken\u001b[0m\u001b[0m                = (sensitive value)\n    }\n\n\u001b[0m\u001b[1mPlan:\u001b[0m 1 to add, 0 to change, 0 to destroy.\n\u001b[0m\u001b[0m\u001b[1miterative_cml_runner.runner: Creating...\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [1m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[2m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [2m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [3m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [4m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [5m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[5m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [6m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [7m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [8m50s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m0s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... 
[9m10s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m20s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m30s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m40s elapsed]\u001b[0m\u001b[0m\n\u001b[0m\u001b[1miterative_cml_runner.runner: Still creating... [9m50s elapsed]\u001b[0m\u001b[0m\n\n\t\u001b[31m╷\u001b[0m\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\u001b[1m\u001b[31mError: \u001b[0m\u001b[0m\u001b[1mFailed creating the machine: operation error EC2: CancelSpotInstanceRequests, https response error StatusCode: 403, RequestID: 94aa8f89-628c-4d79-98aa-3c009154f184, api error UnauthorizedOperation: You are not authorized to perform this operation. Authorization failure message: '{\"allowed\":false,\"explicitDeny\":false,\"matchedStatements\":{\"items\":[]},\"failures\":{\"items\":[]},\"context\":{\"principal\":{\"id\":\"AIDA32BN7M437QCNQ3PLB\",\"name\":\"cml.user\",\"arn\":\"arn:aws:iam::811845248823:user/cml.user\"},\"action\":\"ec2:CancelSpotInstanceRequests\",\"resource\":\"arn:aws:ec2:eu-west-1:811845248823:spot-instances-request/sir-e9kyabbg\",\"conditions\":{\"items\":[{\"key\":\"811845248823:Name\",\"values\":{\"items\":[{\"value\":\"cml-qkl1k90164\"}]}},{\"key\":\"aws:Resource\",\"values\":{\"items\":[{\"value\":\"spot-instances-request/sir-e9kyabbg\"}]}},{\"key\":\"aws:Account\",\"values\":{\"items\":[{\"value\":\"811845248823\"}]}},{\"key\":\"ec2:ResourceTag/Name\",\"values\":{\"items\":[{\"value\":\"cml-qkl1k90164\"}]}},{\"key\":\"aws:Region\",\"values\":{\"items\":[{\"value\":\"eu-west-1\"}]}},{\"key\":\"aws:ID\",\"values\":{\"items\":[{\"value\":\"sir-e9kyabbg\"}]}},{\"key\":\"aws:Service\",\"values\":{\"items\":[{\"value\":\"ec2\"}]}},{\"key\":\"ec2:ResourceTag/Id\",\"values\":{\"items\":[{\"value\":\"iterative-1ggntxww7ltvr\"}]}},{\"key\":\"aws:Type\",\"values\":{\"items\":[{\"value\":\"spot-instances-request
\"}]}},{\"key\":\"811845248823:Id\",\"values\":{\"items\":[{\"value\":\"iterative-1ggntxww7ltvr\"}]}},{\"key\":\"ec2:Region\",\"values\":{\"items\":[{\"value\":\"eu-west-1\"}]}},{\"key\":\"aws:ARN\",\"values\":{\"items\":[{\"value\":\"arn:aws:ec2:eu-west-1:811845248823:spot-instances-request/sir-e9kyabbg\"}]}}]}}}'\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\u001b[0m  with iterative_cml_runner.runner,\n\u001b[31m│\u001b[0m \u001b[0m  on main.tf line 14, in resource \"iterative_cml_runner\" \"runner\":\n\u001b[31m│\u001b[0m \u001b[0m  14: resource \"iterative_cml_runner\" \"runner\" \u001b[4m{\u001b[0m\u001b[0m\n\u001b[31m│\u001b[0m \u001b[0m\n\u001b[31m╵\u001b[0m\u001b[0m\n\n    at /usr/local/lib/node_modules/@dvcorg/cml/src/utils.js:14:27\n    at ChildProcess.exithandler (child_process.js:390:5)\n    at ChildProcess.emit (events.js:400:28)\n    at maybeClose (internal/child_process.js:1058:16)\n    at Process.ChildProcess._handle.onexit (internal/child_process.js:293:5)","status":"terminated"}
{"level":"info","message":"waiting 20 seconds before exiting..."}

As I understand it, CML waits ten minutes for an instance to become available; once that fails, it tries to cancel the Spot request, which fails because in this case it doesn't have permission. Was there any update to #429 as a reference for permissions? Perhaps it needs to include ec2:CancelSpotInstanceRequests.
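
The log above is hard to read because of the terminal colour codes, so here is a rough Python sketch for pulling the failing operation and error code out of such a message. It assumes the "operation error ..., https response error StatusCode: ..., api error ..." wording seen in the Terraform error; `summarize_api_error` is a hypothetical helper, not something CML provides.

```python
import re

# ANSI colour/style sequences such as "\x1b[31m" that Terraform emits.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

# Assumed shape of the error text, taken from the log in this issue.
API_ERROR = re.compile(
    r"operation error (?P<service>\w+): (?P<operation>\w+), "
    r"https response error StatusCode: (?P<status>\d+), "
    r"RequestID: (?P<request_id>[0-9a-f-]+), "
    r"api error (?P<code>\w+):"
)


def summarize_api_error(log_message: str):
    """Return the service, operation, status, request ID and error code
    found in a log message, or None if the message has a different shape."""
    clean = ANSI_ESCAPE.sub("", log_message)
    match = API_ERROR.search(clean)
    if match is None:
        return None
    summary = match.groupdict()
    summary["status"] = int(summary["status"])
    return summary


sample = (
    "\x1b[1m\x1b[31mError: \x1b[0m\x1b[0m\x1b[1mFailed creating the machine: "
    "operation error EC2: CancelSpotInstanceRequests, https response error "
    "StatusCode: 403, RequestID: 94aa8f89-628c-4d79-98aa-3c009154f184, "
    "api error UnauthorizedOperation: You are not authorized to perform "
    "this operation."
)
print(summarize_api_error(sample))
```

The first error in this issue (RunInstances, "exceeded maximum number of attempts") is worded differently and would not match this pattern.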

@ivyleavedtoadflax ivyleavedtoadflax changed the title Issues arising from InsufficientInstanceCapacity Issues arising from InsufficientInstanceCapacity on aws region eu-west-1 Nov 18, 2021
@dacbd
Contributor

dacbd commented Nov 18, 2021

Related to #795; both involve the subnet selection logic in TPI.

@ivyleavedtoadflax TPI does make a CancelSpotInstanceRequests call; this may have come from TPI's move to the AWS SDK v2, or it may simply have been missed before.

This is a quick list I put together before:

ec2:AuthorizeSecurityGroupEgress    - Create firewall rule
ec2:AuthorizeSecurityGroupIngress   - Create firewall rule
ec2:CancelSpotInstanceRequests      - if using spot
ec2:CreateSecurityGroup             - Create virtual firewall rule to allow ssh
ec2:CreateTags                      - resource management
ec2:DeleteKeyPair                   - clean up created ssh key
ec2:DescribeImages                  - Searching from CML provided AMIs
ec2:DescribeInstances               - Select created instance
ec2:DescribeSecurityGroups          - Select Security Group
ec2:DescribeSpotInstanceRequests    - if using spot
ec2:DescribeSubnets                 - Select Subnet for instance
ec2:DescribeVpcs                    - Assigning instance to correct VPC
ec2:ImportKeyPair                   - Creating SSH connection for server setup
ec2:RequestSpotInstances            - if using spot
ec2:RunInstances                    - Start created instance
ec2:TerminateInstances              - tear down when done
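
As a rough illustration, the list above could be turned into an IAM policy document with a few lines of Python. This is only a sketch: the action names come straight from the list, but the `"Resource": "*"` scope and the `build_policy` helper are assumptions, and the list itself was described as a quick one, so it may not be complete.

```python
import json

# Actions copied from the list above.
ACTIONS = [
    "ec2:AuthorizeSecurityGroupEgress",
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:CancelSpotInstanceRequests",
    "ec2:CreateSecurityGroup",
    "ec2:CreateTags",
    "ec2:DeleteKeyPair",
    "ec2:DescribeImages",
    "ec2:DescribeInstances",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSpotInstanceRequests",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcs",
    "ec2:ImportKeyPair",
    "ec2:RequestSpotInstances",
    "ec2:RunInstances",
    "ec2:TerminateInstances",
]


def build_policy(actions, resource="*"):
    """Build a single-statement IAM policy allowing the given actions.

    Assumption: one Allow statement over `resource`; tighten as needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        ],
    }


print(json.dumps(build_policy(ACTIONS), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the user that runs cml-runner.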

Also, the error logging here is another example of #806.

@ivyleavedtoadflax
Author

Thanks @dacbd that's helpful

@dacbd
Contributor

dacbd commented Dec 6, 2021

This should close with the release of iterative/terraform-provider-iterative#323.

@casperdcl casperdcl linked a pull request Dec 7, 2021 that will close this issue
@ivyleavedtoadflax
Author

@nsorros @pdan93
