WX-1625 Quota retry #7439
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
name: quota_fail_retry | ||
testFormat: workflowfailure | ||
backends: [Papiv2] | ||
|
||
files { | ||
workflow: quota_fail_retry/quota_fail_retry.wdl | ||
} | ||
|
||
# Adapted from `preemptible_and_memory_retry.test`. | ||
# I set `broad-dsde-cromwell-dev` to have super low CPU quota in `us-west3` (Salt Lake City) for this test | ||
# This functionality is pretty married to PAPI, it doesn't run on `GCPBatch` backend. | ||
|
||
metadata { | ||
workflowName: sleepy_sleep | ||
status: Failed | ||
"failures.0.message": "Workflow failed" | ||
"failures.0.causedBy.0.message": "Task sleepy_sleep.sleep:NA:3 failed. The job was stopped before the command finished. PAPI error code 9. Could not start instance custom-12-11264 due to insufficient quota. Cromwell retries exhausted, task failed. Backend info: Execution failed: allocating: selecting resources: selecting region and zone: no available zones: us-west3: 12 CPUS (10/10 available) quota too low" | ||
"sleepy_sleep.sleep.-1.1.executionStatus": "RetryableFailure" | ||
"sleepy_sleep.sleep.-1.2.executionStatus": "RetryableFailure" | ||
"sleepy_sleep.sleep.-1.3.executionStatus": "Failed" | ||
} |
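The expected metadata above has attempts 1 and 2 ending in `RetryableFailure` and attempt 3 ending in `Failed`. A minimal sketch of that pattern, assuming a limit of three quota attempts and that every attempt hits the same quota error; the function name `quota_retry_statuses` is hypothetical and this is not Cromwell's implementation:

```python
def quota_retry_statuses(max_attempts: int) -> list[str]:
    """Return the status expected for each attempt, in order.

    Assumption: every attempt fails with the quota error, so all attempts
    except the last are retryable and the last one is terminal.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return ["RetryableFailure"] * (max_attempts - 1) + ["Failed"]


# Print the keys in the same shape as the test's metadata block.
for attempt, status in enumerate(quota_retry_statuses(3), start=1):
    print(f'"sleepy_sleep.sleep.-1.{attempt}.executionStatus": "{status}"')
```

With a limit of 3 this reproduces the three `executionStatus` entries the test asserts on.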
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
version 1.0 | ||
|
||
workflow sleepy_sleep { | ||
|
||
input { | ||
Int sleep_seconds = 180 | ||
} | ||
|
||
call sleep { | ||
input: sleep_seconds = sleep_seconds | ||
} | ||
|
||
} | ||
|
||
task sleep { | ||
|
||
input { | ||
Int sleep_seconds | ||
} | ||
|
||
meta { | ||
volatile: true | ||
} | ||
|
||
# I set `broad-dsde-cromwell-dev` to have super low CPU quota in `us-west3` (Salt Lake City) for this test | ||
runtime { | ||
cpu: 12 | ||
docker: "ubuntu:latest" | ||
zones: "us-west3-a us-west3-b us-west3-c" | ||
} | ||
|
||
command <<< | ||
sleep ~{sleep_seconds}; | ||
ls -la | ||
>>> | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,6 +3,8 @@ root = "gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci" | |
maximum-polling-interval = 600 | ||
concurrent-job-limit = 1000 | ||
|
||
quota-attempts: 3 | ||
Do we need this?
Ah, nope, good call. This is a leftover from when I was spamming the key everywhere because I couldn't get it to read and thought maybe my instance was reading the wrong config.
||
|
||
batch { | ||
auth = "service_account" | ||
location = "us-central1" | ||
|
I had previously done the same thing for `us-west3` when designing `AwaitingCloudQuota`