add description/troubleshooting for aws backend #137

Merged
merged 1 commit on Aug 6, 2021
2 changes: 2 additions & 0 deletions scripts/aws_caper_server/README.md
@@ -27,6 +27,8 @@ https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=gwfcor
- `S3 Bucket name`: S3 bucket name to store your pipeline outputs. This is not a full path for the output directory; it is just the bucket's name without the scheme prefix `s3://`. Make sure that this bucket does not already exist. If it does, delete it or choose a different, non-existing bucket name.
- `VPC ID`: Choose the VPC `GenomicsVPC` that you just created.
- `VPC Subnet IDs`: Choose all private subnets created with the above VPC.
- `Max vCPUs for Default Queue`: Maximum total number of vCPUs for the spot-instance queue. The default is 4000, which is already very large, but if your jobs request more vCPUs than this limit they will be stuck in `RUNNABLE` status.
- `Max vCPUs for Priority Queue`: Maximum total number of vCPUs for the on-demand-instance queue. The default is 4000, which is already very large, but if your jobs request more vCPUs than this limit they will be stuck in `RUNNABLE` status.
3. Click on `Next` and then `Next` again. Agree to `Capabilities`. Click on `Create stack`.
4. Go to your [AWS Batch](https://console.aws.amazon.com/batch) and click on `Job queues` in the left sidebar. You will see two job queues (`priority-*` and `default-*`). There have been some issues with the default queue, which is based on spot instances: spot instances are interrupted quite often and Cromwell does not seem to handle interruption properly. We recommend using the `priority-*` queue even though it costs a bit more than spot instances. Click on the chosen job queue and note its ARN; this ARN will be used later to create the Caper server instance (see the CLI sketch after these steps).
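
If you prefer the command line, here is a minimal sketch of the same checks, assuming the AWS CLI is installed and configured for the same account and region as the stack; the bucket name below is a placeholder:

```bash
# Check whether the output bucket name is already taken.
# A non-zero exit code usually means the bucket does not exist (or you cannot access it).
aws s3api head-bucket --bucket my-pipeline-output-bucket 2>/dev/null \
  && echo "Bucket already exists; delete it or pick another name." \
  || echo "Bucket name appears to be available."

# List job queues created by the stack and their ARNs (look for priority-* and default-*).
aws batch describe-job-queues \
  --query 'jobQueues[].[jobQueueName,jobQueueArn]' \
  --output table
```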

5 changes: 5 additions & 0 deletions scripts/aws_caper_server/TROUBLESHOOTING.md
@@ -63,3 +63,8 @@ If you use S3 URIs in an input JSON which are in a different region, then you wi
### `S3Exception: null (Service: S3, Status Code: 400)`

If you see this `400` error, use the shell script `./create_instance.sh` to create an EC2 instance and run the Caper server on it, instead of running the Caper server on your local laptop/machine.


### Tasks (jobs) are stuck in `RUNNABLE` status

Go to `Job queues` in `AWS Batch` on your AWS console and find the job queue (default or priority) whose ARN matches the one in your Caper configuration. Edit the compute environment attached to that queue and increase its maximum number of vCPUs (see the CLI sketch below).
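
The same fix can be applied from the command line. This is a minimal sketch, assuming the AWS CLI is configured for the same account and region; the queue ARN and the new vCPU limit below are placeholders:

```bash
# Find the compute environment(s) attached to your job queue.
aws batch describe-job-queues \
  --job-queues arn:aws:batch:us-east-1:123456789012:job-queue/priority-example \
  --query 'jobQueues[].computeEnvironmentOrder[].computeEnvironment'

# Raise the maximum number of vCPUs on that compute environment.
aws batch update-compute-environment \
  --compute-environment <compute-environment-arn-from-above> \
  --compute-resources maxvCpus=8000
```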