A cromulent (watch) assistant for cromwell workflows run on the cloud (currently Google Cloud Platform only).
- Estimate the cost of a cromwell workflow. (Doesn't include network egress or sustained-use discounts; not all resource types are included.) Cost is calculated by pricing the cpu, memory, and disk usage of each Google Genomics Operation present in the cromwell metadata. The idea is based on comments made in the GATK Forum.
- Quickly get workflow statuses
- Easily retrieve current Google Compute Engine & Persistent Disk costs via the Google Cloud Billing API
- Easily retrieve workflow metadata from the cromwell server
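As a rough illustration of the cost model described above, a per-operation estimate can be sketched as follows. All prices below are hypothetical placeholders, not real Google Cloud rates (the real rates come from the Billing API and vary by region), and this is not cromulent's actual implementation:

```python
# Sketch of per-operation cost estimation: price the cpu, memory, and
# disk usage of each operation, then sum over the workflow metadata.
# All rates are illustrative placeholders.
CPU_PRICE_PER_CORE_HOUR = 0.033174   # hypothetical custom vCPU rate
MEM_PRICE_PER_GB_HOUR = 0.004446     # hypothetical custom memory rate
HDD_PRICE_PER_GB_MONTH = 0.040       # hypothetical persistent disk rate
HOURS_PER_MONTH = 730.0

def estimate_operation_cost(cpus, memory_gb, disk_gb, runtime_hours):
    """Price the cpu, memory, and disk usage of a single operation."""
    cpu_cost = cpus * CPU_PRICE_PER_CORE_HOUR * runtime_hours
    mem_cost = memory_gb * MEM_PRICE_PER_GB_HOUR * runtime_hours
    disk_cost = disk_gb * HDD_PRICE_PER_GB_MONTH * (runtime_hours / HOURS_PER_MONTH)
    return cpu_cost + mem_cost + disk_cost

# A workflow estimate is the sum over all operations in the metadata.
total = sum(estimate_operation_cost(**op) for op in [
    {"cpus": 2, "memory_gb": 6.75, "disk_gb": 200, "runtime_hours": 1.5},
    {"cpus": 1, "memory_gb": 2.0, "disk_gb": 10, "runtime_hours": 0.25},
])
```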
- python 2.7
- Google Cloud Account
- gcloud
pip install git+https://github.com/hall-lab/cromulent.git@master
Additionally, you may need to authorize application default credentials via gcloud before running cromulent:
gcloud auth application-default login
git clone https://github.com/hall-lab/cromulent
cd cromulent
virtualenv venv
source venv/bin/activate
pip install -e .
The main interface is the cromulent terminal command, which has a git-like sub-command interface. Try typing cromulent --help on the command line to see what options are available.
$ cromulent --help
Usage: cromulent [OPTIONS] COMMAND [ARGS]...
A collection of cromwell helpers.
Configuration Set the "CROMULENT_CONFIG" environment variable to the HOCON
(JES) config file used by cromwell. Please see README for more details.
Options:
--version Show the version and exit.
-h, --help Show this message and exit.
Commands:
abort abort workflow
bq Inspect billing via BigQuery
estimate estimate ideal workflow cost
execution-status get workflow execution status
metadata retrieve metadata for workflow-id
metadata-lite retrieve abridged metadata for workflow-id (for large
workflows)
outputs metadata on inputs, outputs and status
sku-list retrieve sku pricing info from the Google Cloud API
sql directly query the cromwell database
status get workflow status
wf generate a workflow report
Each subcommand has its own set of options. Try cromulent <subcommand> --help
for more details on each subcommand.
Some cromulent subcommands require a configuration file to run. For ease of use, cromulent reads the jes.conf file used by cromwell. This file is in the HOCON format, which is similar to JSON. Set the CROMULENT_CONFIG environment variable to the file you would like to use. The config file is typically used to ascertain the location and connection parameters of the cromwell database. See the examples below for details of these parameters.
In cromwell's jes.conf:
database {
db {
url = "jdbc:mysql://cromwell-mysql:3306/cromwell?rewriteBatchedStatements=true&useSSL=false"
user = "cromwell"
password = "words"
}
}
Or, use your own HOCON-formatted file:
database {
db {
host = localhost
port = 3306
user = "root"
password = "words"
}
}
Or, use your own HOCON-formatted file (SQLite):
database {
db {
file = "/somewhere/db.sqlite"
}
}
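Whichever variant you use, cromulent ultimately needs the database connection parameters. As a minimal illustration (not cromulent's actual parser, which presumably handles full HOCON), here is a sketch of extracting host, port, and database name from a cromwell-style JDBC url like the one above:

```python
import re

# Minimal sketch (not cromulent's actual code) of pulling connection
# parameters out of a cromwell-style JDBC url.
JDBC_RE = re.compile(r"jdbc:mysql://(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>[^?]+)")

def parse_jdbc_url(url):
    match = JDBC_RE.match(url)
    if match is None:
        raise ValueError("unrecognized jdbc url: {}".format(url))
    params = match.groupdict()
    params["port"] = int(params["port"])
    return params

url = "jdbc:mysql://cromwell-mysql:3306/cromwell?rewriteBatchedStatements=true&useSSL=false"
params = parse_jdbc_url(url)  # host/port/db for the cromwell database
```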
The cromulent wf subcommand contains various report types for actively running and completed cromwell workflows. It takes an --opts parameter that is specialized for a given --report option. The --opts parameter must be a semicolon-separated list of key=value pairs:
--opts='key1=value1;key2=value2;...'
The opts parameter can only be used on certain report types. See the documentation and examples below.
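The documented key=value, semicolon-separated format can be parsed in a few lines of Python. This is a minimal sketch of the documented format, not cromulent's actual implementation:

```python
def parse_opts(opts):
    """Parse an --opts string like 'detail=true;calls=A,B' into a dict.

    Sketch of the documented key=value, semicolon-separated format --
    not cromulent's actual implementation.
    """
    result = {}
    for pair in opts.split(";"):
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

opts = parse_opts("detail=true;calls=JointGenotyping.ImportGVCFs")
```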
This is a report to get an overall status of where things are with a given workflow. For large cromwell workflows it is probably convenient to cache the workflow metadata to a file via the cromulent metadata command.
$ cromulent wf --metadata 45a3953a-052e-4aca-a3f1-51d313e01d99.json --report=summary
or
$ cromulent wf --workflow-id 45a3953a-052e-4aca-a3f1-51d313e01d99 --report=summary
$ cromulent wf --metadata 45a3953a-052e-4aca-a3f1-51d313e01d99.json --report=summary
[2018-11-22 17:00:12,967] : root : INFO : Loading the workflow metadata from : 45a3953a-052e-4aca-a3f1-51d313e01d99.json
ID : 45a3953a-052e-4aca-a3f1-51d313e01d99
Status : Failed
Submit Time: 2018-11-21T21:53:29.101Z (UTC)
Start Time: 2018-11-21T21:53:32.826Z (UTC)
End Time: 2018-11-22T18:51:48.954Z (UTC)
call Done Failed RetryableFailure
--------------------------------------------- ------ -------- ------------------
JointGenotyping.ImportGVCFs 719 9468 827
JointGenotyping.DynamicallyCombineIntervals 1 0 0
JointGenotyping.GenotypeGVCFs 719 0 9
JointGenotyping.HardFilterAndMakeSitesOnlyVcf 719 0 1
JointGenotyping.CollectGVCFs 10187 0 0
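A table like the one above can be produced by tallying execution statuses per call from the workflow metadata. The sketch below assumes the standard cromwell metadata layout ("calls" mapping each call name to a list of shard attempts, each carrying an "executionStatus" field); it is an illustration, not cromulent's code:

```python
from collections import Counter

def summarize_calls(metadata):
    """Tally execution statuses per call from cromwell metadata.

    Assumes the standard cromwell metadata layout: 'calls' maps each
    call name to a list of shard attempts with an 'executionStatus'.
    """
    summary = {}
    for name, attempts in metadata.get("calls", {}).items():
        summary[name] = Counter(a.get("executionStatus") for a in attempts)
    return summary

metadata = {"calls": {"JointGenotyping.GenotypeGVCFs": [
    {"executionStatus": "Done"},
    {"executionStatus": "Done"},
    {"executionStatus": "RetryableFailure"},
]}}
counts = summarize_calls(metadata)
```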
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json --opts='detail=true'
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json --opts='detail=true;calls=JointGenotyping.ImportGVCFs'
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json --opts='detail=true;calls=JointGenotyping.ImportGVCFs;jobids=15664940324265826670,8797592820173599617'
Simple failure report output:
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json | head -n 5
[2018-11-22 17:23:28,143] : root : INFO : Loading the workflow metadata from : 45a3953a-052e-4aca-a3f1-51d313e01d99.json
call shard jobId rc stderr
--------------------------- ------- ---------------------------------------------------------------------- ---- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
JointGenotyping.ImportGVCFs 0 projects/washu-genome-inh-dis-analysis/operations/8797592820173599617 1 gs://wustl-ccdg-costa-rican-callset-2018-11/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-0/stderr
JointGenotyping.ImportGVCFs 1 projects/washu-genome-inh-dis-analysis/operations/12240229225160402087 1 gs://wustl-ccdg-costa-rican-callset-2018-11/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-1/stderr
Detailed failure report output:
$ cromulent wf --report=failures --metadata=45a3953a-052e-4aca-a3f1-51d313e01d99.json --opts='detail=true' | head -n 60
[2018-11-22 17:18:14,064] : root : INFO : Loading the workflow metadata from : 45a3953a-052e-4aca-a3f1-51d313e01d99.json
call: JointGenotyping.ImportGVCFs | shard: 0 | jobId: projects/my-google-project-id/operations/8797592820173599617 | rc: 1
--- Error Message: ---
| Task JointGenotyping.ImportGVCFs:0:1 failed. Job exit code 1. Check gs://wustl-callset-bucket/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-0/stderr for more information. PAPI error code 9. Execution failed: action 9: unexpected exit status 1 was not ignored
| [Delocalization] Unexpected exit status 1 while running "/bin/sh -c retry() { for i in `seq 3`; do gsutil cp /cromwell_root/genomicsdb.tar gs://wustl-callset-bucket/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-0/ 2> gsutil_output.txt; RC_GSUTIL=$?; if [[ \"$RC_GSUTIL\" -eq 1 ]]; then\n grep \"Bucket is requester pays bucket but no user project provided.\" gsutil_output.txt && echo \"Retrying with user project\"; gsutil -u my-google-project-id cp /cromwell_root/genomicsdb.tar gs://wustl-callset-bucket/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-0/; fi ; RC=$?; if [[ \"$RC\" -eq 0 ]]; then break; fi; sleep 5; done; return \"$RC\"; }; retry": CommandException: No URLs matched: /cromwell_root/genomicsdb.tar
| CommandException: No URLs matched: /cromwell_root/genomicsdb.tar
| CommandException: No URLs matched: /cromwell_root/genomicsdb.tar
|
--- Inputs: ---
{
"batch_size": 50,
"disk_size": 200,
"docker": "us.gcr.io/my-google-project-id/gatk-4:4.0.6.0",
"gatk_path": "/gatk/gatk",
"interval": "chr1:1-391754",
"sample_name_map": "gs://wustl-callset-bucket/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-CollectGVCFs/shard-0/gvcf.sample_map",
"workspace_dir_name": "genomicsdb"
}
--- stderr: ---
gs://wustl-callset-bucket/cromwell/cromwell-executions/JointGenotyping/45a3953a-052e-4aca-a3f1-51d313e01d99/call-ImportGVCFs/shard-0/stderr
--- jes: ---
{
"endpointUrl": "https://genomics.googleapis.com/",
"executionBucket": "gs://wustl-callset-bucket/cromwell/cromwell-executions",
"googleProject": "my-google-project-id",
"instanceName": "google-pipelines-worker-25baaac3f73cd72fd20af6a00fb7d438",
"machineType": "custom-2-6912",
"monitoringScript": "gs://wustl-monitoring-bucket/mem_monitor.sh",
"zone": "us-central1-f"
}
--- runtime: ---
{
"bootDiskSizeGb": "10",
"continueOnReturnCode": "0",
"cpu": "2",
"cpuMin": "1",
"disks": "local-disk 200 HDD",
"docker": "us.gcr.io/my-google-project-id/gatk-4:4.0.6.0",
"failOnStderr": "false",
"maxRetries": "0",
"memory": "7000.0 MB",
"memoryMin": "2048.0 MB",
"noAddress": "false",
"preemptible": "5",
"zones": "us-central1-a,us-central1-b,us-central1-c,us-central1-f"
}
call: JointGenotyping.ImportGVCFs | shard: 1 | jobId: projects/my-google-project-id/operations/12240229225160402087 | rc: 1
--- Error Message: ---
...
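One detail worth noting in the jes section above: the machineType custom-2-6912 encodes 2 vCPUs and 6912 MiB of memory, even though the runtime requested 7000.0 MB. A plausible (unverified) explanation is that the decimal-MB request is converted to MiB and rounded up to the multiple of 256 MiB that GCE custom machine types require:

```python
import math

MIB = 2 ** 20

def custom_machine_type(cpus, memory_mb):
    """Map a runtime memory request (decimal MB) to a GCE custom
    machine type name.

    Assumption (unverified against cromwell's source): the request is
    converted to MiB and rounded up to a multiple of 256 MiB, as GCE
    custom machine types require.
    """
    memory_mib = memory_mb * 10 ** 6 / float(MIB)
    rounded = int(math.ceil(memory_mib / 256.0)) * 256
    return "custom-{}-{}".format(cpus, rounded)

# The ImportGVCFs runtime above (cpu: 2, memory: 7000.0 MB) maps to:
machine = custom_machine_type(2, 7000.0)
```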
These are the options that can be used on the --opts parameter for the failures report.

- detail=true
  Turns on the detailed failure report format, which contains the error message, input, output, and execution details for each failed job. By default this is turned off, and only the basic report is displayed.

- calls=<wf.call_name1>,<wf.call_name2>,...
  Report on only the selected calls, delimited by a comma (,). For example, calls=JointGenotyping.ImportGVCFs will only produce the failures report for the JointGenotyping.ImportGVCFs task call of the corresponding workflow WDL file.

- jobids=<id-text>,<id-text>,...
  Report on only the selected google genomics job IDs, delimited by a comma (,). For example, jobids=15664940324265826670,8797592820173599617 will only produce the detail report for the job IDs containing the text 15664940324265826670 or 8797592820173599617.
NOTE: This software is currently in alpha-stage development and is continuously changing. Newer subcommands and features are in development, and existing subcommands may be modified, moved, or removed entirely.