Ah [VS-565] output intervals and sample list #8010

Merged
11 changes: 3 additions & 8 deletions .dockstore.yml
@@ -79,7 +79,6 @@ workflows:
branches:
- master
- ah_var_store
- rsa_sum_python
- name: GvsRescatterCallsetInterval
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsRescatterCallsetInterval.wdl
@@ -103,7 +102,6 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_265_batch_alt_allele
- name: GvsCreateTables
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsCreateTables.wdl
@@ -120,15 +118,14 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_607_drop_state
- ah_vs_565_output_intervals_and_sample_list
- name: GvsImportGenomes
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsImportGenomes.wdl
filters:
branches:
- master
- ah_var_store
- rsa_vs_607_drop_state
- name: GvsPrepareRangesCallset
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsPrepareRangesCallset.wdl
@@ -170,7 +167,6 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_607_drop_state
- name: GvsWithdrawSamples
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsWithdrawSamples.wdl
@@ -185,14 +181,15 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_607_drop_state
- ah_vs_565_output_intervals_and_sample_list
- name: GvsJointVariantCalling
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsJointVariantCalling.wdl
filters:
branches:
- master
- ah_var_store
- ah_vs_565_output_intervals_and_sample_list
- name: GvsJointVariantCallsetCost
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsJointVariantCallsetCost.wdl
@@ -215,7 +212,6 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_607_drop_state
- name: GvsIngestTieout
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsIngestTieout.wdl
@@ -238,7 +234,6 @@ workflows:
branches:
- master
- ah_var_store
- vs_605_hail_codegen
- name: MitochondriaPipeline
subclass: WDL
primaryDescriptorPath: /scripts/mitochondria_m2_wdl/MitochondriaPipeline.wdl
34 changes: 4 additions & 30 deletions scripts/variantstore/beta_docs/run-your-own-samples.md
@@ -45,10 +45,13 @@ Input GVCF files for the GVS workflow must include the annotations described in

The following files are stored in the workspace Google bucket and links to the files are written to the sample_set data table:

- sharded joint VCF files and index files
- sharded joint VCF files, index files, the interval lists produced for this run, and the sample name list
- size of output VCF files in MB
- manifest file containing the output destination of additional files and other metadata

Note:
The interval lists are named consistently with the vcfs: 00000000.vcf.gz.interval-list will go with 00000000.vcf.gz and 00000000.vcf.gz.tbi

## Setup

Before you can begin uploading your data to Terra, you’ll need to setup some accounts and permissions that will allow Terra to access your data and use BigQuery to run the workflow. Follow the step-by-step instructions in [GVS Beta Quickstart](./gvs-quickstart.md).
@@ -138,35 +141,6 @@ If you run the workflow with the example data pre-loaded into the workspace, and

By default, the workflow is set up to write outputs to the workspace Google bucket. If you want to write the outputs to a different cloud storage location, you can specify the cloud path in the `extract_output_gcs_dir` optional input in the workflow configuration.

Accessing the interval lists for all of the output VCF shards:
Start by clicking into your job from the job history tab. This page for your job will have a table with metadata from your job in it (such as workflow ID and the Run Cost) and the table will have a Links column.
The icons/links in that column have additional job information.
* Job Manager
* Workflow Dashboard
* Execution directory


There are two ways to get the interval lists from here. You can either use the UI in the Job Manager, or navigate for the correct path from the Execution directory.

#### Using the Job Manager:
If you prefer to use the job manager, click on the job manager link from the links column. From the List View tab, click into the GvsUnified page. This page's List View tab will show a list of all the sub-workflows in your job. From there, click into the GvsExtractCallset sub-workflow page. This page's List View tab will show a list of all tasks in the extract sub-workflow. You are interested in the SplitIntervals task which should be the 8th task. In the SplitIntervals row, find the icon in the outputs column and click on it. A popover will load, which will contain an array of the interval_files. These are the paths for the interval lists.
If the popover does not load, you can click into the SplitIntervals Execution directory (which is in the Links column of this table).

#### Using the Execution directory:
Your jobs Execution directory contains artifacts from your job, including logs and the interval lists.
You can access the interval lists by navigating to the directory for the SplitIntervals task inside the GvsExtractCallset sub-workflow.
Once you have clicked on the link to the Execution directory, it will drop you into your workflow bucket.
From there, click into the call-GvsUnified --> GvsUnified --> <unified job id (this will be the only option, so no lookup is needed)>
Then select the extract sub-workflow: call-GvsExtractCallset --> GvsExtractCallset --> <extract job id (only option--no lookup needed)>
Finally, select the SplitIntervals task: call-SplitIntervals and find the glob-<> directory. Inside that directory are the interval lists for the call set.

The interval list paths will look something like this:
`gs://<workspace bucket id>/<submission id>/GvsJointVariantCalling/<workflow id>/call-GvsUnified/GvsUnified/<unified job id (only option--no lookup needed)>/call-GvsExtractCallset/GvsExtractCallset/<extract job id (only option--no lookup needed)>/call-SplitIntervals/glob-<task id>/0000000000-<callset id>.vcf.gz.interval_list`

There will also be a `glob-<task id>` file with a list of interval lists, but not their paths.

Note:
The interval lists are named consistently with the vcfs: 00000000.vcf.gz.interval-list will go with 00000000.vcf.gz and 00000000.vcf.gz.tbi

### Time and cost
Below are several examples of the time and cost of running the workflow.
2 changes: 2 additions & 0 deletions scripts/variantstore/wdl/GvsExtractCallset.wdl
@@ -199,8 +199,10 @@ workflow GvsExtractCallset {
output {
Array[File] output_vcfs = ExtractTask.output_vcf
Array[File] output_vcf_indexes = ExtractTask.output_vcf_index
Array[File] output_vcf_interval_files = SplitIntervals.interval_files
Float total_vcfs_size_mb = SumBytes.total_mb
File manifest = CreateManifest.manifest
File? sample_name_list = GenerateSampleListFile.sample_name_list
Boolean done = true
}
}
2 changes: 2 additions & 0 deletions scripts/variantstore/wdl/GvsJointVariantCalling.wdl
@@ -82,7 +82,9 @@ workflow GvsJointVariantCalling {
output {
Array[File] output_vcfs = GvsUnified.output_vcfs
Array[File] output_vcf_indexes = GvsUnified.output_vcf_indexes
Array[File] output_vcf_interval_files = GvsUnified.output_vcf_interval_files
Float total_vcfs_size_mb = GvsUnified.total_vcfs_size_mb
File? sample_name_list = GvsUnified.sample_name_list
File manifest = GvsUnified.manifest
}
}
2 changes: 2 additions & 0 deletions scripts/variantstore/wdl/GvsUnified.wdl
@@ -166,6 +166,8 @@ workflow GvsUnified {
Array[File] output_vcfs = GvsExtractCallset.output_vcfs
Array[File] output_vcf_indexes = GvsExtractCallset.output_vcf_indexes
Float total_vcfs_size_mb = GvsExtractCallset.total_vcfs_size_mb
Array[File] output_vcf_interval_files = GvsExtractCallset.output_vcf_interval_files
File? sample_name_list = GvsExtractCallset.sample_name_list
File manifest = GvsExtractCallset.manifest
Boolean done = true
}
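
Reviewer note: with `output_vcf_interval_files` now exposed alongside `output_vcfs` and `output_vcf_indexes`, a consumer of the workflow outputs can regroup the three arrays per shard by file name, relying on the naming note added to run-your-own-samples.md. The following is a minimal sketch of that idea in Python, not part of this PR. The helper names `vcf_name_for` and `group_shards` are hypothetical, the `gs://` paths in the usage line are made up, and both `.interval-list` (as written in the doc note) and `.interval_list` (as written in the removed example path) are accepted because the diff shows both spellings.

```python
from typing import Dict, List, Optional

# Assumption: either spelling may appear; the PR text shows both.
INTERVAL_SUFFIXES = (".interval-list", ".interval_list")
INDEX_SUFFIX = ".tbi"


def vcf_name_for(path: str) -> Optional[str]:
    """Return the shard's VCF file name (e.g. 00000000.vcf.gz) for any of its files."""
    name = path.rsplit("/", 1)[-1]  # drop the bucket/directory part
    for suffix in INTERVAL_SUFFIXES + (INDEX_SUFFIX,):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name if name.endswith(".vcf.gz") else None


def group_shards(
    vcfs: List[str], indexes: List[str], interval_files: List[str]
) -> Dict[str, Dict[str, str]]:
    """Group the three output arrays into one record per shard, keyed by VCF name."""
    shards: Dict[str, Dict[str, str]] = {}
    for kind, paths in (("vcf", vcfs), ("index", indexes), ("intervals", interval_files)):
        for path in paths:
            key = vcf_name_for(path)
            if key is None:
                raise ValueError(f"unrecognized output file name: {path}")
            shards.setdefault(key, {})[kind] = path
    return shards


# Hypothetical usage with made-up paths:
example = group_shards(
    ["gs://bucket/out/00000000.vcf.gz"],
    ["gs://bucket/out/00000000.vcf.gz.tbi"],
    ["gs://bucket/run/glob-abc/00000000.vcf.gz.interval_list"],
)
```

Keying on the file name rather than on array position avoids assuming that the `SplitIntervals` and `ExtractTask` outputs are returned in the same order, which this diff does not state.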