KM GVS documentation #7903
scripts/variantstore/gvs-overview.md
| :----: | :---: | :----: | :--------------: |
| [GvsJointVariantCalling](https://github.com/broadinstitute/gatk/blob/rc-vs-483-beta-user-wdl/scripts/variantstore/wdl/GvsJointVariantCalling.wdl) | June, 2022 | [Kaylee Mathews](mailto:kmathews@broadinstitute.org) and [Aurora Cremer](mailto:aurora@broadinstitute.org) | If you have questions or feedback, contact the [Broad Variants team](mailto:variants@broadinstitute.org) |

![Diagram depicting the Broad Genomic Variant Store workflow. Sample GVCF files are imported into the core data model. A filtering model is trained using Variant Quality Score Recalibration, or VQSR, and then used to extract cohorts and produce sharded joint VCF files. Each step integrates BigQuery and GATK tools.](/scripts/variantstore/genomic-variant-store_diagram.png)
This is a placeholder figure for now until we have time to create one with a little more detail.
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7903 +/- ##
================================================
Coverage ? 84.757%
Complexity ? 34663
================================================
Files ? 2170
Lines ? 164888
Branches ? 17786
================================================
Hits ? 139754
Misses ? 18943
Partials ? 6191
scripts/variantstore/gvs-overview.md
The [GvsCreateAltAllele subworkflow (alias = CreateAltAllele)](https://github.com/broadinstitute/gatk/blob/ah_var_store/scripts/variantstore/wdl/GvsCreateAltAllele.wdl) splits alternate alleles and calculates additional annotations to be used for filtering. GvsCreateAltAllele imports an additional workflow, [GvsUtils (alias = Utils)](https://github.com/broadinstitute/gatk/blob/ah_var_store/scripts/variantstore/wdl/GvsUtils.wdl).
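To make "splits alternate alleles" concrete for readers: a multi-allelic site becomes one row per alternate allele, so annotations can be computed per allele. The sketch below is only a conceptual illustration, assuming simple in-memory records; GVS performs this step in BigQuery, and the `Site`/`AltAlleleRow` types, the `split_alt_alleles` helper, and the decision to drop `<NON_REF>` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record types for illustration only; GVS does this step in
# BigQuery, not with these Python classes.
@dataclass(frozen=True)
class Site:
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]  # ALT alleles as listed in the GVCF record

@dataclass(frozen=True)
class AltAlleleRow:
    chrom: str
    pos: int
    ref: str
    alt: str
    allele_index: int  # 1-based position of this ALT in the original list

# Assumption: the GVCF symbolic allele is not emitted as its own row.
SYMBOLIC_NON_REF = "<NON_REF>"

def split_alt_alleles(site: Site) -> List[AltAlleleRow]:
    """Return one row per (non-symbolic) alternate allele at a site."""
    return [
        AltAlleleRow(site.chrom, site.pos, site.ref, alt, index)
        for index, alt in enumerate(site.alts, start=1)
        if alt != SYMBOLIC_NON_REF
    ]

if __name__ == "__main__":
    site = Site("chr20", 1000, "A", ("C", "AT", "<NON_REF>"))
    for row in split_alt_alleles(site):
        print(row.chrom, row.pos, row.ref, row.alt, row.allele_index)
```

A real split typically also normalizes each allele pair (for example, trimming shared bases), which this sketch skips.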
#### B. GvsCreateFilterSet
What is the output of this task? The next section talks about VCFs, but we've been working with GVCF up till this point.
Another table in the dataset. It's not something the user needs to understand, but it's the output of this step that we use to create the filter.
This is the first part of the BigQuery-only portion of the workflow. We take data from one table (or several, if the number of samples is >4k) and move it into another table (partly for data access patterns!).
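The "one table, or several if the number of samples is >4k" behavior can be pictured as fixed-size sample partitions. This is a hypothetical sketch, assuming each source table holds at most 4,000 samples and that sample IDs are contiguous integers starting at 1; the `table_number`/`source_tables` helpers and the `source_table_NNN` names are illustrative, not the actual GVS schema.

```python
from typing import List

# Assumption: each source table holds data for at most 4,000 samples, read
# from the ">4k" in the comment above. The real GVS layout may differ.
SAMPLES_PER_TABLE = 4000

def table_number(sample_id: int) -> int:
    """1-based number of the source table holding a sample (IDs start at 1)."""
    if sample_id < 1:
        raise ValueError("sample_id must be >= 1")
    return (sample_id - 1) // SAMPLES_PER_TABLE + 1

def source_tables(num_samples: int) -> List[str]:
    """Hypothetical names of every source table the BigQuery step would read."""
    if num_samples < 1:
        return []
    last = table_number(num_samples)
    return [f"source_table_{i:03d}" for i in range(1, last + 1)]

if __name__ == "__main__":
    print(source_tables(3500))  # a single table
    print(source_tables(9000))  # three tables
```

Under these assumptions, a 9,000-sample callset spans three source tables, and all of them are read before the data is written into the single destination table.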
@ekiernan, I simplified this section significantly, so I don't think your question is answered in the doc. However, I hope simplifying it makes it seem less like something whose inner workings users need to know?
Really nice to see this being created!
At a high level, I think we can make it more concise and approachable for users. It's a bit wordy in places, and we also describe details that either won't matter to users or that they won't know what to make of. We could try reading it through the eyes of someone completely new to Terra/GVS (but who knows genomics, GVCFs, etc.).
It might also be helpful to separate it into three docs:
- Overview of GVS: what it is and how the parts work at a high, conceptual level. There's a good start to this at the top.
- Quickstart
- Running on your own samples
As a new user, I find the doc bounces back and forth between all three of these, and if I have one goal in mind I get a bit lost. For example, if I just want to try it out on the supplied data (quickstart), I might get confused by all the talk about required annotations, scale limitations, etc. But if I'm moving on to using my own samples, I definitely want to see that material as prerequisites.
* Added gvs-quickstart.md
* Updated bucket permissions in run-your-own-sampled.md
* Updated setup instructions
Co-authored-by: Bec Asch <rsasch@users.noreply.github.com>
Co-authored-by: Kylee Degatano <kdegatano@broadinstitute.org>
@RoriCremer and I created these docs to provide information about the workflow to beta users and walk them through the steps of running the workflow in the beta workspace on example data as well as their own sample data.