feat: add tga cnvkit to gens #1448
At the moment the pipeline works for the TGA workflows, but I haven't verified all workflows yet, so for now this can be treated as a code review only.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

| Coverage Diff | update_cnvkit_pons | #1448 | +/- |
|---|---|---|---|
| Coverage | 99.48% | 99.49% | |
| Files | 40 | 40 | |
| Lines | 1960 | 1976 | +16 |
| Hits | 1950 | 1966 | +16 |
| Misses | 10 | 10 | |
…SAMIC into update_cnvkit_pons
GENS has previously only been activated for WGS in Balsamic. Once Clinical-Genomics/BALSAMIC#1448 is in production, CNV and BAF profiles from TGA samples can be uploaded as well. This feature is planned for release 16.0.0 of Balsamic (somewhere around **August maybe**) and requires a couple of small changes in CG.

### Added
- gnomad-af argument to TGA samples

### Changed
- gens upload no longer filters out TGA samples
#### Added
- UMI extraction and deduplication to TGA workflow
- Adapter trimming of fastqs to UMI workflow
- Cap base quality in bam for Manta input

#### Changed
- Refactored multi workflow rule-files to separate files to decrease complexity
- Refactored output files to in general comply with format {sample_type}.{sample_name}
- Replaced Picard QC tools with matching Sentieon QC tools

#### Removed
- UMI specific rules for UMI-extraction and alignment (using new TGA-rules instead)
- Fastq and UMI trimming command-line options

Merged this PR into this one: #1465

#### Added
- Added extension of target bed regions to a minimum size of 100 for CNV analysis
- PON for: Exome comprehensive 10.2
- PON for: GMSsolid 15.2
- PON for: GMCKsolid 4.2

#### Changed
- updated PON for GMCKSolid v4.1
- updated PON for GMSMyeloid v5.3
- updated PON for GMSlymphoid v7.3

Merged this PR into this one: #1448

#### Added
- Script to post-process CNVkit output to GENS-format
- DNAscope gnomad calling to TGA for GENS

#### Changed
- Parsing of GENS arguments changed to account for TGA

Merged this PR: #1475 into this one

#### Changed
- Refactored rules for bcftools filters
- Renamed final UMI bamfile to ensure hsmetrics are collected in multiqc json
- Changed ranked VCF from research to clinical
- Lowered min AF for TGA from 0.007 to 0.005
- Lowered maximal SOR for TNscope in TGA tumor only cases from 3 to 2.7
- Changed filter settings for research TNscope vcf, now either PASS or triallelic_site (fixing this issue: #1293)

#### Added
- TNscope for TGA workflows, merged with VarDict results
- New filter for VarDict for tumor in normal contamination
- Export TMP environment variables to rules that lack them
- Added genmod ranked VCFs to be delivered
- Added family-id to genmod in order to get ranked variants to Scout (solved this: #1045)
- Added DP and AF to INFO-field of TNscope vcfs for ranking model
- Raw TNscope calls and unfiltered research-annotated SNVs to delivery

#### Removed
- ML-model for TNscope is removed due to license issue with new version of Sentieon
- All code associated with TNhaplotyper
- Removed research.filtered.pass VCFs from delivery and storage list
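The changelog entry about extending target bed regions to a minimum size of 100 for CNV analysis could look roughly like the sketch below. This is only an illustration, not the code in the PR: the function name `extend_region`, the symmetric padding, and the clamping at position 0 are assumptions, and chromosome ends are not checked.

```python
from typing import Tuple

# (chromosome, start, end) with 0-based, half-open coordinates as in BED
Region = Tuple[str, int, int]


def extend_region(region: Region, min_size: int = 100) -> Region:
    """Pad a region to at least min_size bases (hypothetical helper).

    Padding is split between both sides; if the start would go below 0,
    the remainder is added on the right instead.
    """
    chrom, start, end = region
    missing = min_size - (end - start)
    if missing <= 0:
        # Region is already large enough, leave it untouched
        return region
    new_start = max(0, start - missing // 2)
    # The extended region is exactly min_size long and always covers the original
    new_end = new_start + min_size
    return (chrom, new_start, new_end)
```

Regions that already span 100 bases or more are returned unchanged, so applying the function twice gives the same result as applying it once.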
Description
This PR adds post-processing steps to CNVkit results from TGA to facilitate upload to GENS, which has previously only been possible for WGS via post-processing of the GATK CollectReadCounts output.
As the gnomad vcf is also required to create the BAF visualisation track in GENS, the config and the GENS rule assignment have been modified so that these rules and references can be used in TGA as well.
An additional small script was added to convert the CNVkit file tumor.merged.cnr into a GENS-accepted format at different resolutions.
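As a rough illustration of what such a conversion might involve, the sketch below reads the `chromosome`, `start`, `end` and `log2` columns of a `.cnr` file and writes one line per bin for several resolutions, averaging consecutive bins for the coarser levels. It is not the script from this PR: the function names, the resolution prefixes, the step sizes and the output line layout are all assumptions, and compression or indexing of the output is not covered.

```python
import csv
import io
from itertools import groupby
from statistics import mean
from typing import Dict, Iterable, List, TextIO, Tuple

Bin = Tuple[str, int, int, float]  # (chromosome, start, end, log2)

# Assumed mapping: resolution prefix -> number of consecutive CNVkit bins
# averaged into one output point. Prefixes and step sizes are illustrative.
RESOLUTIONS: Dict[str, int] = {"o": 20, "a": 10, "b": 5, "c": 2, "d": 1}


def read_cnr(handle: TextIO) -> List[Bin]:
    """Read the columns needed from a tab-separated CNVkit .cnr file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["chromosome"], int(row["start"]), int(row["end"]), float(row["log2"]))
        for row in reader
    ]


def merge_bins(bins: Iterable[Bin], step: int) -> List[Bin]:
    """Average every `step` consecutive bins within each chromosome.

    Assumes the input is sorted by chromosome and position.
    """
    merged: List[Bin] = []
    for chrom, group in groupby(bins, key=lambda b: b[0]):
        chrom_bins = list(group)
        for i in range(0, len(chrom_bins), step):
            chunk = chrom_bins[i:i + step]
            merged.append(
                (chrom, chunk[0][1], chunk[-1][2], mean(b[3] for b in chunk))
            )
    return merged


def to_gens_lines(bins: List[Bin], resolutions: Dict[str, int] = RESOLUTIONS) -> List[str]:
    """Format bins as '<prefix>_<chrom> start end log2' lines (assumed layout)."""
    lines: List[str] = []
    for prefix, step in resolutions.items():
        for chrom, start, end, log2 in merge_bins(bins, step):
            lines.append(f"{prefix}_{chrom}\t{start}\t{end}\t{log2:.4f}")
    return lines


# Tiny made-up input, only to show how the functions fit together
EXAMPLE_CNR = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "1\t0\t100\tA\t10\t0.0\t1\n"
    "1\t100\t200\tA\t10\t1.0\t1\n"
    "1\t200\t300\tA\t10\t2.0\t1\n"
    "2\t0\t100\tB\t10\t-1.0\t1\n"
)
example_lines = to_gens_lines(read_cnr(io.StringIO(EXAMPLE_CNR)), {"a": 2, "d": 1})
```

Reading columns by header name rather than by position keeps the sketch independent of the exact column order in the `.cnr` file.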
This PR closes this issue:
#1385
Open question to discuss: purity adjusted log2 coverage values
In this GENS post-processing I have also decided to take the tumor purity from PureCN as input and use it to modify the log2 coverage values, making the fold-changes more visible in low-purity samples. I don't know if this is recommended; however, CNVs in low-purity samples would be quite difficult to observe without it.
Conclusion to this: It was too difficult to ensure that the purity and ploidy were used correctly to adjust the values, so it was decided to give the customers the raw values for now.
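For context, one common way to express a purity adjustment of this kind is sketched below. It is not the formula that was tried in this PR, and the PR delivers raw values. The sketch assumes that the normal cells contribute a copy ratio of exactly 1 and that tumor and normal share the same baseline ploidy, which is precisely the kind of assumption that proved hard to guarantee. The function name and the `floor` parameter are hypothetical.

```python
import math


def purity_adjusted_log2(log2_ratio: float, purity: float, floor: float = -5.0) -> float:
    """Estimate the tumor-only log2 ratio from a mixed tumor/normal sample.

    Assumes observed_ratio = purity * tumor_ratio + (1 - purity) * 1,
    i.e. normal cells are copy-neutral and ploidy is not corrected for.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in the interval (0, 1]")
    observed_ratio = 2.0 ** log2_ratio
    tumor_ratio = (observed_ratio - (1.0 - purity)) / purity
    # Noise can push the estimate to zero or below; cap it at the floor
    if tumor_ratio <= 2.0 ** floor:
        return floor
    return math.log2(tumor_ratio)
```

At a purity of 0.5, a single-copy gain in the tumor is observed as a ratio of 1.25; the function maps this back to a ratio of 1.5, which is why fold-changes become easier to see in low-purity samples. The same arithmetic also amplifies noise, and a wrong purity estimate shifts every value.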
This change requires further changes in CG
We need 2 changes as far as I can tell at the moment:
PR in CG: Clinical-Genomics/cg#3361