feat: Initial implementation of Molecular bar codes handling using AGeNT #462
One quick question, otherwise looks good to me.
# Input fastqs are passed through snakemake.params.
# snakemake.input is a .done file touched after linking files in.
input_left = snakemake.params.args["input"]["reads_left"]
input_right = snakemake.params.args["input"].get("reads_right", "")
Why are you using get for one input and [ ] for the other?
Because the first is always present, and should raise an exception if not, while the second is optional, and with the empty list as default.
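A minimal sketch of the difference, assuming a plain dict shaped like snakemake.params.args in the snippet above (the file name and the read_inputs helper are hypothetical):

```python
def read_inputs(args):
    """Return (left, right) read inputs; the right reads are optional."""
    # Required key: plain indexing raises KeyError when it is missing.
    left = args["input"]["reads_left"]
    # Optional key: .get() returns the default instead of raising.
    right = args["input"].get("reads_right", "")
    return left, right


# Hypothetical single-end configuration: only the left reads are given.
args = {"input": {"reads_left": ["lib1_R1.fastq.gz"]}}
left, right = read_inputs(args)
```

With this input, right falls back to the default, while a dict without reads_left fails immediately with a KeyError.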
…on for the whole somatic pipeline)
TODO:
1. rename the ugly 'mcbs' to 'somatic' or 'accurate'
2. implement the 'extra_args' in the mapping tools to enable mapper control parameters specific for barcodes (-C)
3. make the mapping operation generic (not restricted to bwa & bwa-mem2)
4. implement umi_tools for barcodes/umis processing
5. rename bqsr statistics so they can be collected by multiqc
Prototype implementation of mapping data generated with Molecular BarCodes (MBC) or UMIs.
Background
The MBCs are typically used on FFPE data, where library complexity may be low. These data are also often compromised by FFPE or oxo-G artifacts which require careful analysis & filtration of somatic variants. Base-quality re-calibration (BQSR) should be used in these cases.
Design
The implementation has 5 steps:
Steps 1 & 2 must be run separately for each library, so that the read group information required by BQSR can be inserted easily. The per-library bam files must then be merged before marking duplicates.
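The per-library constraint can be illustrated with a rough sketch. This is not the code from this PR: the helper names, the sample name, and the PL:ILLUMINA platform tag are assumptions, and the commands are only built as strings, not executed. It uses the bwa mem options -C (append FASTA/FASTQ comment to SAM output) and -R (read group header line), then a samtools merge over the per-library bam files. Conversion to sorted bam between the two commands is left out.

```python
import shlex


def bwa_command(library, reference, reads_left, reads_right=None, sample="SAMPLE"):
    """Build a per-library `bwa mem` command line (returned as a string)."""
    # One read group per library; bwa converts the literal "\t" to tabs.
    # The PL value is an assumption for this sketch.
    read_group = f"@RG\\tID:{library}\\tSM:{sample}\\tLB:{library}\\tPL:ILLUMINA"
    cmd = ["bwa", "mem", "-C", "-R", read_group, reference, reads_left]
    if reads_right:
        cmd.append(reads_right)
    return " ".join(shlex.quote(part) for part in cmd)


def merge_command(output_bam, library_bams):
    """Build the `samtools merge` command that combines the per-library bams."""
    parts = ["samtools", "merge", output_bam, *library_bams]
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical library and file names.
map_cmd = bwa_command("lib1", "ref.fa", "lib1_R1.fq", "lib1_R2.fq")
merge_cmd = merge_command("merged.bam", ["lib1.bam", "lib2.bam"])
```

Because the read group is attached at mapping time, each library keeps its own ID and LB tags after the merge, which is what duplicate marking and BQSR rely on.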
Implementation
Because multiple operations are required to produce the final result, I opted to create a meta sub-step, which generates a Snakefile that handles all necessary steps. This is similar to the parallel wrapper, except that the steps are not chunks of the same operation on smaller regions, but logically different operations.
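To make the meta sub-step idea concrete, here is a minimal sketch of generating a Snakefile for a linear chain of logically different operations. It is not the implementation in this PR: render_snakefile, the step names, the placeholder shell commands, and the work/meta prefix are all hypothetical.

```python
def render_snakefile(steps, first_input, prefix="work/meta"):
    """Render a linear chain of rules; each rule reads the previous rule's output.

    `steps` is a list of (rule_name, shell_command) pairs. The shell commands
    may use Snakemake's {input} and {output} placeholders.
    """
    final_output = f"{prefix}.{steps[-1][0]}.out"
    # The target rule requests the output of the last step.
    lines = ["rule all:", f'    input: "{final_output}"', ""]
    previous = first_input
    for name, shell in steps:
        output = f"{prefix}.{name}.out"
        lines += [
            f"rule {name}:",
            f'    input: "{previous}"',
            f'    output: "{output}"',
            f'    shell: "{shell}"',
            "",
        ]
        previous = output
    return "\n".join(lines)


# Placeholder commands stand in for the real tools.
snakefile_text = render_snakefile(
    [("map", "cat {input} > {output}"), ("dedup", "cat {input} > {output}")],
    first_input="reads.done",
)
```

The generated text can be written to a file and run by Snakemake, which then resolves the order of the steps from their inputs and outputs.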
Benefits
work/<tool>.<library> with large files which are not final results.
Drawbacks
bwa or bwa-mem2 (as they share the same input parameters). It is also currently not possible to opt out of BQSR.
snappy design.
AGeNT is commercial software from Agilent. I don't think it is available on Bioconda. Because of time pressure, I don't have the time to look for alternatives (such as umitools).
Notes
The current implementation must be viewed as a prototype. If the meta sub-step concept is deemed acceptable for snappy, I have considered a few options to improve on the current implementation and make it more flexible.
AGeNT requires the -C option (append FASTA/FASTQ comment to SAM output) to be set, add read groups, run on separate libraries).
However, in my opinion, all changes and improvements must be weighed against an undue complexification of the ngs_mapping step.