feat: Initial implementation of Molecular bar codes handling using AGeNT #462

ericblanc20 · 2023-11-01T09:07:47Z

Prototype implementation of mapping data generated with Molecular BarCodes (MBC) or UMIs.

Background

MBCs are typically used on FFPE data, where library complexity may be low. These data are also often compromised by FFPE or oxo-G artifacts, which require careful analysis and filtering of somatic variants. Base-quality recalibration (BQSR) should be used in these cases.

Design

The implementation has 5 steps:

1. Trimming the MBC sequence at the read's 5' end and inserting it in the read name for further processing.
2. Mapping the reads.
3. Merging separate libraries.
4. Marking the duplicates using the information in read names.
5. Performing BQSR.
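The point of step 1 is that the MBC travels with the read through mapping, so step 4 can use it. The toy sketch below illustrates that idea only; it is not how AGeNT works internally. It assumes single-end records, a fixed MBC length, and a ":" separator between the original read name and the MBC (all hypothetical choices). Real duplicate marking also considers mate positions and tolerates sequencing errors in the MBC; here only exact matches are grouped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FastqRecord:
    name: str  # read name without the leading "@" and without any comment
    seq: str
    qual: str


def move_mbc_to_name(rec: FastqRecord, mbc_len: int) -> FastqRecord:
    """Step 1 (sketch): trim the 5' MBC and append it to the read name."""
    mbc = rec.seq[:mbc_len]
    return FastqRecord(
        name=f"{rec.name}:{mbc}",  # hypothetical "name:MBC" convention
        seq=rec.seq[mbc_len:],
        qual=rec.qual[mbc_len:],  # qualities must be trimmed in step with bases
    )


def mbc_from_name(name: str) -> str:
    # The MBC is whatever follows the last ":" (Illumina names contain colons too).
    return name.rsplit(":", 1)[1]


Key = Tuple[str, int, str, str]


def duplicate_groups(alignments: Iterable[Tuple[str, str, int, str]]) -> Dict[Key, List[str]]:
    """Step 4 (sketch): reads sharing position, strand AND MBC are duplicates.

    Each alignment is (read_name, chrom, pos, strand).
    """
    groups: Dict[Key, List[str]] = defaultdict(list)
    for name, chrom, pos, strand in alignments:
        groups[(chrom, pos, strand, mbc_from_name(name))].append(name)
    return dict(groups)
```

Without the MBC, all reads starting at the same position and strand would collapse into one group; with it, reads at the same position but carrying different barcodes are kept as distinct molecules, which matters when library complexity is low.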

Steps 1 and 2 must be run separately for each library, so that the read group information required by BQSR can easily be added. The per-library BAM files must then be merged before duplicates are marked.
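A minimal sketch of the per-library mapping and merge commands, assuming one read group per library (using the library name as the read group ID) and a sample name supplied by the caller; the function names are hypothetical. `bwa mem -R` takes the whole `@RG` header line and converts literal `\t` escapes into TABs, and `-C` is the comment-passing option mentioned in the Notes below. The commands are only built as argument lists here, not executed.

```python
from typing import List


def read_group_line(library: str, sample: str, platform: str = "ILLUMINA") -> str:
    # bwa mem -R expects the full @RG header line; bwa itself turns the
    # literal "\t" escapes into TAB characters in the SAM header.
    return f"@RG\\tID:{library}\\tSM:{sample}\\tLB:{library}\\tPL:{platform}"


def map_library_cmd(index: str, left: str, right: str, library: str,
                    sample: str, threads: int = 4, mapper: str = "bwa") -> List[str]:
    """Mapping command for ONE library ("bwa" or "bwa-mem2", same options)."""
    cmd = [
        mapper, "mem",
        "-C",  # append the FASTQ comment to the SAM output
        "-t", str(threads),
        "-R", read_group_line(library, sample),
        index, left,
    ]
    if right:  # an empty string means single-end data
        cmd.append(right)
    return cmd


def merge_cmd(merged_bam: str, library_bams: List[str]) -> List[str]:
    # samtools merge expects coordinate-sorted inputs; the sort step is omitted.
    return ["samtools", "merge", "-f", merged_bam, *library_bams]
```

Because each library gets its own `-R` line, the read groups survive the merge, and BQSR can later model errors per read group.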

Implementation

Because several operations are required to produce the final result, I opted to create a meta sub-step that generates a Snakefile handling all the necessary steps. This is similar to the parallel wrapper, except that the steps are not chunks of the same operation on smaller regions, but logically different operations.
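To make the meta sub-step idea concrete, here is a minimal sketch of rendering such a Snakefile from a template. The rule names, file layout and the way shell commands are passed in are all hypothetical, not the wrapper's actual code. `string.Template` is used so that its `$placeholders` do not clash with Snakemake's own `{input}`/`{output}`/`{library}` braces.

```python
from string import Template
from typing import Dict, List

# Hypothetical layout: one rule per logical step; per-library rules use {library}.
_SNAKEFILE = Template('''\
LIBRARIES = $libraries

rule all:
    input: "$final_bam"

rule trim_mbc:
    input: "fastq/{library}.fastq.gz"
    output: "trimmed/{library}.fastq.gz"
    shell: "$trim_cmd"

rule map_library:
    input: "trimmed/{library}.fastq.gz"
    output: "mapped/{library}.bam"
    shell: "$map_cmd"

rule merge:
    input: expand("mapped/{library}.bam", library=LIBRARIES)
    output: "merged.bam"
    shell: "samtools merge -f {output} {input}"

rule mark_duplicates:
    input: "merged.bam"
    output: "dedup.bam"
    shell: "$markdup_cmd"

rule bqsr:
    input: "dedup.bam"
    output: "$final_bam"
    shell: "$bqsr_cmd"
''')


def render_snakefile(libraries: List[str], final_bam: str, cmds: Dict[str, str]) -> str:
    """Fill in the template; the commands must not contain double quotes."""
    return _SNAKEFILE.substitute(
        libraries=repr(list(libraries)),
        final_bam=final_bam,
        trim_cmd=cmds["trim"],
        map_cmd=cmds["map"],
        markdup_cmd=cmds["markdup"],
        bqsr_cmd=cmds["bqsr"],
    )
```

Unlike the parallel wrapper, where every job runs the same command on a different region, each rule here is a different operation, and Snakemake derives the execution order from the file dependencies.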

Benefits

The wrapper that creates and runs the Snakefile works in a temporary directory where all the disk-intensive operations occur. This avoids cluttering work/<tool>.<library> with large files that are not final results.
The code is relatively straightforward, and blends well with the other features of the step (coverage analysis, ...).
Libraries sequenced on multiple lanes are naturally parallelised.
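A minimal sketch of the temporary-directory pattern described in the first point, assuming the final outputs are given as paths relative to the Snakemake working directory; the function names are hypothetical. The `runner` parameter exists only so the sketch can be exercised without Snakemake; by default it calls `snakemake --snakefile ... --directory ... --cores ...`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

Runner = Callable[[List[str], Path], None]


def _run_snakemake(cmd: List[str], workdir: Path) -> None:
    subprocess.run(cmd, cwd=workdir, check=True)


def run_in_tmpdir(snakefile_text: str, final_outputs: List[str], dest_dir: str,
                  cores: int = 4, runner: Optional[Runner] = None) -> List[Path]:
    """Run a generated Snakefile in a throw-away directory; keep final outputs only."""
    runner = runner or _run_snakemake
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="mbc_") as tmp_name:
        tmp = Path(tmp_name)
        snakefile = tmp / "Snakefile"
        snakefile.write_text(snakefile_text)
        cmd = ["snakemake", "--snakefile", str(snakefile),
               "--directory", str(tmp), "--cores", str(cores)]
        runner(cmd, tmp)
        kept = []
        for rel in final_outputs:
            target = dest / Path(rel).name
            shutil.move(str(tmp / rel), str(target))  # works across filesystems
            kept.append(target)
    # Leaving the "with" block deletes every intermediate file left in tmp.
    return kept
```

Only the files listed in `final_outputs` reach the destination; trimmed FASTQs, per-library BAMs and the pre-BQSR BAM disappear with the temporary directory, even if the run fails.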

Drawbacks

The code (as it is now) is inflexible: it is hard to see how another MBC tool could be added, and the mapper must be either bwa or bwa-mem2 (as they share the same input parameters). It is also currently not possible to opt out of BQSR.
The notion of meta sub-step may go against the whole snappy design.
The MBC tool AGeNT is commercial software from Agilent. I don't think it is available on Bioconda. Because of time pressure, I have not been able to look for alternatives (such as umi_tools).

Notes

The current implementation must be viewed as a prototype. If the meta sub-step concept is deemed acceptable for snappy, I have considered a few options to improve on the current implementation and make it more flexible.

The BQSR can easily be put under user control.
The mapper wrappers could be modified to adapt their parameters for compatibility with the MBC tool: AGeNT requires the -C option (append FASTA/FASTQ comment to SAM output), read groups, and a separate run for each library.
The current wrapper could be made abstract, with concrete subclasses specific to each MBC tool.
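The options above could fit together roughly as follows. This is a minimal sketch of the abstract-wrapper idea, not code from this PR: the config fields, step names and the umi_tools subclass are hypothetical. The only tool-specific fact taken from the text is that AGeNT needs the mapper's `-C` option; the assumption that umi_tools needs no extra mapper argument is labelled as such.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class MbcConfig:
    mapper: str = "bwa"    # hypothetical key: "bwa" or "bwa-mem2"
    run_bqsr: bool = True  # hypothetical key: BQSR put under user control


class MbcTool(ABC):
    """Abstract wrapper; each MBC tool supplies its own specifics."""

    def __init__(self, config: MbcConfig) -> None:
        self.config = config

    @abstractmethod
    def mapper_extra_args(self) -> List[str]:
        """Extra arguments the mapper needs for this MBC tool."""

    @abstractmethod
    def dedup_step(self) -> str:
        """Name of the tool-specific duplicate-marking step."""

    def steps(self) -> List[str]:
        # Shared skeleton: trimming, mapping and merging are always run.
        steps = ["trim_mbc", "map", "merge", self.dedup_step()]
        if self.config.run_bqsr:
            steps.append("bqsr")
        return steps


class AgentTool(MbcTool):
    def mapper_extra_args(self) -> List[str]:
        return ["-C"]  # AGeNT needs the FASTQ comment in the SAM output

    def dedup_step(self) -> str:
        return "agent_dedup"  # hypothetical step name


class UmiToolsTool(MbcTool):
    def mapper_extra_args(self) -> List[str]:
        return []  # assumption: the UMI is already in the read name

    def dedup_step(self) -> str:
        return "umi_tools_dedup"  # hypothetical step name
```

Adding another MBC tool then means writing one small subclass instead of touching the shared pipeline, which keeps the extra complexity in ngs_mapping contained.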

However, in my opinion, every change and improvement must be weighed against the risk of making the ngs_mapping step unduly complex.

coveralls · 2023-11-01T09:13:32Z

coverage: 85.736% (-0.1%) from 85.866%
when pulling 11d5464 on 461-support-for-molecular-barcodes
into 4874074 on main.

mbenary

One quick question, otherwise looks good to me.

mbenary · 2023-11-02T16:09:36Z

snappy_wrappers/wrappers/mbcs/wrapper.py

+# Input fastqs are passed through snakemake.params.
+# snakemake.input is a .done file touched after linking files in.
+input_left = snakemake.params.args["input"]["reads_left"]
+input_right = snakemake.params.args["input"].get("reads_right", "")


Why are you using get for one input and [ ] for the other?

ericblanc20 · 2023-11-02

Because the first is always present and should raise an exception if it is missing, while the second is optional and defaults to an empty string.
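The difference between the two lookups in the wrapper snippet can be seen in isolation. A minimal sketch, assuming the arguments have the same nested shape as `snakemake.params.args`; the `read_inputs` helper and the file names are made up for illustration.

```python
from typing import Tuple


def read_inputs(args: dict) -> Tuple[str, str]:
    # Mandatory input: a missing key raises KeyError immediately.
    left = args["input"]["reads_left"]
    # Optional input: single-end data has no right reads, so "" is the fallback.
    right = args["input"].get("reads_right", "")
    return left, right
```

With `read_inputs({"input": {"reads_left": "L1_R1.fastq.gz"}})` the result is `("L1_R1.fastq.gz", "")`, and because an empty string is falsy, later code can simply test `if right:` to decide between single-end and paired-end handling.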

…on for the whole somatic pipeline)

TODO:
1. rename the ugly 'mbcs' to 'somatic' or 'accurate'
2. implement 'extra_args' in the mapping tools to enable mapper control parameters specific to barcodes (-C)
3. make the mapping operation generic (not restricted to bwa & bwa-mem2)
4. implement umi_tools for barcode/UMI processing
5. rename BQSR statistics so they can be collected by MultiQC

feat: Initial implementation of Molecular bar codes handling using AGeNT

a3f171d

ericblanc20 requested review from holtgrewe and mbenary November 1, 2023 09:07

ericblanc20 linked an issue Nov 1, 2023 that may be closed by this pull request

Support for molecular barcodes #461

Closed

mbenary approved these changes Nov 2, 2023


ericblanc20 added 4 commits November 10, 2023 16:53

refactor: changed resource allocation (TODO: revisit resource allocati…

2c37800

…on for the whole somatic pipeline)

fix: DNAcopy (PureCN) chromosome naming & column names

e868d06

fix: allow compressed or uncompressed bed files & typo

2eebef2

ericblanc20 merged commit 768dded into main Dec 20, 2023
7 checks passed

ericblanc20 deleted the 461-support-for-molecular-barcodes branch December 20, 2023 14:21

tedil mentioned this pull request Jun 28, 2024

chore(main): release 0.1.0 #520

Merged

This was referenced Dec 9, 2024

chore(main): release 1.0.0 #571

Closed

chore(main): release 1.0.0 #572

Closed
