Extend TrimPrimers to work with amplicons with only a single primer #679

dlaehnemann · 2021-06-16T14:05:12Z

We have QIAseq amplicons that are generated with only one gene-specific primer; see the schematic on page 12 of this guide:
https://www.qiagen.com/us/resources/download.aspx?id=8907edbe-a462-4883-ae1b-2759657e7fd0&lang=en

Thus, we are trying to generalize Amplicon so that the _start and _end values for the left and right ends of the amplicon are always Options, and the values at one end can be None. In the input primer file, the respective columns would simply be left empty to generate the None values.
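As a rough sketch of that idea (not the actual fgbio Amplicon class): the case class `SketchAmplicon`, its field names, and the five-column layout `chrom, left_start, left_end, right_start, right_end` are assumptions made up for illustration. Empty columns are mapped to None while parsing.

```scala
// Hypothetical sketch of an Amplicon-like record; not fgbio's actual class.
case class SketchAmplicon(
  chrom: String,
  leftStart: Option[Int],
  leftEnd: Option[Int],
  rightStart: Option[Int],
  rightEnd: Option[Int]
)

object SketchAmplicon {
  /** Returns None for an empty (or whitespace-only) column, otherwise the parsed integer. */
  private def optInt(field: String): Option[Int] = {
    val trimmed = field.trim
    if (trimmed.isEmpty) None else Some(trimmed.toInt)
  }

  /** Parses one tab-separated line with the assumed five-column layout. */
  def parse(line: String): SketchAmplicon = {
    // A limit of -1 keeps trailing empty columns, which split would otherwise drop.
    val fields = line.split("\t", -1)
    require(fields.length == 5, s"Expected 5 columns but found ${fields.length}: $line")
    SketchAmplicon(
      chrom      = fields(0),
      leftStart  = optInt(fields(1)),
      leftEnd    = optInt(fields(2)),
      rightStart = optInt(fields(3)),
      rightEnd   = optInt(fields(4))
    )
  }
}
```

With this sketch, a line such as `"chr1\t100\t120\t\t"` would yield an amplicon whose right-hand positions are both None.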

We'll also include a testcase.

nh13 · 2021-06-16T20:38:29Z

@dlaehnemann exciting! Let me take a look and give feedback.

dlaehnemann · 2021-06-16T20:56:31Z

Feel free to have a look, but this is still very early days and the PR is mostly to have a good online view of the diffs so far.

The test cases should be coming soon, and we have a working IDE setup for Scala that helps a lot, so we have an idea of what we probably have to change to make our use case work.

But if you have some good pointers for an introduction to the basics of Scala syntax, those would be good to have.

nh13 · 2021-06-16T21:02:29Z

@dlaehnemann I'd recommend reading through some of our code (utility classes or tools) for our style (everyone has opinions!). For a more comprehensive study, I'd read Scala for the Impatient.

nh13 · 2021-06-16T23:02:54Z

@dlaehnemann can you describe the desired result of primer trimming for R1 and R2 respectively? Do we only trim primers from the 5' end of R1 (or only R2), or something else? From briefly looking at the schematic, it isn't immediately obvious what the R1 and R2 read structures look like after sequencing.

dlaehnemann · 2021-06-17T09:12:32Z

We have a primer file with tab-separated entries like these:

chr1	156851455	1	AGGCCCCAGTATTCCGGCTAACCACT
chr7	55259380	0	GATGCAGAGCTTCTTCCCATGATGATCTG

Chromosome and position don't really matter, because we realign the primers to whatever genome is used in the analysis pipeline. (They should be hg19 in this case. Also note that these are not the original primers, as someone might hold rights to those.)

Column 3 indicates whether the primer is on the forward or the reverse strand. 0 means that the primer and the read will be on the forward strand (i.e. identical to the reference sequence). Thus, from the perspective of the reference sequence, the respective read should be the left-most and will have the primer sequence first, followed on the right by the read sequence. 1 means the primer sequence depicted here is the reverse complement of the reference sequence and the primer sequence will appear at the right end of a reverse complement read (i.e. the start of that read).
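To make the column layout concrete, here is a minimal sketch of parsing one such line. The names `PrimerRecord`, `Strand`, `Forward`, and `Reverse` are hypothetical and only illustrate the four-column format described above; they are not part of fgbio.

```scala
// Hypothetical types for the four-column primer file described above.
sealed trait Strand
case object Forward extends Strand // column 3 == 0: primer matches the reference sequence
case object Reverse extends Strand // column 3 == 1: primer is the reverse complement

case class PrimerRecord(chrom: String, pos: Int, strand: Strand, sequence: String)

object PrimerRecord {
  /** Parses one tab-separated line: chromosome, position, strand flag, primer sequence. */
  def parse(line: String): PrimerRecord = line.split("\t", -1) match {
    case Array(chrom, pos, flag, seq) =>
      val strand = flag match {
        case "0"   => Forward
        case "1"   => Reverse
        case other => throw new IllegalArgumentException(s"Unknown strand flag '$other' in: $line")
      }
      PrimerRecord(chrom, pos.toInt, strand, seq)
    case _ =>
      throw new IllegalArgumentException(s"Expected 4 tab-separated columns in: $line")
  }
}
```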

In effect, the primer should always be trimmed from the 5' end (start) of the read that is the first in pair (see Picard's "Explain SAM Flags" page), or as the SAM spec puts it: the first segment in the template.
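A small sketch of what "5' end of the first in pair" means in terms of SAM flags. The flag bits (0x1 paired, 0x10 reverse strand, 0x40 first segment) come from the SAM spec; the object `FivePrimeTrim` and its string-based trimming are assumptions for illustration only, since a real tool would clip the alignment rather than cut the base string.

```scala
// Hypothetical helper; real primer trimming clips alignments instead of cutting strings.
object FivePrimeTrim {
  val Paired        = 0x1  // template has multiple segments
  val ReverseStrand = 0x10 // SEQ is stored reverse complemented
  val FirstOfPair   = 0x40 // first segment in the template

  def isFirstOfPair(flag: Int): Boolean =
    (flag & Paired) != 0 && (flag & FirstOfPair) != 0

  /** Removes `primerLength` bases from the 5' end of a first-of-pair read.
    * SAM stores bases in reference orientation, so for a reverse-strand read
    * the 5' end is the right end of the stored bases.
    * Reads that are not first of pair are returned unchanged.
    */
  def trim(storedBases: String, flag: Int, primerLength: Int): String = {
    if (!isFirstOfPair(flag)) storedBases
    else if ((flag & ReverseStrand) != 0) storedBases.dropRight(primerLength)
    else storedBases.drop(primerLength)
  }
}
```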

This means that we need an Amplicon definition where only one primer is given, so that the other end of the amplicon is undefined / ragged. We think this can be achieved by wrapping all the primer position definitions for an amplicon in Options, then matching all valid cases and throwing an exception otherwise.
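The "match all valid cases, throw otherwise" idea could look roughly like the following. `PrimerLayout`, `BothPrimers`, `LeftOnly`, and `RightOnly` are hypothetical names for this sketch, not what the PR or fgbio actually defines.

```scala
// Hypothetical classification of the valid combinations of optional primer positions.
sealed trait PrimerLayout
case class BothPrimers(leftStart: Int, leftEnd: Int, rightStart: Int, rightEnd: Int) extends PrimerLayout
case class LeftOnly(leftStart: Int, leftEnd: Int) extends PrimerLayout
case class RightOnly(rightStart: Int, rightEnd: Int) extends PrimerLayout

object PrimerLayout {
  /** Accepts both primers, or exactly one complete primer; anything else is an error. */
  def classify(
    leftStart: Option[Int],
    leftEnd: Option[Int],
    rightStart: Option[Int],
    rightEnd: Option[Int]
  ): PrimerLayout = (leftStart, leftEnd, rightStart, rightEnd) match {
    case (Some(ls), Some(le), Some(rs), Some(re)) => BothPrimers(ls, le, rs, re)
    case (Some(ls), Some(le), None, None)         => LeftOnly(ls, le)
    case (None, None, Some(rs), Some(re))         => RightOnly(rs, re)
    case other =>
      // Covers a half-defined primer (start without end) and no primer at all.
      throw new IllegalArgumentException(s"Invalid primer definition: $other")
  }
}
```

Matching on the tuple keeps the valid combinations in one place, so a half-defined primer fails loudly instead of being silently treated as missing.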

One caveat we have just identified: when the total amplicon length is smaller than the read length, there will also be primer sequence at the (3') end of the last segment in the template (second in pair). But I guess that also cutting the same primer region from the second in pair (if it appears there) that we already cut at the start of the first in pair should do the trick for that case, and would do nothing in the case of longer amplicons.
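One way to picture this caveat is as an interval overlap on the reference: the second in pair only needs trimming if its aligned span reaches into the primer region. The sketch below assumes closed, 1-based coordinates and a gapless alignment (so reference positions map one-to-one to read bases); `PrimerOverlap` is a hypothetical helper, not existing code.

```scala
// Hypothetical helper, assuming closed 1-based coordinates and a gapless alignment.
object PrimerOverlap {
  /** Number of reference positions shared by the two closed intervals (0 if they are disjoint). */
  def overlap(aStart: Int, aEnd: Int, bStart: Int, bEnd: Int): Int =
    math.max(0, math.min(aEnd, bEnd) - math.max(aStart, bStart) + 1)
}

// Short amplicon: the primer covers 100-124 and the second in pair aligns to 100-160,
// so all 25 primer positions are also present in the second in pair.
val shortAmpliconOverlap = PrimerOverlap.overlap(100, 124, 100, 160)

// Long amplicon: the second in pair aligns to 200-350 and never reaches the primer.
val longAmpliconOverlap = PrimerOverlap.overlap(100, 124, 200, 350)
```

In this sketch an overlap of zero means nothing is cut from the second in pair, which matches the expectation that longer amplicons are left untouched.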

OK, I hope I didn't confuse any flags or read directions or strands in this...

nh13 · 2021-06-23T17:53:31Z

@dlaehnemann @FelixMoelder do you want to test out #681 to see if that works for you?

FelixMoelder · 2021-06-24T08:59:21Z

@nh13 Thanks for that alternative implementation. We are still setting up some test cases, but as soon as those are done we will also check whether they work with your PR.

dlaehnemann · 2021-06-24T10:22:29Z

BTW, we also found a good graphic about the template / fragment setup with this type of targeted panel, that is less confusing than Qiagen's own graphic...

CGATOxford/UMI-tools#175 (comment)

nh13 · 2021-09-06T14:57:11Z

Closing in favor of #681

dlaehnemann and others added 5 commits June 16, 2021 13:01

turn all primer positions in Amplicon into Options

9d73b1e

Minor changes Amplicon.scala

05a33c9

Minor changes Amplicon.scala

5b06566

Minor changes Amplicon.scala

e904d82

Handle options

1372b5c

nh13 self-requested a review June 16, 2021 20:38

nh13 self-assigned this Jun 16, 2021

FelixMoelder added 2 commits June 17, 2021 09:38

Set exceptions

a63e892

Modify existing tests

7bee596

nh13 mentioned this pull request Jun 18, 2021

[feature] TrimPrimers can trim only R1s #681

Merged

FelixMoelder added 4 commits June 29, 2021 13:06

Fix right end primer

48debdf

Reverted tests

269ff36

Reverted tests

3a75ccd

Cleanup

c24532e

nh13 closed this Sep 6, 2021
