-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
extend TrimPrimers to work amplicons with only a single primer #679
extend TrimPrimers to work amplicons with only a single primer #679
Conversation
@dlaehnemann exciting! Let me take a look and give feedback. |
Feel free to have a look, but this is still very early days and the PR is mostly to have a good online view of the diffs so far. The test cases should be coming any time soon and we have a working IDE setup for Scala that helps a lot, so we have an idea of what we probably have to change to make our use case work. But if you e.g. have some good pointers for an intro into the basics of Scala syntax, that might be good to have. |
@dlaehnemann I'd recommend reading through some of our code (utility classes or tools) for our style (everyone has opinions!). For a more comprehensive study, I'd read Scala for the Impatient. |
@dlaehnemann can you describe the desired result of primer trimming for R1 and R2 respectively? Do we only trim primers from the 5' end of R1 (or only R2), or something else? From briefly looking at the schematic, it isn't immediately obvious what the R1 and R2 read structures look like after sequencing. |
We have a primer file with tab-separated entries like these:
Chromosome and position don't really matter (they should be hg19 in this case, and also note that these are not the original primers, as someone might hold rights to those...), because we realign the primers to whatever genome is used in the analysis pipeline. Column 3 indicates whether the primer is on the forward or the reverse strand. In effect, the primer should always be trimmed from the 5' end (start) of the read that is the This means, that we need an Amplicon definition, where only one primer is given and thus the other end of the amplicon is undefined / ragged. We think this can be achieved by wrapping all the primer position definitions for an amplicon in One caveat we have just identified: when the total amplicon length is smaller than the read length, there will also be primer sequence at the (3') end of OK, I hope I didn't confuse any flags or read directions or strands in this... |
@dlaehnemann @FelixMoelder do you want to test out #681 to if that works for you? |
@nh13 Thank's for that alternative implementation. We are still setting up some test cases but as soon as those are done we will also try if this works on your PR. |
BTW, we also found a good graphic about the template / fragment setup with this type of targeted panel, that is less confusing than Qiagen's own graphic... |
Closing in favor of #681 |
We basically have QIAseq amplicons that are generated with only one gene-specific primer, see this schematic, which is page 12 from this guide:
https://www.qiagen.com/us/resources/download.aspx?id=8907edbe-a462-4883-ae1b-2759657e7fd0&lang=en
Thus, we are trying to generalize
Amplicon
, so that the_start
and_end
on the left or right end of the amplicon are alwaysOption
s and at one end can beNone
. In the input primer file, the respective columns would just be left empty to generate theNone
values.We'll also include a testcase.