Does this support demultiplexing with forward and reverse barcode sequence pairs? #1

dnk8n · 2018-11-27T04:20:20Z

I have a fastq file with sequences that may or may not be in reverse complement form.

I expect some fastq records to match one of many samples, each with a forward barcode AND a reverse barcode. If there is no match, I would need to reverse complement the record's sequence and try the barcode pairs again.
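
To make the request concrete, here is a minimal sketch of the two-pass matching described above. It is not pydemult code: `reverse_complement`, `assign_sample`, and the `samples` mapping are hypothetical names, and the sketch assumes exact matches, with the forward barcode at the start of the read and the reverse complement of the reverse barcode at its end.

```python
# Translation table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_sample(seq, samples):
    """Return (sample_name, strand) for the first matching barcode pair, else None.

    `samples` maps a sample name to a (forward_barcode, reverse_barcode) tuple.
    Assumed layout (hypothetical): forward barcode at the 5' end of the read,
    reverse complement of the reverse barcode at the 3' end.
    """
    # First try the read as given ("+"), then its reverse complement ("-").
    for oriented, strand in ((seq, "+"), (reverse_complement(seq), "-")):
        for name, (fwd, rev) in samples.items():
            if oriented.startswith(fwd) and oriented.endswith(reverse_complement(rev)):
                return name, strand
    return None
```

For example, with `samples = {"s1": ("ACGT", "TTGA")}`, the read `ACGTGGGCCCTCAA` would be assigned to `s1` on the `+` strand, and its reverse complement would be assigned to `s1` on the `-` strand.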

If this is not yet supported, I would like to implement it in a format that suits you if this feature is something you feel might be worthwhile.

jenzopr · 2018-11-27T07:14:14Z

Hi Dean,
thanks for your suggestion. Very welcome!
I guess it's possible! Barcode sequences (whether they're forward or reverse complement doesn't matter in the first place) are matched via a regex and then looked up in a mutationhash. One could easily add the optional inclusion of each barcode's reverse complement in the mutationhash to enable the two-way search - even without implementing an explicit "second search".
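
A rough sketch of that idea, not the actual pydemult implementation: the function names, the one-mismatch limit, and the `include_reverse_complement` flag are assumptions made for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def one_mismatch_variants(barcode, alphabet="ACGTN"):
    """Return the barcode itself plus every sequence one substitution away."""
    variants = {barcode}
    for i, base in enumerate(barcode):
        for sub in alphabet:
            if sub != base:
                variants.add(barcode[:i] + sub + barcode[i + 1:])
    return variants


def build_mutation_hash(barcodes, include_reverse_complement=False):
    """Map every tolerated variant to the barcode it came from.

    With include_reverse_complement=True, variants of each barcode's reverse
    complement also map back to the original barcode, so a single lookup
    covers both orientations.
    """
    table = {}
    ambiguous = set()
    for barcode in barcodes:
        sources = [barcode]
        if include_reverse_complement:
            sources.append(reverse_complement(barcode))
        for source in sources:
            for variant in one_mismatch_variants(source):
                # A variant reachable from two different barcodes cannot be
                # assigned safely, so it is dropped at the end.
                if variant in table and table[variant] != barcode:
                    ambiguous.add(variant)
                table[variant] = barcode
    for variant in ambiguous:
        del table[variant]
    return table
```

With `build_mutation_hash(["AAAA", "CCCC"], include_reverse_complement=True)`, looking up `TTTT` or `TTTA` would return `AAAA`, so no separate second search is needed.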
I'd be happy to receive a PR from you - or you give me a couple of days to implement it.
Best,
Jens

dnk8n · 2018-11-27T07:28:27Z

I might have misunderstood how your tool works. Does it look for the barcode in the record header rather than in the record sequence itself? Our header information lacks the barcodes, but they are present within the sequence itself (with potential for error, so things like edit distance should be evaluated).
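
As an illustration of what evaluating edit distance could look like, here is a small sketch using plain Levenshtein distance. The names `edit_distance` and `closest_barcode` and the `max_distance` threshold are hypothetical, and the sketch assumes the candidate barcode segment has already been cut out of the read.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_barcode(segment, barcodes, max_distance=1):
    """Return the single closest barcode within max_distance, else None."""
    scored = sorted((edit_distance(segment, barcode), barcode) for barcode in barcodes)
    if not scored or scored[0][0] > max_distance:
        return None
    # Two barcodes at the same best distance make the assignment ambiguous.
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]
```

Because insertions and deletions shift the barcode boundary, a fixed-length segment is only an approximation; a real pipeline would probably need a semi-global alignment instead.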

Perhaps your tool is solving a slightly different set of demultiplexing problems than what I had in mind...

I would be happy to contribute a PR but my primary focus is finalizing a processing pipeline, so first prize is to use a tool which already has the feature I am after. Second prize would be to submit this feature upstream to the tool with the lowest barrier to entry.

I want to avoid re-implementing something new if I can. But if it is the fastest way, then I may have to do that for now. Will let you know if I choose to commit to submitting a PR.

jenzopr · 2018-11-27T08:11:34Z

I get your point. Typically (e.g. in scRNA-seq), you'll have paired-end sequencing. One read in the pair (e.g. R1) will contain the barcode sequences (as records, not headers), and the other read in the pair (e.g. R2) will contain the actual RNA sequence of interest. To avoid reading two files simultaneously, we rewrite the header of R2 to contain the sequence record from R1 (this happens directly in bcl2fastq from Illumina). pydemult then takes R2 as input.
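
Roughly, the header-based lookup could be pictured like this. The header layout and the regex below are made up for illustration (the barcode is assumed to be the last whitespace-separated field of the header), and `barcode_from_header` is a hypothetical helper, not a pydemult function.

```python
import re

# Assumed layout: "@<read id> <barcode>", with the barcode as the final field.
BARCODE_PATTERN = re.compile(r"\s(?P<barcode>[ACGTN]+)$")


def barcode_from_header(header):
    """Return the barcode carried in a FASTQ header line, or None if absent."""
    match = BARCODE_PATTERN.search(header.rstrip())
    return match.group("barcode") if match else None


def sample_for_record(header, barcode_to_sample):
    """Look up the sample for a record via the barcode found in its header."""
    barcode = barcode_from_header(header)
    return barcode_to_sample.get(barcode) if barcode else None
```

In this picture the R2 sequence itself is never searched; only the header is parsed, which is why barcodes embedded in the read sequence are a different problem.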

dnk8n · 2018-11-27T10:46:17Z

I am new to the world of bioinformatics, so forgive my misinterpretation. Didn't realise how many different ways there were of doing things!

I am working with PacBio data.

I will review this convo once the fog has lifted and my other tasks are done.

jenzopr · 2018-11-27T11:28:08Z

Oh, welcome then 😃
No worries - just come back whenever you have time.
