question on interleaved reads mapping #63

Open
jianshu93 opened this issue Jul 29, 2024 · 8 comments

Comments

@jianshu93

Hi @jguhlin,

Is there a way to use interleaved reads (or separate forward and reverse reads) as input in this wrapper? I only saw a single FASTA file as input.

Thanks,
Jianshu

@esteinig

@jianshu93 I'm not sure what the most parsimonious solution is, but in a synchronous context you could use crossbeam::queue with an Arc<Mutex<Sender>> and then spawn a couple of threads with rayon that read the forward and reverse files using needletail. Each would send the record or sequence into the queue, whose receiver can be concurrently iterated over for mapping in the main function.
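
A minimal sketch of the two-reader idea above, not code from minimap2-rs. To keep it self-contained it swaps in `std::sync::mpsc` for the crossbeam channel and plain `std::thread` for rayon, and the in-memory vectors stand in for parsing the forward and reverse files with needletail. `Mate`, `Record`, and `collect_records` are hypothetical names used only for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Which input file a record came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mate {
    Forward,
    Reverse,
}

/// Hypothetical record type; a real parser would supply the id and sequence.
#[derive(Debug)]
struct Record {
    mate: Mate,
    id: String,
    seq: Vec<u8>,
}

/// Spawn one producer thread per input and return every record received.
/// The `Vec`s are placeholders for reading the forward and reverse files.
fn collect_records(fwd: Vec<(String, Vec<u8>)>, rev: Vec<(String, Vec<u8>)>) -> Vec<Record> {
    let (tx, rx) = mpsc::channel::<Record>();

    let mut handles = Vec::new();
    for (mate, reads) in vec![(Mate::Forward, fwd), (Mate::Reverse, rev)] {
        // Senders are cloneable, so each producer gets its own handle.
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for (id, seq) in reads {
                tx.send(Record { mate, id, seq }).expect("receiver dropped");
            }
        }));
    }
    // Drop the original sender so the receiving loop ends once producers finish.
    drop(tx);

    // In a real program, this is where each record would be handed to the mapper.
    let records: Vec<Record> = rx.into_iter().collect();
    for handle in handles {
        handle.join().expect("reader thread panicked");
    }
    records
}

fn main() {
    let fwd = vec![("r1/1".to_string(), b"ACGT".to_vec())];
    let rev = vec![("r1/2".to_string(), b"TTGA".to_vec())];
    for rec in collect_records(fwd, rev) {
        println!("{:?} {} {}", rec.mate, rec.id, rec.seq.len());
    }
}
```

Records from the two producers arrive in no guaranteed order, so anything that needs mates kept together would have to pair them by id or read both files in lockstep instead.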

@jguhlin would love to hear your thoughts on this - and thanks for creating such excellent bindings, minimap2-rs has been so useful already :)

@jguhlin
Owner

jguhlin commented Aug 2, 2024

Hey! Apologies for the silence; grant writing time, trying to justify my continued existence...

@jianshu93 @esteinig

For multithreading, I like how I've done it here, using crossbeam queues:
https://github.com/jguhlin/minimap2-rs/blob/main/fakeminimap2/src/main.rs

Reading the file is left up to you, via Needletail or other libraries; I try to keep this library agnostic to file parsing. I'm developing a different file format (https://github.com/jguhlin/sfasta/tree/tokio), so I need it to be agnostic, and I believe some others were loading sequences directly from databases as well.
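
For readers who want the general shape of a queue-based worker pool without opening the linked file, here is a rough sketch. It is not the fakeminimap2 code: it uses only the standard library, so the single-consumer `std::sync::mpsc` receiver is shared behind an `Arc<Mutex<..>>`, and `fake_map` is a placeholder where the real alignment call would go. `map_all` and `fake_map` are hypothetical names.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Placeholder for the real alignment call; it only counts G/C bases so the
/// sketch has something deterministic to compute.
fn fake_map(seq: &[u8]) -> usize {
    seq.iter().filter(|&&b| b == b'G' || b == b'C').count()
}

/// Fan sequences out to `n_workers` threads and gather (id, result) pairs.
fn map_all(reads: Vec<(String, Vec<u8>)>, n_workers: usize) -> Vec<(String, usize)> {
    let (work_tx, work_rx) = mpsc::channel::<(String, Vec<u8>)>();
    let (res_tx, res_rx) = mpsc::channel::<(String, usize)>();
    // std's Receiver is single-consumer, so the workers share it behind a mutex.
    let work_rx = Arc::new(Mutex::new(work_rx));

    let mut workers = Vec::new();
    for _ in 0..n_workers {
        let work_rx = Arc::clone(&work_rx);
        let res_tx = res_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock is held only while fetching the next job.
            let job = work_rx.lock().expect("queue lock poisoned").recv();
            match job {
                Ok((id, seq)) => res_tx.send((id, fake_map(&seq))).expect("results closed"),
                Err(_) => break, // work queue closed and drained
            }
        }));
    }
    drop(res_tx); // only the workers hold result senders now

    // The caller decides how sequences are read; here they are already in memory.
    for read in reads {
        work_tx.send(read).expect("workers gone");
    }
    drop(work_tx); // signal that no more work is coming

    let mut results: Vec<(String, usize)> = res_rx.into_iter().collect();
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    results.sort(); // arrival order is nondeterministic; sort for stable output
    results
}

fn main() {
    let reads = vec![
        ("read1".to_string(), b"ACGT".to_vec()),
        ("read2".to_string(), b"GGCC".to_vec()),
    ];
    for (id, score) in map_all(reads, 2) {
        println!("{}\t{}", id, score);
    }
}
```

The input side takes plain `(id, sequence)` tuples, so the pool does not care whether they came from a FASTA parser, another file format, or a database.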

As for paired reads, minimap2 is really meant for single long reads, but the Python implementation does have some support, as long as both reads are in the FR orientation. I could possibly port that over if you need it?

@esteinig Thanks for the kind words! Let me know what you are using it for (if public) and I'll add it to this page. Definitely need the ego boost this week, so it's much appreciated!

https://github.com/lh3/minimap2/tree/master/python

(Search for the paragraph beginning "This method aligns seq against the index....")

@esteinig

esteinig commented Aug 5, 2024

@jguhlin

I really like your crossbeam implementation - tried something similar with some channels and rayon but haven't quite managed to replicate it. Learning a bunch of things at least :)

Understand the grant writing insanity, sorry to hear you are in the midst of it - and do wish the reviewers would appreciate efforts like this crate a lot more! If there's anything that helps with exposure let me know. Perhaps a Zenodo link for minimap2-rs to collect citations may help a tiny bit?

I'm currently using it in a host depletion tool, scrubby (https://github.com/esteinig/scrubby), on the dev branch for the 1.0.0 release. There are a couple of other projects that use it, but they are not public yet - will ping you when they are!

Re paired reads: I know people use minimap2 for short reads quite a bit, although it isn't really meant for that. In benchmarks for human read identification against minimap2, it doesn't seem to matter whether I plug in R1 and R2 sequentially with the sr preset. I'd say it'd be a "nice to have" feature, but I also would not want to add more work to your (probably extensive) list of things to implement :)
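
For the interleaved input in the original question, one possible reading of "plug in R1 and R2 sequentially" is to split the stream into mate pairs and map each mate as an independent single-end query. The sketch below assumes records strictly alternate R1, R2, R1, R2; `map_read` is a hypothetical stand-in for the real mapping call, and flagging a pair when either mate hits is an assumed policy, not something stated in this thread.

```rust
/// A read as (id, sequence); a real parser would provide these.
type Read = (String, Vec<u8>);

/// Group an interleaved stream (R1, R2, R1, R2, ...) into mate pairs.
/// Returns an error if the final read has no mate.
fn deinterleave(reads: Vec<Read>) -> Result<Vec<(Read, Read)>, String> {
    if reads.len() % 2 != 0 {
        return Err(format!(
            "odd number of reads ({}): last mate is missing",
            reads.len()
        ));
    }
    let mut pairs = Vec::with_capacity(reads.len() / 2);
    let mut it = reads.into_iter();
    while let (Some(r1), Some(r2)) = (it.next(), it.next()) {
        pairs.push((r1, r2));
    }
    Ok(pairs)
}

/// Hypothetical stand-in for a single-end mapping call; `true` means "hit".
/// The length check is arbitrary and only makes the example deterministic.
fn map_read(seq: &[u8]) -> bool {
    seq.len() >= 4
}

/// Map both mates independently and flag the pair if either mate hits
/// (an assumed policy for illustration).
fn pair_hits(pairs: &[(Read, Read)]) -> Vec<bool> {
    pairs
        .iter()
        .map(|(r1, r2)| map_read(&r1.1) || map_read(&r2.1))
        .collect()
}

fn main() {
    let reads = vec![
        ("a/1".to_string(), b"ACGTAC".to_vec()),
        ("a/2".to_string(), b"AC".to_vec()),
    ];
    match deinterleave(reads) {
        Ok(pairs) => println!("{:?}", pair_hits(&pairs)),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Unlike true paired-end alignment, this treats the mates as unrelated queries, so insert size and FR orientation play no role in the result.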

@jguhlin
Owner

jguhlin commented Aug 5, 2024

@esteinig Thanks. Crossbeam is my go-to for multithreading; I came from Clojure, so channels/queues are kind of what I was 'raised' on. I'm trying the flume crate, which is supposed to be faster, but have realized that the channels are not my bottleneck. https://github.com/jguhlin/sfasta/blob/3efd730d3ba22cd8ab21fc8306695ce096c82818/compression/src/lib.rs#L456 (and many other places)

Thanks for the well wishes. :) I've thought about a DOI, but as Heng Li (@lh3) has done all the work and I've just added some glue and used it as an excuse to learn FFI, I'm not totally on board yet. But I do list the project and the number of other projects using it on my CV.

I'll get scrubby added to this repo's readme, if that's alright with you? As for the others, no rush, whenever they are ready. It does help me keep motivation up to maintain though!

Regarding the paired reads, if it is being used I'll get it added. Let's consider it on my todo list.

Cheers

@wdecoster
Contributor

I don't think it is true that you "just added some glue". This must have been a lot of work, and so is maintaining it. It is also clear that developers from companies have started using your crate. You are having an impact. A DOI, or even a minimal publication, wouldn't be inappropriate. Unfortunately, it is the 'academic currency'.

@esteinig

esteinig commented Aug 5, 2024

Absolutely agree with the effort / maintenance argument, this is a lot of work and people are finding it really useful by the looks of it.

Feel free to add scrubby to the list of course @jguhlin, I'll merge the feature in the next few days. So nice to be able to ship the long read version as a single binary :)

@jguhlin
Owner

jguhlin commented Aug 5, 2024

@wdecoster Thanks! I really appreciate it. I'll look into getting the DOI set up once I get a little time.

@lh3

lh3 commented Aug 5, 2024

@jguhlin Thank you so much for your effort. Let me know if you need a letter of support.
