If you want to contribute to the module and pipeline, you can make these changes via a PR to nf-core/modules, and we can then update the pipeline (with credit to you!). Contributions will be gratefully received :)
Note that @muabnezor is currently overhauling the long-read/nanopore preprocessing tools anyway: we just merged porechop_abi into the dev branch as a faster replacement for porechop, and next we plan to add nanoq as an alternative to Filtlong. So if you prefer, you could wait for that instead.
That said, I think updating filtlong would still be very helpful to the community as a whole. Let me know what you think!
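To make the request concrete, here is a rough Python sketch of the kind of toggle being discussed. It only illustrates the logic and is not the module's actual code. The function name `build_filtlong_command` and the `use_short_reads` switch are hypothetical, and the `--min_length` / `--keep_percent` values are placeholders. The `-1` / `-2` options are Filtlong's flags for passing Illumina reads as an external reference.

```python
def build_filtlong_command(long_reads, short_reads=(), use_short_reads=True,
                           min_length=1000, keep_percent=90):
    """Build a Filtlong argument list (hypothetical helper, for illustration only)."""
    cmd = ["filtlong"]
    # Only pass the Illumina reads as an external reference when the
    # (hypothetical) toggle is on and short reads were actually supplied.
    if use_short_reads and short_reads:
        for flag, path in zip(("-1", "-2"), short_reads):
            cmd += [flag, str(path)]
    # Placeholder thresholds; a real pipeline would take these from its own parameters.
    cmd += ["--min_length", str(min_length),
            "--keep_percent", str(keep_percent),
            str(long_reads)]
    return cmd


# With the toggle off, the short reads are left out of the command entirely.
fast = build_filtlong_command("ont.fastq.gz",
                              short_reads=("R1.fastq.gz", "R2.fastq.gz"),
                              use_short_reads=False)
print(" ".join(fast))
```

With the toggle on, the same call would add `-1 R1.fastq.gz -2 R2.fastq.gz` before the thresholds. That is the behaviour reported as slow on large Illumina datasets.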
jfy133 changed the title from "Filtlong takes forever with large hybrid datasets" to "Optional skipping of short-read input to Filtlong for large datasets" on Oct 14, 2024
Description of the bug
Hi all - long time listener first time caller:
I have a rather large set of Illumina data along with some nanopore reads on which I was trying to run the hybrid assembly option. After 10+ hours, `filtlong` was still processing the nanopore reads. I did some digging and the current command utilizes the short-read data as part of the reference option. I think that is fine for small-ish datasets but seems impractical for larger ones.
Once I edited the `filtlong.nf` code to no longer use the short-reads, the filtlong process took less than 5 minutes and the pipeline has proceeded as expected. Maybe there could be a flag to turn on/off that feature?
`filtlong.nf`:
Edited working solution:
Command used and terminal output
No response
Relevant files
No response
System information
No response