add dorado #256
Lots of 'unusual' things compared to what I normally see in nf-core pipelines, but I don't see anything that is necessarily breaking, except for the licensing issue.
The main thing, though: quite a few modules appear to work only with Docker (not conda), but there is no documentation on this. I think it would be very important to add plenty of warnings/checks in the code, as well as usage documentation warning users that conda won't be possible in many cases.
```
trim_barcodes = true
output_demultiplex_fast5 = true
```
```
tag "$meta.id"
label 'process_medium'

container "docker.io/ontresearch/dorado"
```
Just to double-check: is it OK to use this, license-wise?
And would this still work with Singularity?
```
dorado download --model $dorado_model
dorado basecaller $dorado_model $pod5_path --device $dorado_device --emit-fastq > basecall.fastq
```
Are there any options a user could theoretically add? `ext.args` is missing, for example.
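In nf-core modules, `ext.args` is conventionally read in the `script:` block as `def args = task.ext.args ?: ''` and interpolated into the tool call. The shell sketch below only illustrates the effect on the final command line: `build_basecaller_cmd` and its fourth argument are hypothetical stand-ins for what Nextflow would inject, and placing the extra options after `--emit-fastq` is an assumption, not something checked against dorado's argument parser.

```shell
# Hypothetical helper: shows how an optional, user-supplied argument string
# (what task.ext.args would provide in Nextflow) extends the basecaller call.
# It only prints the command; it does not run dorado.
build_basecaller_cmd() {
    local model="$1" pod5_path="$2" device="$3" extra_args="${4:-}"
    local cmd="dorado basecaller ${model} ${pod5_path} --device ${device} --emit-fastq"
    # Append extra arguments only when present, so the default call is unchanged
    if [ -n "$extra_args" ]; then
        cmd="${cmd} ${extra_args}"
    fi
    printf '%s\n' "$cmd"
}
```

For example, `build_basecaller_cmd dna_model ./pod5 cpu '--some-flag 1'` prints `dorado basecaller dna_model ./pod5 --device cpu --emit-fastq --some-flag 1`, while omitting the fourth argument leaves the original command untouched.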
```
    dorado: \$(echo \$(dorado --version 2>&1) | sed -r 's/.{81}//')
END_VERSIONS
```
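The `sed -r 's/.{81}//'` call strips a fixed 81 characters from the front of the (flattened) output, so it silently breaks if the text before the version number changes length between releases. A minimal alternative, assuming the output contains the version as a dotted `X.Y.Z` token, is to match that pattern instead of counting characters (`parse_version` is a hypothetical helper name):

```shell
# Hypothetical helper: print the first X.Y.Z-style token found on stdin.
# Assumption: the version appears as a dotted numeric triple, e.g. 0.3.2.
parse_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Intended use (requires dorado on PATH):
#   dorado --version 2>&1 | parse_version
```

Inside the module's `script:` block, command substitutions would still need the `\$(...)` escaping used in the original line.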
```
gzip basecall.fastq
```
This should probably go before the versions block. Also, does the file need to be named `basecall.fastq` for downstream purposes? Otherwise I would recommend using the `${prefix}.fastq` system.
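For reference, the nf-core convention mentioned here is `def prefix = task.ext.prefix ?: "${meta.id}"` in the `script:` block, with outputs named after `prefix`. A shell sketch of the renaming and compression step, where `compress_reads` is a hypothetical helper and the prefix is passed in explicitly rather than coming from Nextflow:

```shell
# Hypothetical helper: rename the raw basecaller output after the sample prefix
# and compress it, so downstream processes receive <prefix>.fastq.gz.
compress_reads() {
    local prefix="$1" reads="$2"
    # Skip the rename if the reads already carry the prefixed name
    if [ "$reads" != "${prefix}.fastq" ]; then
        mv "$reads" "${prefix}.fastq"
    fi
    gzip -f "${prefix}.fastq"   # replaces <prefix>.fastq with <prefix>.fastq.gz
}
```

In the module itself, redirecting dorado's output straight to `${prefix}.fastq` would avoid the rename entirely.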
```
label 'process_medium'

conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
container "docker.io/yuukiiwa/pod5:0.2.4"
```
Same as above.
This could be a BioContainer.
```
if (workflow.profile.contains('test')) {
    ch_input_path = params.input_path
} else {
    ch_input_path = Channel.fromPath(params.input_path, checkIfExists: true)
}
```
I don't really understand this: there is no difference in the way the channel is taken, right?
These are different.
For the test, I need to stage the fast5 directory (containing many fast5 files) from nf-core/test-datasets, so `input_path` is not local and `checkIfExists` doesn't work.
For user input, there is no staging of the fast5 directory, so checking whether it exists is needed.
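As one possible alternative, the two cases could be distinguished by the path itself rather than by the profile name. A minimal shell sketch of that branching, where `check_input_path` is a hypothetical helper and recognising remote inputs by their URL scheme is an assumption:

```shell
# Hypothetical helper mirroring the two cases described above:
# remote inputs (e.g. test data fetched from a URL) are left to Nextflow's staging,
# local inputs must exist as directories before the run starts.
# Assumption: remote inputs are recognisable by their URL scheme.
check_input_path() {
    local path="$1"
    case "$path" in
        http://*|https://*|s3://*)
            return 0 ;;
        *)
            if [ ! -d "$path" ]; then
                echo "ERROR: input_path '$path' does not exist or is not a directory" >&2
                return 1
            fi ;;
    esac
}
```

In the workflow, the equivalent decision would sit in Groovy around `Channel.fromPath`; the sketch only shows the branching logic.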
Ah sorry, I misread `input_path` as just `input` 😅. I find the test-data setup unusual, which is also what tripped me up, but I don't think this is relevant for this PR (especially given you've been waiting such a long time).
```
"dorado_device": {
    "type": "string",
    "default": "cuda:all",
    "description": "Device specified using '--device'.",
```
Is dorado a particular model of nanopore or something? What is a dorado device?
This specifies which compute device to use: `cuda:all` for all GPUs, `cuda:0` for a specific GPU, or `cpu` for the CPU.
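A pre-flight check on `dorado_device` could catch typos before a GPU job is queued. The sketch below is hypothetical and deliberately accepts only the forms mentioned in this thread (`cpu`, `cuda:all`, and `cuda:` followed by a single GPU index); it is not dorado's own validation, and dorado may accept other device strings.

```shell
# Hypothetical check: succeed only for cpu, cuda:all or cuda:<index>.
# Not dorado's own validation; other valid device strings would be rejected here.
is_supported_device() {
    printf '%s\n' "$1" | grep -Eq '^(cpu|cuda:all|cuda:[0-9]+)$'
}
```

In the pipeline, the same constraint could instead be expressed as a `pattern` on `dorado_device` in `nextflow_schema.json`.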
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Add prefix parameter in nanoplot
Important! Template update for nf-core/tools v2.14.1
Replaced by #277.
Currently, dorado v0.3.2 is incorporated through its Docker container. It works without demultiplexing, but doesn't work when demultiplexing with qcat downstream.

I raised an issue on dorado's GitHub repo requesting that dorado v0.4.0 be dockerized (here is the issue). I will not incorporate basecalling and demultiplexing until dorado v0.4.0 is available on Docker Hub.

I made some changes to basecalling without demultiplexing, so that the user can specify the input fast5 directory for each sample in the samplesheet. If a user has multiple samples, they will have to indicate the respective input fast5 directory for each of those samples.
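Since each sample now carries its own fast5 directory in the samplesheet, a missing or mistyped directory is easy to introduce. A hypothetical shell check, assuming a two-column CSV with a header row and the column order `sample,fast5_dir` (the real samplesheet layout may differ):

```shell
# Hypothetical check: report every sample whose fast5 directory is missing.
# Assumes exactly two comma-separated columns (sample,fast5_dir) plus a header row.
check_samplesheet() {
    local sheet="$1" status=0 _header sample fast5_dir
    {
        read -r _header   # discard the header row
        # "|| [ -n ... ]" also processes a final line without a trailing newline
        while IFS=, read -r sample fast5_dir || [ -n "$sample" ]; do
            if [ -z "$sample" ]; then
                continue   # ignore blank lines
            fi
            if [ ! -d "$fast5_dir" ]; then
                echo "ERROR: sample '$sample': fast5 directory '$fast5_dir' not found" >&2
                status=1
            fi
        done
    } < "$sheet"
    return "$status"
}
```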
Here is the run on my machine: