module: deepvariant #572
modules/deepvariant/main.nf
// container "quay.io/biocontainers/deepvariant:1.1.0--py36hf3e76ba_2"
container "google/deepvariant:1.1.0"
I'm not familiar with this, but is there a reason why you commented out the container hosted on quay.io? Also, if you use google/deepvariant:1.1.0,
should you include the registry host, i.e. docker.io/google/deepvariant:1.1.0?
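To make the question about the registry host concrete, here is a minimal sketch of how a short image reference maps to a fully qualified one. It assumes Docker's usual default-registry rule (the first path component counts as a registry only if it contains a "." or ":" or is "localhost"); the helper name qualify_image is hypothetical and not part of the module.

```shell
#!/bin/sh
# Sketch: expand a short Docker image reference to its fully qualified form.
# Assumption: Docker's default-registry rule; qualify_image is a made-up helper.
qualify_image() {
  ref=$1
  case "$ref" in
    */*) first=${ref%%/*} ;;   # first path component, e.g. "google" or "quay.io"
    *)   first="" ;;           # no slash at all, e.g. "ubuntu:20.04"
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "$ref" ;;                   # registry already given
    "")                printf 'docker.io/library/%s\n' "$ref" ;; # official image
    *)                 printf 'docker.io/%s\n' "$ref" ;;         # user/org image on Docker Hub
  esac
}

qualify_image "google/deepvariant:1.1.0"
qualify_image "quay.io/biocontainers/deepvariant:1.1.0--py36hf3e76ba_2"
```

Under that assumption the first call prints docker.io/google/deepvariant:1.1.0 and the second prints the quay.io reference unchanged, so writing the host explicitly only makes the default visible.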
The reason is that the quay.io version doesn't contain the new /opt/deepvariant/bin/run_deepvariant
command, which combines all the other scripts.
Therefore, I was planning to get started with the official Google container and then transition to the quay.io
one. What do you suggest?
Update:
Internally, the exact commands run by the run_deepvariant
wrapper seem to be the following (for the command mentioned here):
time seq 0 1 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples \
--mode calling \
--ref "/input/ucsc.hg19.chr20.unittest.fasta" \
--reads "/input/NA12878_S1.chr20.10_10p1mb.bam" \
--examples "/output/intermediate_results_dir/make_examples.tfrecord@2.gz" \
--gvcf "/output/intermediate_results_dir/gvcf.tfrecord@2.gz" \
--regions "chr20:10,000,000-10,010,000" \
--task {}
time /opt/deepvariant/bin/call_variants \
--outfile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" \
--examples "/output/intermediate_results_dir/make_examples.tfrecord@2.gz" \
--checkpoint "/opt/models/wgs/model.ckpt" \
--openvino_model_dir "/output/intermediate_results_dir"
time /opt/deepvariant/bin/postprocess_variants \
--ref "/input/ucsc.hg19.chr20.unittest.fasta" \
--infile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" \
--outfile "/output/output.vcf.gz" \
--nonvariant_site_tfrecord_path "/output/intermediate_results_dir/gvcf.tfrecord@2.gz" \
--gvcf_outfile "/output/output.g.vcf.gz"
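In the commands above, the "@2" in names like make_examples.tfrecord@2.gz is a sharded file spec, and seq 0 1 feeds one --task id per shard. A rough sketch of how such a spec expands into per-shard file names follows. It assumes the common "-NNNNN-of-NNNNN" naming convention; expand_shards is a hypothetical helper for illustration, not something shipped with DeepVariant.

```shell
#!/bin/sh
# Sketch: expand a sharded file spec such as "make_examples.tfrecord@2.gz"
# into per-shard file names.
# Assumption: shards are named "<prefix>-NNNNN-of-NNNNN<suffix>".
expand_shards() {
  spec=$1
  prefix=${spec%%@*}            # text before "@", e.g. make_examples.tfrecord
  rest=${spec#*@}               # text after "@", e.g. 2.gz
  count=${rest%%.*}             # shard count, e.g. 2
  case "$rest" in
    *.*) suffix=.${rest#*.} ;;  # extension after the count, e.g. .gz
    *)   suffix="" ;;
  esac
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s-%05d-of-%05d%s\n' "$prefix" "$i" "$count" "$suffix"
    i=$((i + 1))
  done
}

expand_shards "make_examples.tfrecord@2.gz"
# prints:
# make_examples.tfrecord-00000-of-00002.gz
# make_examples.tfrecord-00001-of-00002.gz
```

With two shards, task 0 and task 1 each write one of these files, which is why the later stages can refer to the whole set through the same "@2" spec.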
Further updates:
I tried to rely entirely on the native make_examples.py
command; however, that doesn't seem to work with the bioconda-based
distribution.
dv_make_examples.py \
--ref ucsc.hg19.chr20.unittest.fasta \
--sample NA12878_S1.chr20.10_10p1mb \
--reads NA12878_S1.chr20.10_10p1mb.bam \
--gvcf test.g.vcf.gz \
--regions "chr20:10,000,000-10,010,000" \
--logdir "intermediate_results_dir/logs" \
--examples "intermediate_results_dir/make_examples.tfrecord@2.gz"
Output:
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
Tange, O. (2021, June 22). GNU Parallel 20210622 ('Protasevich').
Zenodo. https://doi.org/10.5281/zenodo.5013933
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice
To silence this citation notice: run 'parallel --citation' once.
sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)
Computers / CPU cores / Max jobs to run
1:local / 4 / 1
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 1 AVG: 0.00s local:1/0/100%/0.0s sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)
ETA: 0s Left: 1 AVG: 0.00s local:1/0/100%/0.0s /bin/bash: /usr/local/lib/libtinfo.so.6: no version information available (required by /bin/bash)
sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip/__main__.py", line 375, in <module>
File "/usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip/__main__.py", line 348, in Main
File "/usr/local/lib/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/local/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/python3': '/usr/bin/python3'
parallel: This job failed:
/usr/local/bin/python /usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip --mode calling --ref ucsc.hg19.chr20.unittest.fasta --reads NA12878_S1.chr20.10_10p1mb.bam --regions chr20:10,000,000-10,010,000 --gvcf test.g.vcf.gz/NA12878_S1.chr20.10_10p1mb.gvcf.tfrecord@1.gz --sample_name NA12878_S1.chr20.10_10p1mb --examples intermediate_results_dir/make_examples.tfrecord@2.gz/NA12878_S1.chr20.10_10p1mb.tfrecord@1.gz --task 0
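The traceback suggests that the zipped make_examples entry point tries to spawn /usr/bin/python3, while the job itself was launched with /usr/local/bin/python. A quick way to confirm that kind of mismatch inside a container is sketched below; check_interpreter is a hypothetical helper, and the paths are the ones taken from the log above.

```shell
#!/bin/sh
# Sketch: report whether the interpreter a script expects is actually present.
# check_interpreter is a made-up helper name used only for this illustration.
check_interpreter() {
  if [ -x "$1" ]; then
    printf 'found: %s\n' "$1"
  else
    printf 'missing: %s\n' "$1"
    return 1
  fi
}

# Path the traceback complains about.
check_interpreter /usr/bin/python3 || true
# Path the failed job was actually started with.
check_interpreter /usr/local/bin/python || true
# What the shell would pick up for "python3", if anything.
command -v python3 || echo "python3 is not on PATH"
```

If the first check reports "missing" while the second reports "found", the failure is consistent with a hard-coded interpreter path rather than a problem with the input data.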
For the time being, I'll continue developing the module on top of
the Google Docker image.
@projectoriented, as of 99de3a0 the docker based test is passing with the I'd be happy to receive further feedback or suggestions to tackle #572 (comment) I think that to include the
Update: The For consuming the
Hello 👋 ! Apologies for the delayed response, I was away on holiday. I've tested the module locally and can attest to the difficulties with the
You're welcome @projectoriented! Do you know whether it'd be possible to add a module which isn't relying upon bioconda (/biocontainers) infrastructure? Otherwise, this PR is kinda blocked till the situation with
@abhi18av As far as I've seen, most likely not due to
Crossing the rubicon then 🤞
Hi @drpatelh, we're in a corner regarding the conda package. Would really appreciate any guidance you can share on moving this PR forward 🙏
Unfortunately no progress on my side so far, though within the scope of this PR, I'm inclined to go with what Gregor suggested here #572 (comment) - an approach similar to the That would be acceptable right?
Sounds good to me. In case it will be made available on conda, the module can always be updated
I am afraid it will still need a bit of updating to the new syntax. Depending on your time I can also help out, if you want
Thanks @FriederikeHanssen, yes I'll probably pick it up this Saturday. But in case this is urgent, please feel free to add the magic touch 🪄 😊
Hi @FriederikeHanssen, I've updated the syntax of the module now and adapted it to the new arg mechanism. At this point, we circle back to the problem of data (I think), which was pointed out by @maxulysse in #572 (comment) I've tried out with Probably due to the range specified? 🤔 I checked the test-datasets description, but couldn't make out the region I should rely upon.
Can you try with this region: https://github.com/nf-core/test-datasets/blob/modules/data/genomics/homo_sapiens/genome/genome.bed
Bingo - thanks Friederike! Docker ✅ Shall we go ahead with the merge? 🤩
Conda fails, seems fine to me :D Yes update and merge I'd say. Thanks a bunch 😊
LGTM
The concerns were addressed in the latest changes.
Thanks - I don't see the Merge option here, perhaps a permission issue? In any case, just to recap for anyone else who's ready to review the PR, the conda issue was addressed as suggested here #572 (comment)
I think maybe you were out of sync with master branch. I have merged it now and can squash and merge once the tests are done
Aaaaand - we're in! 🙏 Now, we can replace the use of this module in sarek https://github.com/nf-core/sarek/blob/dev/modules/local/deepvariant/main.nf
PR checklist
Closes #234
<SOFTWARE>.version.txt
file.label
PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd
PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd
PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd
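For convenience, the three checklist commands can be generated for a given module tag with a small loop. This is only a dry-run sketch that prints the commands rather than running them; using "deepvariant" as the tag is an assumption based on the module name, and print_test_commands is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the pytest invocation for each profile in the checklist.
# Assumption: the module tag is "deepvariant"; nothing is executed here.
print_test_commands() {
  module=$1
  for profile in docker singularity conda; do
    printf 'PROFILE=%s pytest --tag %s --symlink --keep-workflow-wd\n' "$profile" "$module"
  done
}

print_test_commands deepvariant
```

Piping the output to sh would run the three profiles in sequence. As discussed in the thread, the conda profile is expected to fail for this module.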