Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: deepvariant #572

Merged
merged 23 commits into from
Jan 17, 2022
Merged

module: deepvariant #572

merged 23 commits into from
Jan 17, 2022

Conversation

abhi18av
Copy link
Member

@abhi18av abhi18av commented Jul 13, 2021

PR checklist

Closes #234

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the <SOFTWARE>.version.txt file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd

@abhi18av abhi18av force-pushed the abhinav/deepvariant branch 3 times, most recently from 7ef8c8b to 0a6bd43 Compare July 13, 2021 19:38
@abhi18av abhi18av self-assigned this Jul 13, 2021
@abhi18av abhi18av added the new module Adding a new module label Jul 13, 2021
@abhi18av abhi18av mentioned this pull request Aug 23, 2021
Comment on lines 25 to 26
// container "quay.io/biocontainers/deepvariant:1.1.0--py36hf3e76ba_2"
container "google/deepvariant:1.1.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this but is there a reason why you commented out the container hosted by quay.io? If you use the statement google/deepvariant:1.1.0 should you include the host? docker.io/google/deepvariant:1.1.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that the quay version doesn't contain the new /opt/deepvariant/bin/run_deepvariant command which combines all other scripts.

Therefore, I was planning to get started with the official Google container and then transition to the quay one - what do you suggest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:

Internally, the exact commands used via the run_depvariant wrapper seem to be the following (for the command mentioned here)

time seq 0 1 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "/input/ucsc.hg19.chr20.unittest.fasta" --reads "/input/NA12878_S1.chr20.10_10p1mb.bam" --examples "/output/intermediate_results_dir/make_examples.tfrecord@2.gz" --gvcf "/output/intermediate_results_dir/gvcf.tfrecord@2.gz" --regions "chr20:10,000,000-10,010,000" --task {}

time /opt/deepvariant/bin/call_variants --outfile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" --examples "/output/intermediate_results_dir/make_examples.tfrecord@2.gz" --checkpoint "/opt/models/wgs/model.ckpt" --openvino_model_dir "/output/intermediate_results_dir"

time /opt/deepvariant/bin/postprocess_variants --ref "/input/ucsc.hg19.chr20.unittest.fasta" --infile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" --outfile "/output/output.vcf.gz" --nonvariant_site_tfrecord_path "/output/intermediate_results_dir/gvcf.tfrecord@2.gz" --gvcf_outfile "/output/output.g.vcf.gz"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further updates:

I tried to completely rely upon the native make_examples.py command however, that doesn't seem to work with the bioconda based distribution.

dv_make_examples.py \
    --ref ucsc.hg19.chr20.unittest.fasta \
    --sample NA12878_S1.chr20.10_10p1mb \
    --reads NA12878_S1.chr20.10_10p1mb.bam \
    --gvcf test.g.vcf.gz \
    --regions "chr20:10,000,000-10,010,000" \
    --logdir "intermediate_results_dir/logs" \
    --examples "intermediate_results_dir/make_examples.tfrecord@2.gz" 

Output:

Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2021, June 22). GNU Parallel 20210622 ('Protasevich').
  Zenodo. https://doi.org/10.5281/zenodo.5013933

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)

Computers / CPU cores / Max jobs to run
1:local / 4 / 1

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 1 AVG: 0.00s  local:1/0/100%/0.0s sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)
ETA: 0s Left: 1 AVG: 0.00s  local:1/0/100%/0.0s /bin/bash: /usr/local/lib/libtinfo.so.6: no version information available (required by /bin/bash)
sh: /usr/local/lib/libtinfo.so.6: no version information available (required by sh)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip/__main__.py", line 375, in <module>
  File "/usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip/__main__.py", line 348, in Main
  File "/usr/local/lib/python3.6/subprocess.py", line 287, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/python3': '/usr/bin/python3'
parallel: This job failed:
/usr/local/bin/python /usr/local/share/deepvariant-1.2.0-0/binaries/DeepVariant/1.2.0/DeepVariant-1.2.0/make_examples.zip --mode calling --ref ucsc.hg19.chr20.unittest.fasta --reads NA12878_S1.chr20.10_10p1mb.bam --regions chr20:10,000,000-10,010,000 --gvcf test.g.vcf.gz/NA12878_S1.chr20.10_10p1mb.gvcf.tfrecord@1.gz --sample_name NA12878_S1.chr20.10_10p1mb --examples intermediate_results_dir/make_examples.tfrecord@2.gz/NA12878_S1.chr20.10_10p1mb.tfrecord@1.gz --task 0

For the time being, I'll continue with the Google docker based module dev.

@abhi18av
Copy link
Member Author

abhi18av commented Aug 25, 2021

@projectoriented , as of 99de3a0 the docker based test is passing with the google-deepvariant container ✅ 🐳

I'd be happy to receive further feedback or suggestions to tackle #572 (comment)

I think that to include the run_deepvariant wrapper into the quay.io container - we'd need to update the bioconda recipe. I'm not very experienced with the bioconda recipes 🤷

@abhi18av
Copy link
Member Author

abhi18av commented Aug 30, 2021

Update:

The google docker based tests are passing for this PR.

For consuming the bioconda containers from the upstream bioconda recipe, I have initiated a conversation and a PR in the bioconda-recipes repo, hoping that someone could help me out there. The deepvariant recipe seems to rely on heavily customized wrappers.

bioconda/bioconda-recipes#30310

@projectoriented
Copy link
Contributor

Hello 👋 ! Apologies for the delayed response, I was away on holiday. I've tested the module locally and can attest to the difficulties with the bioconda recipe so for now we'll just use a local copy where the singularity container directive points to the official hosted docker image. Thanks a bunch for putting in the effort!

@abhi18av
Copy link
Member Author

abhi18av commented Sep 6, 2021

You're welcome @projectoriented !

Do you know whether it'd be possible to add a module which isn't relying upon bioconda (/biocontainers) infrastructure?

Otherwise, this PR is kinda blocked till the situation with bioconda/deepvariant is resolved.

@projectoriented
Copy link
Contributor

@abhi18av As far as I've seen, most likely not due to Utils.groovy validation check for conda channels, and parameter definition for conda. I think as long as there's a possibility the user can define conda as the argument for profile parameter then it'll pose a problem 😛 🤷‍♀️ . I guess we won't know what directions to take until the core team reviews this

@abhi18av abhi18av marked this pull request as ready for review September 6, 2021 20:18
@abhi18av
Copy link
Member Author

abhi18av commented Sep 6, 2021

Crossing the rubicon then 🤞

@abhi18av abhi18av requested a review from a team September 6, 2021 20:18
@abhi18av
Copy link
Member Author

abhi18av commented Oct 1, 2021

Hi @drpatelh , we're in a corner regarding the conda package. Would really appreciate any guidance you can share on moving this PR forward 🙏

@drpatelh drpatelh added the help wanted Extra attention is needed label Oct 26, 2021
@FriederikeHanssen FriederikeHanssen added the WIP Work in progress label Nov 15, 2021
@FriederikeHanssen
Copy link
Contributor

@abhi18av @drpatelh were you able to figure out the conda packaging?

@abhi18av
Copy link
Member Author

Unfortunately no progress on my side so far, though within the scope of this PR, I'm inclined to go what Gregor suggested here #572 (comment) - an approach similar to the cellranger module.

That would be acceptable right?

@FriederikeHanssen
Copy link
Contributor

Sounds good to me. In case it will be made available on conda, the module can always be updated

@FriederikeHanssen
Copy link
Contributor

I am afraid it will still need a bit of updating to the new syntax. Depending on your time I can also help out, if you want

@abhi18av
Copy link
Member Author

Thanks @FriederikeHanssen , yes I'll probably pick it up this Saturday. But in case this is urgent, please feel free to add the magic touch 🪄 😊

@abhi18av
Copy link
Member Author

Hi @FriederikeHanssen , I've updated the syntax of the module now and adapted the new arg mechanism.

At this point, we circle back to the problem of data (I think), which was point out by @maxulysse in #572 (comment)

I've tried out with chr20/chr21 and chr22 in the regions with ext.args = ' --regions=\"chr20:10,000,000-10,010,000\" --model_type=WGS ' but the resulting error is the same.

Probably due to the range specified? 🤔 I checked the test-datasets description, but couldn't make out the region I should rely upon.

I0116 19:45:14.795090 140226250442560 genomics_reader.py:222] Reading test.paired_end.sorted.bam with NativeSamReader
I0116 19:45:14.797312 140226250442560 make_examples_core.py:236] Task 0/2: Common contigs are ['chr22']
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 173, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 163, in main
    make_examples_core.make_examples_runner(options)
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1528, in make_examples_runner
    regions = processing_regions_from_options(options)
  File "/tmp/Bazel.runfiles_wp0ophpl/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1375, in processing_regions_from_options
    raise ValueError('The regions to call is empty. Check your --regions and '
ValueError: The regions to call is empty. Check your --regions and --exclude_regions flags to make sure they are not resulting in set of empty region to process. This also happens if you use "chr20" for a BAM where contig names don't have "chr"s (or vice versa).
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref genome.fasta --reads test.paired_end.sorted.bam --examples /tmp/tmp581s2kj4/make_examples.tfrecord@2.gz --gvcf /tmp/tmp581s2kj4/gvcf.tfrecord@2.gz --regions chr20:10,000,000-10,010,000 --task 0
parallel: This job failed:


@FriederikeHanssen
Copy link
Contributor

Can you try with this region: https://github.com/nf-core/test-datasets/blob/modules/data/genomics/homo_sapiens/genome/genome.bed

@abhi18av
Copy link
Member Author

Bingo - thanks Friedrike!

Docker ✅
Singularity ✅

Shall we go ahead with the merge? 🤩

@FriederikeHanssen
Copy link
Contributor

Conda fails, seems fine to me :D Yes update and merge I'd say. Thanks a bunch 😊

Copy link
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abhi18av abhi18av dismissed maxulysse’s stale review January 17, 2022 10:04

The concerns we addressed in latest changes.

@abhi18av
Copy link
Member Author

Thanks - I don't see the Merge option here, perhaps a permission issue?

In any case, just to recap for anyone else who's ready to review the PR, the conda issue was addressed as suggested here #572 (comment)

@FriederikeHanssen
Copy link
Contributor

I think maybe you were out of sync with master branch. I have merged it now and can squash and merge once the tests are done

@abhi18av abhi18av merged commit 6243c37 into nf-core:master Jan 17, 2022
@abhi18av
Copy link
Member Author

Aaaaand - we're in! 🙏

Now, we can replace the use of this module in sarek https://github.com/nf-core/sarek/blob/dev/modules/local/deepvariant/main.nf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed new module Adding a new module WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: deepvariant
7 participants