Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First version of the biobambam/bammarkduplicates2 module #1247

Merged
merged 4 commits into from
Feb 14, 2022

Conversation

muffato
Copy link
Member

@muffato muffato commented Jan 31, 2022

Closes #1245

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd

@muffato
Copy link
Member Author

muffato commented Jan 31, 2022

Sorry, I must have entered the wrong MD5 checksums. Will fix it ASAP

@muffato
Copy link
Member Author

muffato commented Jan 31, 2022

I don't understand why the test is failing here. Locally it passes

$ env TMPDIR=$HOME/nfs/cache/nextflow/ttmp PROFILE=singularity pytest --tag biobambam/bammarkduplicates2 --symlink --keep-workflow-wd --git-aware
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /nfs/users/nfs_m/mm49/workspace/tol-it/nextflow/modules
plugins: workflow-1.6.0
collecting ... 
collected 601 items                                                                                                                                                                      

biobambam bammarkduplicates2 test_biobambam_bammarkduplicates2:
        command:   nextflow run tests/modules/biobambam/bammarkduplicates2 -entry test_biobambam_bammarkduplicates2 -c tests/config/nextflow.config
        directory: /nfs/users/nfs_m/mm49/nfs/cache/nextflow/ttmp/pytest_workflow_ezgn0aar/biobambam_bammarkduplicates2_test_biobambam_bammarkduplicates2
        stdout:    /nfs/users/nfs_m/mm49/nfs/cache/nextflow/ttmp/pytest_workflow_ezgn0aar/biobambam_bammarkduplicates2_test_biobambam_bammarkduplicates2/log.out
        stderr:    /nfs/users/nfs_m/mm49/nfs/cache/nextflow/ttmp/pytest_workflow_ezgn0aar/biobambam_bammarkduplicates2_test_biobambam_bammarkduplicates2/log.err
'biobambam bammarkduplicates2 test_biobambam_bammarkduplicates2' done.

tests/test_versions_yml.py sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 25%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 54%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 84%]
sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s                                                                                            [ 98%]
tests/modules/biobambam/bammarkduplicates2/test.yml .......                                                                                                                        [100%] Keeping temporary directories and logs. Use '--kwd' or '--keep-workflow-wd' to disable this behaviour.


============================================================================ 8 passed, 593 skipped in 16.13s =============================================================================

@muffato muffato added new module Adding a new module Ready for Review labels Jan 31, 2022
@muffato
Copy link
Member Author

muffato commented Feb 1, 2022

pytest CI now working as expected with Singularity/Docker.
It fails when run on Conda (both on the CI and my local environment), because the MD5 checksum of the output .bam file differs. The .bam generated is actually identical to the reference according to samtools view -h. There must be something that makes the zlib stream differ. Do I need to make the test workflow run samtools view ?

@drpatelh
Copy link
Member

drpatelh commented Feb 8, 2022

Hi @muffato can you try and run samtools view -H <test.bam> and see what the header looks like? Is there a path in the header of the BAM file? That could explain why the tests are failing with Conda.

@muffato
Copy link
Member Author

muffato commented Feb 8, 2022

Hi @drpatelh . The only difference is the @PG line added by samtools view -H itself. The rest is identical

$ find work/ -name test.bam | xargs md5sum
1cf7f957eb20b4ace9f10d0cf0a0649a  work/db/a9fcf3687e0c1065d36602cbc8ebc7/test.bam  # PROFILE=singularity
b61a9e48c2603815416e970d2bc7bb27  work/00/80fdde2c6e9f6fbc3070db0ac99d30/test.bam  # PROFILE=conda

$ samtools view -H work/db/a9fcf3687e0c1065d36602cbc8ebc7/test.bam > headers_singularity.txt
$ samtools view -H  work/00/80fdde2c6e9f6fbc3070db0ac99d30/test.bam > headers_conda.txt
$ diff headers_*
7c7
< @PG   ID:samtools.1   PN:samtools     PP:bammarkduplicates2   VN:1.14 CL:/usr/local/bin/samtools view -H work/00/80fdde2c6e9f6fbc3070db0ac99d30/test.bam
---
> @PG   ID:samtools.1   PN:samtools     PP:bammarkduplicates2   VN:1.14 CL:/usr/local/bin/samtools view -H work/db/a9fcf3687e0c1065d36602cbc8ebc7/test.bam

$ samtools view -h work/db/a9fcf3687e0c1065d36602cbc8ebc7/test.bam | grep -vF 'CL:/usr/local/bin/samtools view -h' | md5sum
4a19197b007e31d93b4182a7d7eb0f10  -
$ samtools view -h work/00/80fdde2c6e9f6fbc3070db0ac99d30/test.bam | grep -vF 'CL:/usr/local/bin/samtools view -h' | md5sum
4a19197b007e31d93b4182a7d7eb0f10  -

@drpatelh
Copy link
Member

drpatelh commented Feb 8, 2022

Hi @drpatelh . The only difference is the @pg line added by samtools view -H itself. The rest is identical

Nice! Yes, that's why the tests would have been failing. Looks like the work directory is being appended to the BAM header which will be different everytime. We can ignore the Conda failures in this instance as long as the Docker / Singularity tests are passing.

@muffato
Copy link
Member Author

muffato commented Feb 8, 2022

I don't think that's the case. This line that differs is the extra samtools view I've run for debugging. All the other headers, incl. the bammarkduplicates2 one are identical. Here is the complete header:

@HD     VN:1.6  SO:unsorted
@SQ     SN:MT192765.1   LN:29829
@RG     ID:1    LB:lib1 PL:ILLUMINA     SM:test PU:barcode1
@PG     ID:minimap2     PN:minimap2     VN:2.17-r941    CL:minimap2 -ax sr tests/data/fasta/sarscov2/GCA_011545545.1_ASM1154554v1_genomic.fna tests/data/fastq/dna/sarscov2_1.fastq.gz tests/data/fastq/dna/sarscov2_2.fastq.gz
@PG     ID:samtools     PN:samtools     PP:minimap2     VN:1.11 CL:samtools view -Sb sarscov2_aln.sam
@PG     ID:bammarkduplicates2   PN:bammarkduplicates2   CL:bammarkduplicates2 I=test.paired_end.bam O=test.bam M=test.metrics.txt tmpfile=test markthreads=2    PP:samtools      VN:2.0.182
@PG     ID:samtools.1   PN:samtools     PP:bammarkduplicates2   VN:1.14 CL:/usr/local/bin/samtools view -H work/00/80fdde2c6e9f6fbc3070db0ac99d30/test.bam

@drpatelh
Copy link
Member

drpatelh commented Feb 9, 2022

Ah, good point. I assumed the samtools view was being appended by the tool itself. So any indication as to why the md5sums are different then?

In any case, as long as the md5sum checks are passing with Docker/Singularity and the module is running with Conda I think we can merge this in once you have addressed this comment #1247 (comment)

@muffato muffato force-pushed the bammarkduplicates2 branch from 5131c9b to ab53534 Compare February 9, 2022 12:00
@muffato
Copy link
Member Author

muffato commented Feb 9, 2022

Oh, there's something interesting: The conda CI now passes ! Another reason to remove createOptions = "-c conda-forge"

But on the other hand, linting now fails. The first error seems to be in markdownlint itself, not my code, and the second error is about when which I don't use in my module

@drpatelh
Copy link
Member

Thanks @muffato ! Still need to update your branch with master as shown below the tests. I would have done this and merged but I can't push commits to organisational repos.

Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@muffato
Copy link
Member Author

muffato commented Feb 14, 2022

Rebase done

@drpatelh drpatelh merged commit 04e82ec into nf-core:master Feb 14, 2022
@muffato
Copy link
Member Author

muffato commented Feb 14, 2022

Thank you 🤝 !

@muffato muffato deleted the bammarkduplicates2 branch February 24, 2022 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: biobambam/bammarkduplicates2
2 participants