Skip to content

Commit

Permalink
fix: Correctly handle non str index list for bwa-mem2/mem (#3101)
Browse files Browse the repository at this point in the history
As reported in #522, currently the `bwa2-mem` wrapper has issues if
`idx` is a sequence is it incorrectly checked `input.index` to be as
string instead of `input.idx`.

This bug was silent, as usually `snakemake.input.index` would always be
a available as `functools.partial(<function Namedlist._used_attribute at
0x73ac4dbc0c20>, _name='index')` or similar. Thus the predicate that it
was not a `str` was always true.
Now with `snakemake=8.16.0` this changed and the bug becomes an error
running the already existing tests.

While the discussion in #522 indicates, that there is an underlying
issue with the concept of `input.idx` being as sequence of indices, this
may need rework later. Until all the details are decided, having a
correct working version should be still valuable. With the current state
any workflow using this wrapper using `snakemake=8.16.0` is broken.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that I have followed the [documentation for contributing
to
`snakemake-wrappers`](https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html).

While the contributions guidelines are more extensive, please
particularly ensure that:
* [x] `test.py` was updated to call any added or updated example rules
in a `Snakefile`
* [x] `input:` and `output:` file paths in the rules can be chosen
arbitrarily
* [x] wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`)
* [x] temporary files are either written to a unique hidden folder in
the working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to
* [x] the `meta.yaml` contains a link to the documentation of the
respective tool or command under `url:`
* [x] conda environments use a minimal amount of channels and packages,
in recommended ordering


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Enhanced documentation clarifying input requirements for index files,
ensuring users know all necessary file extensions.
- Added a new function to extract the longest common prefix from index
file names, improving error handling.

- **Bug Fixes**
- Updated index file specifications to require a complete list of index
files, preventing execution errors due to missing files.

- **Documentation**
- Improved comments and descriptions in the Snakefile and meta.yaml for
better user guidance on index file usage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vito Zanotelli <vito.zanotelli@gmail.com>
  • Loading branch information
votti and Vito Zanotelli authored Aug 15, 2024
1 parent 8b42097 commit 6f46508
Show file tree
Hide file tree
Showing 5 changed files with 30 additions and 14 deletions.
5 changes: 4 additions & 1 deletion bio/bwa-mem2/mem/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,10 @@ authors:
- Thibault Dayris
input:
- reads: List of path(s) to FASTQ file(s)
- idx: List of paths to indexed reference genome files
- idx: |
List of paths to indexed reference genome files.
All index files required need to be declared:
".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"
output:
- SAM/BAM/CRAM file
notes: |
Expand Down
8 changes: 4 additions & 4 deletions bio/bwa-mem2/mem/test/Snakefile
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
# Index needs to be a list of all index files created by bwa
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac", ".0123"),
output:
"mapped/{sample}.bam",
log:
Expand All @@ -20,8 +20,8 @@ rule bwa_mem2_mem:
rule bwa_mem2_mem_sam:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
# Index needs to be a list of all index files created by bwa
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac", ".0123"),
output:
"mapped/{sample}.sam",
log:
Expand Down
4 changes: 2 additions & 2 deletions bio/bwa-mem2/mem/test/Snakefile_picard
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
# Index needs to be a list of all index files created by bwa
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac", ".0123"),
output:
"mapped/{sample}.bam",
log:
Expand Down
4 changes: 2 additions & 2 deletions bio/bwa-mem2/mem/test/Snakefile_samtools
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
# Index needs to be a list of all index files created by bwa
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac", ".0123"),
output:
"mapped/{sample}.bam",
log:
Expand Down
23 changes: 18 additions & 5 deletions bio/bwa-mem2/mem/wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@
from snakemake_wrapper_utils.java import get_java_opts
from snakemake_wrapper_utils.samtools import get_samtools_opts

# All idx required by bwa-mem2
REQUIRED_IDX = {".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"}


# Extract arguments.
extra = snakemake.params.get("extra", "")
Expand All @@ -27,11 +30,21 @@
bwa_threads = snakemake.threads
samtools_threads = snakemake.threads - 1

index = snakemake.input.get("index", "")
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]

# Extract index from input
# and check that all required indices are declared
index = path.commonprefix(snakemake.input.idx)[:-1]

if len(index) == 0:
raise ValueError("Could not determine common prefix of inputs.idx files.")

index_extensions = [idx[len(index) :] for idx in snakemake.input.idx]
missing_idx = REQUIRED_IDX - set(index_extensions)
if len(missing_idx) > 0:
raise ValueError(
f"Missing required indices: {missing_idx} declarad as input.idx.\n"
f"Identified reference file is {index} with extensions {index_extensions}"
)


# Check inputs/arguments.
Expand Down

0 comments on commit 6f46508

Please sign in to comment.