-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add vg giraffe #3302
feat: Add vg giraffe #3302
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces several new files and modifications in the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
bio/vg/giraffe/test/Snakefile (2)
10-10
: Consider making threads configurableInstead of hardcoding 8 threads, consider making it configurable through config file or resources directive for better adaptability to different environments.
- threads: 8 # [hide] + threads: config.get("vg_autoindex_threads", 8) # [hide]
30-30
: Consider making threads configurableSimilar to the previous rule, thread count should be configurable rather than hardcoded.
- threads: 8 + threads: config.get("vg_giraffe_threads", 8)bio/vg/giraffe/wrapper.py (2)
14-22
: Consider validating sort_order parameter earlier.The
sort_order
parameter is used in multiple places but only validated later in the code. Consider moving the validation right after parameter extraction for early failure.sort = snakemake.params.get("sorting", "none") sort_order = snakemake.params.get("sort_order", "coordinate") +if sort_order not in {"coordinate", "queryname"}: + raise ValueError("Unexpected value for sort_order ({})".format(sort_order)) sort_extra = snakemake.params.get("sort_extra", "")🧰 Tools
🪛 Ruff (0.7.0)
15-15: Undefined name
snakemake
(F821)
16-16: Undefined name
snakemake
(F821)
17-17: Undefined name
snakemake
(F821)
18-18: Undefined name
snakemake
(F821)
19-19: Undefined name
snakemake
(F821)
20-20: Undefined name
snakemake
(F821)
21-21: Undefined name
snakemake
(F821)
24-43
: Consider validating input file existence.While Snakemake typically handles file existence checks, it would be good practice to verify that optional inputs (
dist_index
andminimizer
) exist when provided.dist_index = snakemake.input.get("dist", None) if dist_index: + if not path.exists(dist_index): + raise ValueError(f"Distance index file not found: {dist_index}") input_cmd += f"-d {dist_index} " minimizer = snakemake.input.get("minimizer", None) if minimizer: + if not path.exists(minimizer): + raise ValueError(f"Minimizer file not found: {minimizer}") input_cmd += f"-m {minimizer}"🧰 Tools
🪛 Ruff (0.7.0)
25-25: Undefined name
snakemake
(F821)
36-36: Undefined name
snakemake
(F821)
40-40: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
bio/vg/giraffe/environment.linux-64.pin.txt
(1 hunks)bio/vg/giraffe/environment.yaml
(1 hunks)bio/vg/giraffe/meta.yaml
(1 hunks)bio/vg/giraffe/test/Snakefile
(1 hunks)bio/vg/giraffe/test/genome.fasta
(1 hunks)bio/vg/giraffe/test/reads/a.1.fastq
(1 hunks)bio/vg/giraffe/test/reads/a.2.fastq
(1 hunks)bio/vg/giraffe/wrapper.py
(1 hunks)test_wrappers.py
(1 hunks)
✅ Files skipped from review due to trivial changes (6)
- bio/vg/giraffe/environment.linux-64.pin.txt
- bio/vg/giraffe/environment.yaml
- bio/vg/giraffe/meta.yaml
- bio/vg/giraffe/test/genome.fasta
- bio/vg/giraffe/test/reads/a.1.fastq
- bio/vg/giraffe/test/reads/a.2.fastq
🧰 Additional context used
📓 Path-based instructions (2)
bio/vg/giraffe/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
test_wrappers.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🪛 Ruff (0.7.0)
bio/vg/giraffe/wrapper.py
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
36-36: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
🔇 Additional comments (8)
bio/vg/giraffe/test/Snakefile (3)
1-12
: LGTM! Well-structured rule implementation
The rule follows Snakemake best practices with:
- Flexible input/output paths using wildcards
- Proper use of multiext() for related outputs
- Appropriate logging configuration
- Clear parameter structure
26-29
: LGTM! Well-structured parameters
The parameters section is well-documented with clear options for:
- Sorting method selection
- Sort order configuration
- Additional sorting arguments
12-12
: Verify wrapper paths exist
Since this PR introduces new functionality, we should verify that the wrapper paths exist and are properly implemented.
Also applies to: 32-32
✅ Verification successful
Wrapper paths exist and contain required components
The verification confirms that both wrapper paths exist and contain the necessary implementation:
bio/vg/autoindex/wrapper.py
exists and contains bothsnakemake
andshell
componentsbio/vg/giraffe/wrapper.py
exists and contains bothsnakemake
andshell
components
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the wrapper files exist and contain necessary implementation
fd -t f "wrapper.py" -x echo "Checking wrapper file: {}"; \
for wrapper in "bio/vg/autoindex" "bio/vg/giraffe"; do
if [ -f "${wrapper}/wrapper.py" ]; then
echo "Found wrapper: ${wrapper}/wrapper.py"
# Check for essential wrapper components
rg -l "snakemake" "${wrapper}/wrapper.py"
rg -l "shell" "${wrapper}/wrapper.py"
else
echo "Missing wrapper: ${wrapper}/wrapper.py"
fi
done
Length of output: 24519
bio/vg/giraffe/wrapper.py (4)
1-12
: LGTM: Metadata and imports are well-structured.
The metadata is complete and the necessary imports are present for wrapper functionality.
45-60
: LGTM: Input validation is thorough.
The validation logic properly handles both single-end and paired-end reads, with clear error messages.
🧰 Tools
🪛 Ruff (0.7.0)
46-46: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
83-92
: LGTM: Command execution is well-structured.
The command execution properly handles temporary directories and constructs the pipeline command correctly.
1-92
: Verify consistency with other bioinformatics wrappers.
Let's ensure this wrapper follows similar patterns to other bioinformatics tools in the repository.
✅ Verification successful
Wrapper follows consistent parameter patterns with the bioinformatics ecosystem
The verification shows that this wrapper follows the common patterns used across the bioinformatics wrappers in the repository:
- Uses the standard
extra
parameter pattern (most common with 330 occurrences) - Implements common sorting parameters (
sort_extra
,sort_order
) consistently with other wrappers - Follows the same BAM output handling pattern (confirmed as the only wrapper using
output-format BAM
which is appropriate for this specific tool) - Uses standard parameter names and default values aligned with other bio wrappers (e.g.,
sorting
,sort_order
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check patterns in other bioinformatics wrappers
# Find similar wrappers that handle BAM output
echo "Similar wrappers handling BAM output:"
fd -e py wrapper.py -x grep -l "output-format BAM" {}
# Check common parameter patterns
echo "\nCommon parameter patterns in bio wrappers:"
fd -e py wrapper.py -x grep -h "snakemake.params.get" {} | sort | uniq -c | sort -nr
Length of output: 11028
🧰 Tools
🪛 Ruff (0.7.0)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
36-36: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
test_wrappers.py (1)
5961-5965
: LGTM! Test function implementation is correct and consistent.
The test function follows the established pattern in the file and properly verifies the basic functionality of the vg giraffe wrapper by checking BAM file generation.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
bio/vg/giraffe/test/Snakefile (3)
5-5
: Consider making the output directory configurableThe
resources/
directory is hardcoded in the output path. Consider making it configurable through config values for better flexibility.- multiext("resources/{genome}", ".dist", ".min", ".giraffe.gbz"), + multiext(lambda wildcards, config: config["resource_dir"] + "/{genome}", ".dist", ".min", ".giraffe.gbz"),
1-12
: Add rule documentationPlease add a docstring to describe:
- Expected input FASTA format
- Purpose of each output file
- Recommended thread count rationale
rule vg_autoindex: + """Generate VG graph indices for giraffe mapping. + + Input: + ref: Reference genome in FASTA format + + Output: + .dist: Distance index + .min: Minimizer index + .giraffe.gbz: Graph in GBZ format + """ input: ref="{genome}.fasta",
17-17
: Make read input pattern more flexibleThe current pattern assumes a specific naming convention for paired reads (
.1.fastq
/.2.fastq
). Consider supporting more common patterns.- reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"], + # Support common paired-end patterns + reads=lambda wildcards: [ + f"reads/{wildcards.sample}_R1.fastq", # Illumina style + f"reads/{wildcards.sample}_R2.fastq" + ] if config.get("illumina_style", True) else [ + f"reads/{wildcards.sample}.1.fastq", # ENA style + f"reads/{wildcards.sample}.2.fastq" + ],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/vg/giraffe/test/Snakefile
(1 hunks)
🔇 Additional comments (1)
bio/vg/giraffe/test/Snakefile (1)
31-32
: Verify wrapper implementation details
Please ensure the wrapper script at master/bio/vg/giraffe
:
- Infers command line arguments from file extensions where possible
- Handles temporary files appropriately (using hidden directory or tempfile.gettempdir())
✅ Verification successful
Command line arguments and temporary files are properly handled
The wrapper implementation correctly addresses both concerns:
-
Command line arguments are inferred from file extensions:
- Automatically detects
.gbz
,.xg
, and.gbwt
extensions and maps them to appropriate flags (-Z
,-x
,-g
) - Input reads handling adapts based on single-end (1 file) or paired-end (2 files) data
- Automatically detects
-
Temporary files are handled appropriately:
- Uses Python's
tempfile.TemporaryDirectory()
context manager - Temporary directory path is passed to sorting tools via
-T {tmpdir}
- Directory is automatically cleaned up after use
- Uses Python's
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check wrapper implementation
rg -A 10 "tempfile|gettempdir" "bio/vg/giraffe/wrapper.py"
rg -A 10 "\..*$" "bio/vg/giraffe/wrapper.py"
Length of output: 3504
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
bio/vg/giraffe/wrapper.py (3)
8-8
: Remove unused import.The
os.path
module is imported but never used in the code.-from os import path
🧰 Tools
🪛 Ruff (0.8.0)
8-8:
os.path
imported but unusedRemove unused import:
os.path
(F401)
24-32
: Consider adding file existence validation.While the code checks for file extensions, it doesn't verify if the files actually exist. Consider adding an explicit check.
for ext, param in expected_files.items(): matching_files = [file for file in snakemake.input.index if file.endswith(ext)] if not matching_files: raise ValueError(f"Missing required file with extension: {ext}") + if not os.path.exists(matching_files[0]): + raise ValueError(f"File does not exist: {matching_files[0]}") input_cmd += f" {param} {matching_files[0]}"🧰 Tools
🪛 Ruff (0.8.0)
28-28: Undefined name
snakemake
(F821)
47-49
: Improve error message and use f-strings.Consider using f-strings and providing a more descriptive error message that includes valid options.
if sort_order not in {"coordinate", "queryname"}: - raise ValueError("Unexpected value for sort_order ({})".format(sort_order)) + raise ValueError(f"Invalid sort_order: {sort_order}. Expected 'coordinate' or 'queryname'")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
bio/vg/giraffe/test/Snakefile
(1 hunks)bio/vg/giraffe/wrapper.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bio/vg/giraffe/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (1)
bio/vg/giraffe/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.0)
bio/vg/giraffe/wrapper.py
8-8: os.path
imported but unused
Remove unused import: os.path
(F401)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
42-42: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
44-44: Undefined name snakemake
(F821)
🔇 Additional comments (3)
bio/vg/giraffe/wrapper.py (3)
14-22
: LGTM!
Parameter extraction is well-structured with appropriate default values and utilizes wrapper utility functions correctly.
🧰 Tools
🪛 Ruff (0.8.0)
15-15: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
17-17: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
72-81
: LGTM!
The command execution is well-structured with proper temporary directory handling and thread management.
51-70
: Review command injection risks.
The code concatenates user-provided values into shell commands. While Snakemake provides some safety, consider additional validation of sort_extra
and other user inputs to prevent command injection.
🤖 I have created a release \*beep\* \*boop\* --- ## [5.3.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v5.2.1...v5.3.0) (2024-11-27) ### Features * Add vg giraffe ([#3302](https://www.github.com/snakemake/snakemake-wrappers/issues/3302)) ([fbf806d](https://www.github.com/snakemake/snakemake-wrappers/commit/fbf806d97336b23caa443b9063b0042e97712d28)) * MtNucRatioCalculator ([#3496](https://www.github.com/snakemake/snakemake-wrappers/issues/3496)) ([b221057](https://www.github.com/snakemake/snakemake-wrappers/commit/b2210576f75be6a90569b676688b476c4aaba7a8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This adds
vg giraffe
for mapping reads as bam records.Like for
bwa mem
sorting reads by samtools, picard or fgbio is supported.QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
Release Notes
New Features
environment.linux-64.pin.txt
andenvironment.yaml
files for easier setup of a Conda environment tailored for bioinformatics workflows.Bug Fixes
vg giraffe
command.Tests
vg giraffe
workflow.