-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: NGSCheckMate make pattern #3499
Conversation
* perf: update bio/bcftools/index/environment.yaml. * perf: update bio/bcftools/index/environment.yaml. * perf: update bio/bcftools/index/environment.yaml.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
* Add autobump action * fix paths * dbg * dbg branch * add checkout * dbg * trigger rerun * entity regex and add label * dbg * Update autobump.yml * Update autobump.yml
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request introduce several new files and a new rule for the Changes
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)bio/ngscheckmate/makesnvpattern/wrapper.py (2)Pattern Pattern 📓 Learnings (1)bio/ngscheckmate/makesnvpattern/wrapper.py (2)
🪛 Ruff (0.8.0)bio/ngscheckmate/makesnvpattern/wrapper.py12-12: Undefined name (F821) 14-14: Undefined name (F821) 31-31: Loop control variable Rename unused (B007) 32-32: Undefined name (F821) 🔇 Additional comments (4)bio/ngscheckmate/makesnvpattern/wrapper.py (4)
The file header contains complete metadata and appropriate imports for the required functionality.
The command execution is properly structured within a temporary directory context, and the shell command is correctly formatted.
The output handling implementation:
🧰 Tools🪛 Ruff (0.8.0)31-31: Loop control variable Rename unused (B007) 32-32: Undefined name (F821)
The code uses ✅ Verification successfulInput configuration is correctly defined The input field name 🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Description: Check input field definitions in meta.yaml
cat bio/ngscheckmate/makesnvpattern/meta.yaml | grep -A 5 "input:"
Length of output: 322 🧰 Tools🪛 Ruff (0.8.0)12-12: Undefined name (F821) 14-14: Undefined name (F821) Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
bio/ngscheckmate/makesnvpattern/test/Snakefile (1)
23-27
: Consider using a specific version tag instead of 'master'Using the 'master' branch for the wrapper path may lead to reproducibility issues as the wrapper could change over time. Consider using a specific version tag instead.
Apply this diff:
wrapper: - "master/bio/ngscheckmate/makesnvpattern" + "v1.3.2/bio/ngscheckmate/makesnvpattern"bio/ngscheckmate/makesnvpattern/wrapper.py (1)
1-61
: Add wrapper documentationConsider adding docstring documentation that includes:
- Description of the wrapper's purpose
- Required and optional input parameters
- Description of output files
- Example usage in a Snakefile
Would you like me to help generate the documentation?
🧰 Tools
🪛 Ruff (0.7.0)
13-13: Undefined name
snakemake
(F821)
14-14: Undefined name
snakemake
(F821)
16-16: Undefined name
snakemake
(F821)
27-27: Undefined name
snakemake
(F821)
31-31: Undefined name
snakemake
(F821)
35-35: Undefined name
snakemake
(F821)
39-39: Undefined name
snakemake
(F821)
43-43: Undefined name
snakemake
(F821)
47-47: Undefined name
snakemake
(F821)
51-51: Undefined name
snakemake
(F821)
55-55: Undefined name
snakemake
(F821)
59-59: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
bio/ngscheckmate/makesnvpattern/environment.yaml
(1 hunks)bio/ngscheckmate/makesnvpattern/meta.yaml
(1 hunks)bio/ngscheckmate/makesnvpattern/test/Snakefile
(1 hunks)bio/ngscheckmate/makesnvpattern/test/genome.fasta
(1 hunks)bio/ngscheckmate/makesnvpattern/test/genome_bwt.4.ebwt
(1 hunks)bio/ngscheckmate/makesnvpattern/test/variants.bed
(1 hunks)bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)test_wrappers.py
(1 hunks)
✅ Files skipped from review due to trivial changes (4)
- bio/ngscheckmate/makesnvpattern/environment.yaml
- bio/ngscheckmate/makesnvpattern/meta.yaml
- bio/ngscheckmate/makesnvpattern/test/genome_bwt.4.ebwt
- bio/ngscheckmate/makesnvpattern/test/variants.bed
🧰 Additional context used
📓 Path-based instructions (2)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
test_wrappers.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🪛 Gitleaks (8.21.1)
bio/ngscheckmate/makesnvpattern/test/genome.fasta
7-7: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
7-7: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
8-8: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.
(aws-access-token)
🪛 Ruff (0.7.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
27-27: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
47-47: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
59-59: Undefined name snakemake
(F821)
🔇 Additional comments (4)
bio/ngscheckmate/makesnvpattern/test/Snakefile (1)
1-27
: Verify test.py integration
The PR objectives mention that test.py should be updated to include calls to new rules.
Let's verify if test.py has been properly updated:
✅ Verification successful
Test integration is properly implemented
The test for ngscheckmate_makesnvpattern
is correctly implemented in test_wrappers.py
. The test function executes the rule with appropriate parameters to generate the pattern file (genome.pt
).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if test.py includes the new rule
rg -l "test_ngscheckmate_makesnvpattern" "test.py"
Length of output: 96
Script:
#!/bin/bash
# Let's first find the test file in the repository
fd "test.py|test_.*\.py" --type f
# Then check for any test files specifically in the bio/ngscheckmate directory
fd "test.py|test_.*\.py" --type f "bio/ngscheckmate"
Length of output: 105
Script:
#!/bin/bash
# Let's check if test_wrappers.py contains any tests for ngscheckmate
rg -A 5 "ngscheckmate|makesnvpattern" test_wrappers.py
Length of output: 267
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
1-11
: LGTM! Well-structured file header with complete metadata.
The imports are appropriate for the wrapper's functionality.
16-16
: Verify index extraction robustness
The use of commonprefix
to extract the index name could be fragile if the input bowtie files don't share a meaningful common prefix. Consider adding validation to ensure the extracted index is valid.
🧰 Tools
🪛 Ruff (0.7.0)
16-16: Undefined name snakemake
(F821)
test_wrappers.py (1)
567-572
: LGTM! The new test function follows the established patterns.
The implementation of test_ngscheckmate_makesnvpattern
matches the structure of other test functions in the file, using the same fixture and command pattern to test the NGSCheckMate wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
bio/ngscheckmate/makesnvpattern/test/Snakefile (1)
26-27
: Consider using a version tag instead of 'master'While the wrapper configuration works, using
master
in the path might be fragile if the repository's default branch name changes in the future. Consider using a specific version tag for better stability.- wrapper: - "master/bio/ngscheckmate/makesnvpattern" + wrapper: + "v3.0.0/bio/ngscheckmate/makesnvpattern"bio/ngscheckmate/makesnvpattern/wrapper.py (1)
26-42
: Enhance move operation reliability.The output handling structure is clean, but consider adding error handling for the move operations and verifying file existence before moving.
for output_key, temp_file in output_mapping.items(): output_path = snakemake.output.get(output_key) if output_path: - shell("mv --verbose '{temp_file}' '{output_path}' {log}") + shell( + "if [ -f {temp_file} ]; then" + " mv --verbose {temp_file} '{output_path}' || " + " {{ echo 'Failed to move {temp_file} to {output_path}' >&2; exit 1; }};" + "else" + " echo 'Warning: Expected output file {temp_file} not found' >&2;" + "fi" + " {log}" + )🧰 Tools
🪛 Ruff (0.8.0)
39-39: Loop control variable
temp_file
not used within loop bodyRename unused
temp_file
to_temp_file
(B007)
40-40: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
bio/ngscheckmate/makesnvpattern/test/Snakefile
(1 hunks)bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
📓 Learnings (2)
bio/ngscheckmate/makesnvpattern/test/Snakefile (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
🪛 Ruff (0.8.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
39-39: Loop control variable temp_file
not used within loop body
Rename unused temp_file
to _temp_file
(B007)
40-40: Undefined name snakemake
(F821)
🔇 Additional comments (5)
bio/ngscheckmate/makesnvpattern/test/Snakefile (2)
1-13
: LGTM! Input configuration is well-structured
The rule name follows the testing convention, and the input configuration correctly specifies all required files, including the complete set of Bowtie index extensions.
14-22
: LGTM! Excellent temporary file handling
Good practices observed:
- Appropriate use of
temp()
for intermediate files - Consistent naming scheme
- Complete set of expected NGSCheckMate output files
bio/ngscheckmate/makesnvpattern/wrapper.py (3)
1-11
: LGTM! Well-structured file header with appropriate imports.
The file header includes all necessary metadata and the imports are appropriate for the wrapper's functionality.
18-24
: 🛠️ Refactor suggestion
Add error handling for the makesnvpattern.pl command.
While the command execution is clean, consider adding error handling to provide better feedback when the command fails.
with TemporaryDirectory() as tempdir:
shell(
- "makesnvpattern.pl "
- "{snakemake.input.bed} "
- "{snakemake.input.fasta} "
- "{index} {tempdir} snake_out {log}"
+ "makesnvpattern.pl"
+ " {snakemake.input.bed}"
+ " {snakemake.input.fasta}"
+ " {index} {tempdir} snake_out"
+ " 2> >(tee -a {log} >&2) || {{ echo 'Error: makesnvpattern.pl failed' >&2; exit 1; }}"
)
Likely invalid or redundant comment.
16-16
: Verify index generation for edge cases.
The use of commonprefix
to generate the index might produce unexpected results if the bowtie input files don't share a meaningful common prefix.
Let's verify the bowtie input patterns in the codebase:
✅ Verification successful
Let me gather more information about the usage patterns and implementation:
Let me check the test Snakefile to understand the expected input patterns:
The index generation is safe and correctly implemented
The use of commonprefix
for generating the index is safe because:
- The bowtie input is explicitly defined using
multiext()
in Snakemake, which ensures all files share the same base prefix - The test case demonstrates the correct pattern with
genome_bwt
as the common prefix for all bowtie index files (.1.ebwt
,.2.ebwt
, etc.) - The
.rstrip(".")
operation safely handles the trailing dot without affecting the base prefix
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check bowtie input patterns in test files and documentation
# Search for bowtie input patterns in test files
echo "Checking test files for bowtie input patterns:"
rg -A 2 "input:\s*bowtie:" tests/
# Search for bowtie input examples in documentation
echo "Checking documentation for bowtie input examples:"
fd -e md -e yaml . | xargs rg "bowtie.*:"
Length of output: 5498
Script:
#!/bin/bash
# Check the implementation and usage of makesnvpattern
echo "Checking wrapper implementation:"
cat bio/ngscheckmate/makesnvpattern/wrapper.py
echo -e "\nChecking Snakefile or test cases:"
fd Snakefile bio/ngscheckmate/makesnvpattern/
fd -e smk bio/ngscheckmate/makesnvpattern/
echo -e "\nChecking example input patterns:"
fd test.smk bio/ngscheckmate/makesnvpattern/ -x cat
Length of output: 2355
Script:
#!/bin/bash
# Check test Snakefile content
cat bio/ngscheckmate/makesnvpattern/test/Snakefile
# Check if there are any other test files
echo -e "\nChecking for other test files:"
ls -la bio/ngscheckmate/makesnvpattern/test/
Length of output: 1685
🧰 Tools
🪛 Ruff (0.8.0)
16-16: Undefined name snakemake
(F821)
Can you pin the environment? |
Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
27-37
: Remove redundant quotes in output mapping.The file paths in the mapping don't need quotes since we're using the
:q
quote modifier in the shell command.output_mapping = { - "bowtie": f"'{tempdir}/snake_out.bowtieout'", - "fasta": f"'{tempdir}/snake_out.fasta'", - "ntm": f"'{tempdir}/snake_out.ntm'", - "pattern": f"'{tempdir}/snake_out.pt'", - "pattern_text": f"'{tempdir}/snake_out.pt-txt'", - "pattern_sorted": f"'{tempdir}/snake_out.pt-txt.sorted'", - "text": f"'{tempdir}/snake_out.txt'", - "uniq": f"'{tempdir}/snake_out.uniq.txt'", - "txt_sorted": f"'{tempdir}/snake_out.uniq.txt.sorted'", + "bowtie": f"{tempdir}/snake_out.bowtieout", + "fasta": f"{tempdir}/snake_out.fasta", + "ntm": f"{tempdir}/snake_out.ntm", + "pattern": f"{tempdir}/snake_out.pt", + "pattern_text": f"{tempdir}/snake_out.pt-txt", + "pattern_sorted": f"{tempdir}/snake_out.pt-txt.sorted", + "text": f"{tempdir}/snake_out.txt", + "uniq": f"{tempdir}/snake_out.uniq.txt", + "txt_sorted": f"{tempdir}/snake_out.uniq.txt.sorted", }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
📓 Learnings (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
🪛 Ruff (0.8.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
39-39: Loop control variable temp_file
not used within loop body
Rename unused temp_file
to _temp_file
(B007)
40-40: Undefined name snakemake
(F821)
🔇 Additional comments (4)
bio/ngscheckmate/makesnvpattern/wrapper.py (4)
1-11
: LGTM: File header and imports are well-structured.
The file includes proper encoding declaration, complete author metadata, and all necessary imports.
39-42
: Fix the unused loop variable warning.
The loop control variable temp_file
is used in the shell command, but static analysis doesn't recognize this. We can safely ignore this warning.
🧰 Tools
🪛 Ruff (0.8.0)
39-39: Loop control variable temp_file
not used within loop body
Rename unused temp_file
to _temp_file
(B007)
40-40: Undefined name snakemake
(F821)
18-24
: 🛠️ Refactor suggestion
Add error handling for the makesnvpattern.pl command.
While the command execution is straightforward, consider adding error handling.
with TemporaryDirectory() as tempdir:
shell(
"makesnvpattern.pl "
"{snakemake.input.bed} "
"{snakemake.input.fasta} "
- "{index} {tempdir} snake_out {log}"
+ "{index} {tempdir} snake_out || {{ echo 'makesnvpattern.pl failed' >&2; exit 1; }} {log}"
)
Likely invalid or redundant comment.
12-16
: Verify the index generation logic.
The use of commonprefix
to generate the index is clever, but we should ensure it always produces a valid index name.
Let's verify the index generation with different bowtie file patterns:
✅ Verification successful
Let me analyze the bowtie index file patterns we found:
- We see two main patterns:
- Simple:
genome.{1,2,3,4,rev.1,rev.2}.bt2
- Complex:
BS_{CT,GA}.{1,2,3,4,rev.1,rev.2}.bt2
- Simple:
Let's verify the actual code that uses commonprefix to ensure it handles these patterns correctly:
Based on the analysis of both the bowtie index patterns and the code implementation, I can now generate the final response:
The index generation using commonprefix is safe and reliable
The commonprefix
approach correctly handles both simple and complex bowtie index patterns:
- For files like
genome.{1,2,3,4}.bt2
, it will returngenome
- For files like
BS_CT.{1,2,3,4}.bt2
, it will returnBS_CT
The .rstrip(".")
ensures clean removal of any trailing dots, making the index generation robust across different bowtie index naming patterns found in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential edge cases in bowtie index files
# that might affect the commonprefix logic
# Search for bowtie index files and their patterns
fd -e bt2 | sort | while read -r file; do
dirname=$(dirname "$file")
filename=$(basename "$file")
echo "File: $file"
echo "Base name without extension: ${filename%.*}"
echo "---"
done
Length of output: 11843
Script:
#!/bin/bash
# Check the implementation that uses commonprefix
rg -A 5 "commonprefix" bio/ngscheckmate/makesnvpattern/wrapper.py
Length of output: 402
🧰 Tools
🪛 Ruff (0.8.0)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
18-24
: Add error handling for the makesnvpattern.pl commandWhile the command execution is straightforward, consider adding error handling to provide more informative error messages.
Consider adding error handling:
shell( - "makesnvpattern.pl " - "{snakemake.input.bed} " - "{snakemake.input.fasta} " - "{index} {tempdir} snake_out {log}" + "if ! makesnvpattern.pl " + "{snakemake.input.bed} " + "{snakemake.input.fasta} " + "{index} {tempdir} snake_out {log}; then " + "echo 'makesnvpattern.pl failed to generate pattern files' >&2; " + "exit 1; " + "fi" )
26-42
: Add validation for temporary files before movingWhile the output handling is well-structured, it's important to verify that the temporary files exist before attempting to move them.
Consider adding existence checks:
for output_key, temp_file in output_mapping.items(): output_path = snakemake.output.get(output_key) if output_path: + # Remove quotes from temp_file for existence check + temp_file_path = temp_file.strip("'") + if not os.path.exists(temp_file_path): + raise FileNotFoundError( + f"Expected output file not found: {temp_file_path}. " + "This might indicate a failure in makesnvpattern.pl" + ) shell(f"mv --verbose {temp_file:q} {output_path:q} {log}")🧰 Tools
🪛 Ruff (0.8.0)
40-40: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
📓 Learnings (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
🪛 Ruff (0.8.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
🔇 Additional comments (2)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
1-11
: LGTM: File header and imports are well-structured
The file includes appropriate metadata and necessary imports for temporary directory management and path operations.
12-16
: 🛠️ Refactor suggestion
Add validation for the index generation
The commonprefix
operation could return an empty string or an incomplete prefix if the bowtie input files don't share a meaningful common prefix.
Let's verify the bowtie input patterns:
Consider adding validation:
-index = commonprefix(snakemake.input.bowtie).rstrip(".")
+index = commonprefix(snakemake.input.bowtie).rstrip(".")
+if not index:
+ raise ValueError("Could not determine index from bowtie inputs. Files must share a common prefix.")
🧰 Tools
🪛 Ruff (0.8.0)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
39-42
: Fix unused loop variableThe
temp_file
variable in the loop is not used directly in the loop body as it's referenced through string formatting. Consider renaming it to_temp_file
to indicate it's unused.- for output_key, temp_file in output_mapping.items(): + for output_key, _temp_file in output_mapping.items(): output_path = snakemake.output.get(output_key) if output_path: - shell("mv --verbose {temp_file:q} {output_path:q} {log}") + shell("mv --verbose {_temp_file:q} {output_path:q} {log}")🧰 Tools
🪛 Ruff (0.8.0)
39-39: Loop control variable
temp_file
not used within loop bodyRename unused
temp_file
to_temp_file
(B007)
40-40: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
📓 Learnings (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
🪛 Ruff (0.8.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
39-39: Loop control variable temp_file
not used within loop body
Rename unused temp_file
to _temp_file
(B007)
40-40: Undefined name snakemake
(F821)
🔇 Additional comments (2)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
1-11
: LGTM: Well-documented header with appropriate imports
The file header is properly documented with all necessary metadata, and the imports are appropriate for the wrapper's functionality.
16-16
: Add validation for the bowtie index prefix
The commonprefix
function might return unexpected results if the bowtie input files don't share a common prefix. Consider adding validation to ensure the index is correctly derived.
🧰 Tools
🪛 Ruff (0.8.0)
16-16: Undefined name snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
17-22
: Add proper quoting for file paths in the shell commandTo handle paths with spaces or special characters, consider using the
:q
modifier for path variables.shell( "makesnvpattern.pl " - "{snakemake.input.bed} " - "{snakemake.input.fasta} " - "{index} {tempdir} snake_out {log}" + "{snakemake.input.bed:q} " + "{snakemake.input.fasta:q} " + "{index:q} {tempdir:q} snake_out {log}" )
25-29
: Maintain consistency in output key namingThe output keys use different naming conventions:
fasta
andpattern
are simple namestxt_uncompressed
uses an underscore and is more descriptiveConsider using consistent naming:
output_mapping = { "fasta": f"{tempdir}/snake_out.fasta", "pattern": f"{tempdir}/snake_out.pt", - "txt_uncompressed": f"{tempdir}/snake_out.uniq.txt.sorted", + "txt": f"{tempdir}/snake_out.uniq.txt.sorted", }bio/ngscheckmate/makesnvpattern/meta.yaml (4)
1-3
: Enhance the tool description for better clarity.The current description is too brief. Consider expanding it to include:
- The purpose of SNP pattern files in NGSCheckMate
- How this fits into the NGSCheckMate workflow
- Common use cases or scenarios
Example improvement:
-description: Generate SNP pattern file +description: | + Generate SNP pattern files used by NGSCheckMate for sample identity verification. + This tool processes genomic regions to identify informative SNPs and creates + pattern files that serve as reference data for detecting sample swaps and + cross-contamination in high-throughput sequencing data.
6-9
: Add more specific details to input parameters.The input parameters would benefit from additional details about formats and requirements:
Example improvement:
input: - - bed: Path to bed intervals - - fasta: Path to fasta genome sequence - - index: List of paths to `bowtie` index files + - bed: Path to BED format file containing genomic intervals for SNP analysis + - fasta: Path to reference genome in FASTA format + - index: List of paths to Bowtie1 index files (.1.ebwt, .2.ebwt, etc.)
10-13
: Restructure output descriptions for better clarity.The output section would benefit from a more structured description that clearly differentiates between main and intermediate outputs, their formats, and relationships.
Example improvement:
output: - - fasta: Path to fasta-formatted regions extracted from bed intervals. Unique numeric names are given to each region. - - pattern_uncompressed: Path to uncompressed patterns, used for internal patterns checks only. Col1 = Sequence, Col2 = reference max count, Col3 = variant max count. - - pattern: Path to compressed (binary) pattern file. Main output file. + - pattern: | + Path to the main output file (binary format). + This compressed pattern file is used by NGSCheckMate for sample verification. + - pattern_uncompressed: | + Path to tab-separated text file containing intermediate results: + - Column 1: Sequence + - Column 2: Reference allele max count + - Column 3: Variant allele max count + Note: This file is for internal pattern verification only. + - fasta: | + Path to intermediate FASTA file containing extracted regions. + Each region from the input BED file is assigned a unique numeric identifier.
2-2
: Make documentation link more specific.While the URL points to the readme, it would be more helpful to link directly to the specific section about the pattern generator tool.
-url: https://github.com/parklab/NGSCheckMate?tab=readme-ov-file#1-patterngenerator +url: https://github.com/parklab/NGSCheckMate/tree/master#1-patterngenerator
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
bio/ngscheckmate/makesnvpattern/meta.yaml
(1 hunks)bio/ngscheckmate/makesnvpattern/test/Snakefile
(1 hunks)bio/ngscheckmate/makesnvpattern/wrapper.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bio/ngscheckmate/makesnvpattern/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
📓 Learnings (1)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
Learnt from: fgvieira
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-23
Timestamp: 2024-12-02T15:10:17.279Z
Learning: In the `ngscheckmate` `makesnvpattern` wrapper, there are no extra parameters, so the `extra` parameter is unnecessary.
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
🪛 Ruff (0.8.0)
bio/ngscheckmate/makesnvpattern/wrapper.py
12-12: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
31-31: Loop control variable temp_file
not used within loop body
Rename unused temp_file
to _temp_file
(B007)
32-32: Undefined name snakemake
(F821)
🔇 Additional comments (3)
bio/ngscheckmate/makesnvpattern/wrapper.py (2)
1-11
: LGTM! Well-structured file header with complete metadata.
The file header follows best practices with appropriate imports and complete metadata information.
14-14
: Consider adding validation for the index derivation
While using commonprefix
works, it might produce unexpected results if the bowtie input files don't share a meaningful common prefix.
Let's verify the bowtie input files pattern:
🧰 Tools
🪛 Ruff (0.8.0)
14-14: Undefined name snakemake
(F821)
bio/ngscheckmate/makesnvpattern/meta.yaml (1)
14-15
: Consider making thread count configurable.
The hard-coded thread count of 4 for bowtie execution could lead to suboptimal performance on different systems. Consider making this configurable through Snakemake's threading directive.
Let's verify if this is a limitation of NGSCheckMate or if it can be modified:
🤖 I have created a release \*beep\* \*boop\* --- ## [5.4.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v5.3.0...v5.4.0) (2024-12-06) ### Features * NGS-bits SampleAncestry ([#3502](https://www.github.com/snakemake/snakemake-wrappers/issues/3502)) ([8600d44](https://www.github.com/snakemake/snakemake-wrappers/commit/8600d44e79ae4dafa181d3b06eed6e3db0c7a2df)) * NGS-bits SampleSimilarity ([#3500](https://www.github.com/snakemake/snakemake-wrappers/issues/3500)) ([710597c](https://www.github.com/snakemake/snakemake-wrappers/commit/710597cdc4e7f518d1fda2ec246bb6a7e0e29ba9)) * NGSCheckMate make pattern ([#3499](https://www.github.com/snakemake/snakemake-wrappers/issues/3499)) ([3b96cc1](https://www.github.com/snakemake/snakemake-wrappers/commit/3b96cc18b5a7ce8643fc3f8a492333d8f339e4c5)) * Sex.DetERRmine ([#3497](https://www.github.com/snakemake/snakemake-wrappers/issues/3497)) ([3919f2e](https://www.github.com/snakemake/snakemake-wrappers/commit/3919f2e4b6fae381cf92c921e6f17086819de345)) ### Performance Improvements * autobump bio/bbtools ([#3507](https://www.github.com/snakemake/snakemake-wrappers/issues/3507)) ([19d027d](https://www.github.com/snakemake/snakemake-wrappers/commit/19d027d176bfa5da7ec2d75b9222547b0fd2b919)) * autobump bio/busco ([#3506](https://www.github.com/snakemake/snakemake-wrappers/issues/3506)) ([aad4b56](https://www.github.com/snakemake/snakemake-wrappers/commit/aad4b56df0ca427a0a12dba67bcf0edef51d545b)) * autobump bio/busco ([#3519](https://www.github.com/snakemake/snakemake-wrappers/issues/3519)) ([6af2e11](https://www.github.com/snakemake/snakemake-wrappers/commit/6af2e11535a6407a0a60eaa18f7e15548e4a4a01)) * autobump bio/encode_fastq_downloader ([#3521](https://www.github.com/snakemake/snakemake-wrappers/issues/3521)) ([cbf06d2](https://www.github.com/snakemake/snakemake-wrappers/commit/cbf06d227d2cece579b1bdff413d09036e2d976f)) * autobump bio/freebayes ([#3509](https://www.github.com/snakemake/snakemake-wrappers/issues/3509)) ([12b8b3c](https://www.github.com/snakemake/snakemake-wrappers/commit/12b8b3ce9d5be65a2165b4ca3e0403935b950237)) * autobump bio/gatk3/baserecalibrator ([#3523](https://www.github.com/snakemake/snakemake-wrappers/issues/3523)) ([7a7518e](https://www.github.com/snakemake/snakemake-wrappers/commit/7a7518e63e0c6eac8d7cb935808b8692e4e688ff)) * autobump bio/gatk3/indelrealigner ([#3525](https://www.github.com/snakemake/snakemake-wrappers/issues/3525)) ([a0d913c](https://www.github.com/snakemake/snakemake-wrappers/commit/a0d913ce81ceb45aeff0eeee8e1e92e63bda786c)) * autobump bio/gatk3/printreads ([#3524](https://www.github.com/snakemake/snakemake-wrappers/issues/3524)) ([67af9a6](https://www.github.com/snakemake/snakemake-wrappers/commit/67af9a6899872a5e2e8cabc58572aa31d51a43cc)) * autobump bio/gatk3/realignertargetcreator ([#3522](https://www.github.com/snakemake/snakemake-wrappers/issues/3522)) ([5f8ffe7](https://www.github.com/snakemake/snakemake-wrappers/commit/5f8ffe7349ab24d55d45e1811519b2b9e9985068)) * autobump bio/hifiasm ([#3510](https://www.github.com/snakemake/snakemake-wrappers/issues/3510)) ([2b1b9f2](https://www.github.com/snakemake/snakemake-wrappers/commit/2b1b9f265231a3f3bc121dd9d4f34111b15d4486)) * autobump bio/mapdamage2 ([#3526](https://www.github.com/snakemake/snakemake-wrappers/issues/3526)) ([92da252](https://www.github.com/snakemake/snakemake-wrappers/commit/92da252bcafd05a0187b635b1938593aa4268c3b)) * autobump bio/mosdepth ([#3511](https://www.github.com/snakemake/snakemake-wrappers/issues/3511)) ([762b273](https://www.github.com/snakemake/snakemake-wrappers/commit/762b273800120519ffd2bc2f670ae93be6187cac)) * autobump bio/mtnucratio ([#3512](https://www.github.com/snakemake/snakemake-wrappers/issues/3512)) ([7f6a3b0](https://www.github.com/snakemake/snakemake-wrappers/commit/7f6a3b07cc2bae16fb30c097e783b70e87ad58f1)) * autobump bio/ngsbits/sampleancestry ([#3527](https://www.github.com/snakemake/snakemake-wrappers/issues/3527)) ([2abf38c](https://www.github.com/snakemake/snakemake-wrappers/commit/2abf38c30e541dd45563ee7ec3959ccf43802fab)) * autobump bio/ngsbits/samplesimilarity ([#3529](https://www.github.com/snakemake/snakemake-wrappers/issues/3529)) ([c91ce10](https://www.github.com/snakemake/snakemake-wrappers/commit/c91ce1075792f231d833a89298e88353c96973b1)) * autobump bio/ngscheckmate/makesnvpattern ([#3528](https://www.github.com/snakemake/snakemake-wrappers/issues/3528)) ([ff9a81d](https://www.github.com/snakemake/snakemake-wrappers/commit/ff9a81d923b8efc807f0054400d03c6277777cb8)) * autobump bio/reference/ensembl-mysql-table ([#3513](https://www.github.com/snakemake/snakemake-wrappers/issues/3513)) ([6b5c545](https://www.github.com/snakemake/snakemake-wrappers/commit/6b5c5454e86cfd393e5fa55b86566e60ef43dd5c)) * autobump bio/sexdeterrmine ([#3514](https://www.github.com/snakemake/snakemake-wrappers/issues/3514)) ([2b18309](https://www.github.com/snakemake/snakemake-wrappers/commit/2b183092fd31225462490d43df5916e678ea5f83)) * autobump bio/spades/metaspades ([#3530](https://www.github.com/snakemake/snakemake-wrappers/issues/3530)) ([070b9b6](https://www.github.com/snakemake/snakemake-wrappers/commit/070b9b62af3cc79f7a4fe1500963892631f8d752)) * autobump bio/varlociraptor/call-variants ([#3533](https://www.github.com/snakemake/snakemake-wrappers/issues/3533)) ([8c563f1](https://www.github.com/snakemake/snakemake-wrappers/commit/8c563f18f02f3f06760f2f25ad85383f9d776b2d)) * autobump bio/varlociraptor/control-fdr ([#3532](https://www.github.com/snakemake/snakemake-wrappers/issues/3532)) ([ce7d1b0](https://www.github.com/snakemake/snakemake-wrappers/commit/ce7d1b00bdf6a856ffe5ed6bac7258ca970cde4a)) * autobump bio/varlociraptor/estimate-alignment-properties ([#3531](https://www.github.com/snakemake/snakemake-wrappers/issues/3531)) ([0b5ac04](https://www.github.com/snakemake/snakemake-wrappers/commit/0b5ac04792bcd54984ea6b0e6af41efa33fba126)) * autobump bio/varlociraptor/preprocess-variants ([#3534](https://www.github.com/snakemake/snakemake-wrappers/issues/3534)) ([56a8933](https://www.github.com/snakemake/snakemake-wrappers/commit/56a8933de936e20e0068bd1d8cb6bbea3826f655)) * autobump bio/vep/annotate ([#3515](https://www.github.com/snakemake/snakemake-wrappers/issues/3515)) ([2609900](https://www.github.com/snakemake/snakemake-wrappers/commit/26099008485dbf5a8054f3284a1892dd1245ac8a)) * autobump bio/vep/cache ([#3516](https://www.github.com/snakemake/snakemake-wrappers/issues/3516)) ([f46427c](https://www.github.com/snakemake/snakemake-wrappers/commit/f46427c7951ad1a2ccd142b74698aac3009b2c66)) * autobump bio/vep/plugins ([#3535](https://www.github.com/snakemake/snakemake-wrappers/issues/3535)) ([9a6ccc3](https://www.github.com/snakemake/snakemake-wrappers/commit/9a6ccc34ce5db38c419076d7f707321bf69357dc)) * autobump bio/vg/giraffe ([#3517](https://www.github.com/snakemake/snakemake-wrappers/issues/3517)) ([6fffdd6](https://www.github.com/snakemake/snakemake-wrappers/commit/6fffdd6caec48f099b6140dd113a295f7202fa63)) * autobump utils/csvtk ([#3508](https://www.github.com/snakemake/snakemake-wrappers/issues/3508)) ([41e8545](https://www.github.com/snakemake/snakemake-wrappers/commit/41e8545b5663c91d6b13b5acd2c126a6b42a9a92)) * autobump utils/csvtk ([#3520](https://www.github.com/snakemake/snakemake-wrappers/issues/3520)) ([d4a52e5](https://www.github.com/snakemake/snakemake-wrappers/commit/d4a52e5e900aa16cad66ebbea1b22fb27d3709ea)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Adds the companion script maktsvnpattern for fastq analysis with NGSCheckMate.
QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
New Features
NGSCheckMate MakeSNVPattern
tool for generating SNP pattern files.environment.yaml
file to streamline package management.makesnvpattern.pl
script to facilitate output generation.Bug Fixes
Tests
Data Updates
variants.bed
file.genome.fasta
andgenome_bwt.4.ebwt
, for testing purposes.