-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for bgzip output #3267
Conversation
Warning Rate limit exceeded@fgvieira has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 5 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe pull request introduces modifications to the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Outside diff range and nitpick comments (1)
bio/seqkit/environment.linux-64.pin.txt (1)
Line range hint
1-4
: Suggestion: Add documentation for the environment fileThe environment file is well-structured with explicit package URLs, which is excellent for reproducibility. To further improve its usability, consider adding more detailed documentation at the beginning of the file. This could include:
- A brief description of the purpose of this environment (e.g., "Environment for running SeqKit with all necessary dependencies").
- Instructions on how to create and activate the environment using this file.
- A note about the importance of using these exact package versions for reproducibility.
- Information on how to update this file if needed in the future.
This additional documentation will help users understand the environment's purpose and how to use it effectively.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- bio/seqkit/environment.linux-64.pin.txt (1 hunks)
- bio/seqkit/environment.yaml (1 hunks)
- bio/seqkit/test/Snakefile (1 hunks)
- bio/seqkit/wrapper.py (2 hunks)
Additional context used
Path-based instructions (1)
bio/seqkit/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
Ruff
bio/seqkit/wrapper.py
8-8: Undefined name
snakemake
(F821)
9-9: Undefined name
snakemake
(F821)
41-41: Undefined name
snakemake
(F821)
41-41: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
44-44: Undefined name
snakemake
(F821)
Additional comments not posted (10)
bio/seqkit/environment.yaml (3)
2-2
: LGTM: Addition of conda-forge channelThe addition of the
conda-forge
channel is appropriate. It's a reputable source for many bioinformatics tools and dependencies. The placement beforebioconda
in the channel list is correct, as it follows the recommended channel priority order.
7-7
: LGTM: Addition of htslib dependencyThe inclusion of
htslib =1.21
as a dependency is aligned with the PR objective of adding support for bgzip output. htslib includes bgzip, which will enable the desired functionality. The specific version (1.21) is specified, which is good for ensuring reproducibility of the environment.
Line range hint
1-8
: Verify completeness of the PRThe changes in this file align well with the PR objective of adding support for bgzip output. However, the PR description mentions updates to
test.py
and potentially other files, which are not included in this review.Please ensure that all necessary changes have been made and included in the PR. Run the following script to check for other modified files:
If there are additional files that should be part of this PR, please include them for review.
bio/seqkit/environment.linux-64.pin.txt (2)
5-5
: LGTM: SeqKit package updateThe SeqKit package has been updated from build 0 to build 1 (2.8.2-h9ee0642_0 to 2.8.2-h9ee0642_1). This update likely includes minor improvements or bug fixes. It's good practice to use the latest build of a package.
6-29
: LGTM: Addition of necessary dependenciesThe environment file has been updated with several new dependencies from conda-forge and bioconda channels. These additions, including essential libraries like libgcc, openssl, bzip2, libcurl, and htslib, provide a more comprehensive and robust environment for SeqKit. This update ensures that all necessary components are explicitly specified, which is a good practice for reproducibility.
bio/seqkit/test/Snakefile (1)
189-189
: LGTM! Verify wrapper compatibility and update documentation.The addition of
out_bgzip=True
to theparams
section of theseqkit_concat
rule aligns with the PR objective to add support for bgzip output. This change is minimal and doesn't affect the existing functionality.To ensure full compatibility, please verify the following:
- The wrapper script (
master/bio/seqkit
) correctly handles the newout_bgzip
parameter.- Update the wrapper documentation to reflect this new option.
You can use the following script to check the wrapper content:
If the wrapper doesn't handle the new parameter or the documentation is missing, please make the necessary updates.
bio/seqkit/wrapper.py (4)
6-6
: ImportingPath
is appropriate for file suffix checking.The addition of
from pathlib import Path
is necessary for handling file suffix checks later in the code.
9-9
: Logging configuration updated suitably.Setting
stdout=False
andstderr=True
insnakemake.log_fmt_shell
ensures that only stderr is captured in the log. This is appropriate since stdout is being redirected to output files.Tools
Ruff
9-9: Undefined name
snakemake
(F821)
44-54
: Command construction with bgzip output is correct.The shell command is appropriately constructed to handle bgzip compression when
out_bgzip
is set. Piping the output tobgzip
and redirecting to{snakemake.output[0]}
is the correct approach.Tools
Ruff
44-44: Undefined name
snakemake
(F821)
56-65
: Command construction without bgzip output is correct.When
out_bgzip
is not set, the shell command correctly uses the--out-file
option to specify the output file, maintaining the original functionality.
@fgvieira sorry, Filipe, I am currently in a block course and not able to review until next Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- bio/seqkit/wrapper.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/seqkit/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
🪛 Ruff
bio/seqkit/wrapper.py
9-9: Undefined name
snakemake
(F821)
40-40: Undefined name
snakemake
(F821)
🔇 Additional comments (1)
bio/seqkit/wrapper.py (1)
9-9
: Verify that suppressing stdout logging is intendedBy setting
stdout=False
insnakemake.log_fmt_shell
, standard output from the shell command will not be captured inlog
. Ensure that this change aligns with your logging requirements and that no important information is missed.🧰 Tools
🪛 Ruff
9-9: Undefined name
snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still unable to test, but the changes look trivial and ok.
Furthermore: I think, if possible we should have more bzip2-support, because bzip2 is likely to compress ordinary sequence files better than gzip.
So: +1
|
Sorry, I confused the two. No matter: any additional compression format is highly appreciated. Shall we wait for the 2nd review? David's mailbox just generated an out-of-office reply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- bio/seqkit/wrapper.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/seqkit/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
🪛 Ruff
bio/seqkit/wrapper.py
9-9: Undefined name
snakemake
(F821)
40-40: Undefined name
snakemake
(F821)
🔇 Additional comments (2)
bio/seqkit/wrapper.py (2)
9-9
: Verify the intention behind changing stdout logging.The logging configuration has been modified to set
stdout=False
. This means that standard output will not be captured in the log file. Please confirm if this is intentional and doesn't negatively impact debugging or monitoring capabilities.🧰 Tools
🪛 Ruff
9-9: Undefined name
snakemake
(F821)
46-52
: LGTM: Shell command construction looks good.The changes to the shell command construction look good. The command is now properly grouped using parentheses, and the
{input}
variable is correctly used to reflect the changes made for bgzip support. This implementation aligns well with the PR objective of adding support for bgzip output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- bio/seqkit/wrapper.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/seqkit/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
🪛 Ruff
bio/seqkit/wrapper.py
9-9: Undefined name
snakemake
(F821)
40-40: Undefined name
snakemake
(F821)
41-41: Undefined name
snakemake
(F821)
41-41: Undefined name
snakemake
(F821)
43-43: Undefined name
snakemake
(F821)
🔇 Additional comments (1)
bio/seqkit/wrapper.py (1)
9-9
: Verify the intention behind modifying log configuration.The log configuration has been changed to set
stdout=False
. This modification may affect error tracking and debugging. Could you please confirm if this change is intentional and explain the reasoning behind it?🧰 Tools
🪛 Ruff
9-9: Undefined name
snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
bio/seqkit/wrapper.py (1)
Line range hint
1-1
: Consider adding an import forPath
.The static analysis tool Ruff has flagged
Path
as an undefined name. While thesnakemake
warnings are false positives (assnakemake
is implicitly available in Snakemake wrappers), thePath
warning might be valid.Consider adding the following import at the beginning of the file:
from pathlib import PathThis will resolve the undefined name warning and make the usage of
Path
explicit.🧰 Tools
🪛 Ruff
8-8: Undefined name
snakemake
(F821)
9-9: Undefined name
snakemake
(F821)
12-12: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- bio/seqkit/wrapper.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/seqkit/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
🪛 Ruff
bio/seqkit/wrapper.py
9-9: Undefined name
snakemake
(F821)
40-40: Undefined name
snakemake
(F821)
41-41: Undefined name
Path
(F821)
41-41: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
44-44: Undefined name
snakemake
(F821)
🔇 Additional comments (4)
bio/seqkit/wrapper.py (4)
9-9
: LGTM: Logging configuration updated.The modification to set
stdout=False
while keepingstderr=True
is appropriate. This change will reduce log verbosity by excluding standard output, while still capturing error messages.🧰 Tools
🪛 Ruff
9-9: Undefined name
snakemake
(F821)
40-44
: Excellent implementation of bgzip support!The addition of bgzip support is well-implemented:
- The conditional check for
out_bgzip
allows for optional use of bgzip compression.- The assertion ensures that the output file has a valid extension (.gz, .bgz, or .bgzip), preventing potential issues.
- The use of
bgzip
with threading (--threads {snakemake.threads}
) optimizes performance.- The else clause maintains backward compatibility for non-bgzip output.
This implementation effectively addresses the PR objectives and resolves concerns raised in previous review comments about file extension handling.
🧰 Tools
🪛 Ruff
40-40: Undefined name
snakemake
(F821)
41-41: Undefined name
Path
(F821)
41-41: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
44-44: Undefined name
snakemake
(F821)
47-53
: Well-structured shell command modifications.The changes to the shell command construction are appropriate:
- Wrapping the command in parentheses is a good practice for complex shell commands.
- Moving
{input}
to the end of the command allows for conditional piping to bgzip when needed.- These modifications maintain functionality for both bgzip and non-bgzip cases.
The restructured command effectively supports the new bgzip feature while preserving existing functionality.
Line range hint
1-53
: Summary: Excellent implementation of bgzip supportThis PR successfully implements bgzip support for the seqkit wrapper, meeting the stated objectives. Key points:
- The changes are well-structured and maintain backward compatibility.
- Edge cases, such as file extensions, are properly handled.
- The implementation addresses concerns raised in previous review comments.
- Code readability and best practices are maintained.
There's a minor suggestion to add an import for
Path
, but overall, this is a high-quality contribution that enhances the functionality of the Snakemake wrapper for seqkit.🧰 Tools
🪛 Ruff
40-40: Undefined name
snakemake
(F821)
41-41: Undefined name
Path
(F821)
41-41: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
42-42: Undefined name
snakemake
(F821)
44-44: Undefined name
snakemake
(F821)
🤖 I have created a release \*beep\* \*boop\* --- ## [4.7.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v4.6.0...v4.7.0) (2024-10-04) ### Features * Add support for bgzip output ([#3267](https://www.github.com/snakemake/snakemake-wrappers/issues/3267)) ([36a2b15](https://www.github.com/snakemake/snakemake-wrappers/commit/36a2b15ece9fe03d1c648c575e7828be99a10347)) ### Bug Fixes * Gatk filter mutect call typo ([#3295](https://www.github.com/snakemake/snakemake-wrappers/issues/3295)) ([6295e37](https://www.github.com/snakemake/snakemake-wrappers/commit/6295e372ed45d7827e9ad7aae6ba27826b59c5cc)) * Gatk learn orientation model with multiple input files ([#3296](https://www.github.com/snakemake/snakemake-wrappers/issues/3296)) ([250dd3e](https://www.github.com/snakemake/snakemake-wrappers/commit/250dd3ec2709d915fb20293207f18e0c83e931a7)) ### Performance Improvements * autobump bio/bbtools ([#3272](https://www.github.com/snakemake/snakemake-wrappers/issues/3272)) ([51a5fee](https://www.github.com/snakemake/snakemake-wrappers/commit/51a5feeb4bc7c15bc6bd8442f9ba13e39e77f33b)) * autobump bio/delly ([#3274](https://www.github.com/snakemake/snakemake-wrappers/issues/3274)) ([f432006](https://www.github.com/snakemake/snakemake-wrappers/commit/f4320069f90c781308ba1cf5134c92ed91c3548f)) * autobump bio/deseq2/deseqdataset ([#3275](https://www.github.com/snakemake/snakemake-wrappers/issues/3275)) ([e311359](https://www.github.com/snakemake/snakemake-wrappers/commit/e31135953256857f1169a6e80a4336878b452659)) * autobump bio/encode_fastq_downloader ([#3276](https://www.github.com/snakemake/snakemake-wrappers/issues/3276)) ([29a80c6](https://www.github.com/snakemake/snakemake-wrappers/commit/29a80c6c2de3e343cf6fd2f0e37e16fa9f1ff11a)) * autobump bio/fastq_screen ([#3277](https://www.github.com/snakemake/snakemake-wrappers/issues/3277)) ([fead377](https://www.github.com/snakemake/snakemake-wrappers/commit/fead37750da2c1b8787a2b976375258394e1b29d)) * autobump bio/freebayes ([#3278](https://www.github.com/snakemake/snakemake-wrappers/issues/3278)) ([bf6a162](https://www.github.com/snakemake/snakemake-wrappers/commit/bf6a16248108351e111f104b82f44ddf74416502)) * autobump bio/gatk3/baserecalibrator ([#3281](https://www.github.com/snakemake/snakemake-wrappers/issues/3281)) ([9e32e0f](https://www.github.com/snakemake/snakemake-wrappers/commit/9e32e0f4ed0a4b14eaec19814ea82564f5b2e316)) * autobump bio/gatk3/indelrealigner ([#3279](https://www.github.com/snakemake/snakemake-wrappers/issues/3279)) ([26d23ed](https://www.github.com/snakemake/snakemake-wrappers/commit/26d23ed369ca461fca64368dfc7798dd5c9c82de)) * autobump bio/gatk3/printreads ([#3282](https://www.github.com/snakemake/snakemake-wrappers/issues/3282)) ([1d621b7](https://www.github.com/snakemake/snakemake-wrappers/commit/1d621b7a137885b8ea13ce056af10c9d523c335a)) * autobump bio/gatk3/realignertargetcreator ([#3280](https://www.github.com/snakemake/snakemake-wrappers/issues/3280)) ([90c78fd](https://www.github.com/snakemake/snakemake-wrappers/commit/90c78fd9f1d638823a3c241d1bebdb7e5627681c)) * autobump bio/kallisto/index ([#3091](https://www.github.com/snakemake/snakemake-wrappers/issues/3091)) ([5860f2e](https://www.github.com/snakemake/snakemake-wrappers/commit/5860f2ef20a1a8eb49cd39d5b445c0d1aa087453)) * autobump bio/mapdamage2 ([#3287](https://www.github.com/snakemake/snakemake-wrappers/issues/3287)) ([d2f8a85](https://www.github.com/snakemake/snakemake-wrappers/commit/d2f8a858337e3e5733a325bf510fceb007bf5567)) * autobump bio/meryl/count ([#3286](https://www.github.com/snakemake/snakemake-wrappers/issues/3286)) ([b3595af](https://www.github.com/snakemake/snakemake-wrappers/commit/b3595af3d5b7a20ecf5782682992d262aea29a0a)) * autobump bio/meryl/sets ([#3285](https://www.github.com/snakemake/snakemake-wrappers/issues/3285)) ([e8579fa](https://www.github.com/snakemake/snakemake-wrappers/commit/e8579fa26864c6eac2296bf56e2e28acd0daea3e)) * autobump bio/meryl/stats ([#3289](https://www.github.com/snakemake/snakemake-wrappers/issues/3289)) ([2dadb31](https://www.github.com/snakemake/snakemake-wrappers/commit/2dadb31f01cf4f03ebf4b6dd0e1b06edd05c3ae3)) * autobump bio/multiqc ([#3288](https://www.github.com/snakemake/snakemake-wrappers/issues/3288)) ([c69937c](https://www.github.com/snakemake/snakemake-wrappers/commit/c69937cb537efbdd317d56a2edbfde00de1a016c)) * autobump bio/open-cravat/module ([#3291](https://www.github.com/snakemake/snakemake-wrappers/issues/3291)) ([dae6566](https://www.github.com/snakemake/snakemake-wrappers/commit/dae6566111a48fd0f7503eef03731ccf218a53a5)) * autobump bio/open-cravat/run ([#3290](https://www.github.com/snakemake/snakemake-wrappers/issues/3290)) ([119c11b](https://www.github.com/snakemake/snakemake-wrappers/commit/119c11b6daafe085961d65b2f708bbed627aae87)) * autobump bio/spades/metaspades ([#3292](https://www.github.com/snakemake/snakemake-wrappers/issues/3292)) ([6cbbf83](https://www.github.com/snakemake/snakemake-wrappers/commit/6cbbf83eb4ed0e25a700b1729ba4f76b82d76387)) * autobump bio/vep/plugins ([#3293](https://www.github.com/snakemake/snakemake-wrappers/issues/3293)) ([ed94694](https://www.github.com/snakemake/snakemake-wrappers/commit/ed946940d0fd4c76f9481a07a458fe27bc4479e3)) * autobump bio/vsearch ([#3294](https://www.github.com/snakemake/snakemake-wrappers/issues/3294)) ([fb512aa](https://www.github.com/snakemake/snakemake-wrappers/commit/fb512aa74030f18fa61b6c50c18e4c67413a550c)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
New Features
bgzip
in theseqkit_concat
rule.htslib
version1.21
as a new dependency.conda-forge
channel to enhance package availability.Bug Fixes
seqkit
package URL to a newer build for improved stability and performance.Chores