-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ntStat #51897
Add ntStat #51897
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces two new files related to the The second file, Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
recipes/ntstat/meta.yaml (3)
16-37
: Consider adding version constraints for dependencies.While the dependencies are well-organized, consider adding version constraints for critical dependencies to ensure reproducible builds. This is particularly important for:
- python
- matplotlib
- numpy
- scipy
Example:
run: - - python + - python >=3.7 - - matplotlib + - matplotlib >=3.0 - - numpy + - numpy >=1.16 - - scipy + - scipy >=1.0
39-42
: Consider enhancing test coverage.While the current tests verify basic installation, consider adding more comprehensive tests:
- Test other main commands besides 'count'
- Add import test for Python package
Example:
commands: - ntstat --version - ntstat count --help + - python -c "import ntstat"
44-48
: Consider adding more package metadata.The current metadata is good, but could be enhanced with:
- A more detailed description
- Documentation URL
- Dev URL for issue tracking
Example:
summary: 'a toolkit for statistical analysis of k-mer frequency and depth' + description: | + ntStat is a comprehensive toolkit for analyzing k-mer frequencies + and depth in genomic data. It provides statistical analysis tools + for understanding sequence composition and coverage patterns. + doc_url: https://github.com/bcgsc/ntStat/wiki + dev_url: https://github.com/bcgsc/ntStat/issues
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
recipes/ntstat/build.sh
(1 hunks)recipes/ntstat/meta.yaml
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- recipes/ntstat/build.sh
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (3)
recipes/ntstat/meta.yaml (3)
1-6
: LGTM! Package metadata follows best practices.
The version variable and package naming are well-structured.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM! Build configuration follows best practices.
Good use of run_exports
with appropriate version pinning strategy for ABI compatibility.
7-9
: Verify source URL accessibility and checksum.
The source configuration looks correct, but let's verify the URL and checksum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
recipes/ntstat/meta.yaml (3)
16-37
: Consider adding version constraints for dependencies.While btllib has a version constraint, other dependencies lack version specifications. This might lead to compatibility issues. Consider adding version constraints for:
- python
- matplotlib-base
- numpy
- scipy
- pybind11
These are critical dependencies that often have breaking changes between versions.
39-42
: Consider adding more comprehensive tests.Current tests only verify basic CLI functionality. Consider adding:
- Import tests for Python components
- Tests for other main commands besides 'count'
- Tests with example input data
Example addition:
test: commands: - ntstat --version - ntstat count --help imports: - ntstat
44-48
: Consider enhancing the package summary.The current summary is brief. Consider expanding it to better describe the package's features and use cases. For example:
"A toolkit for statistical analysis of k-mer frequency and depth in genomic sequences, providing tools for frequency distribution analysis and depth estimation"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
recipes/ntstat/meta.yaml
(1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (4)
recipes/ntstat/meta.yaml (4)
1-6
: LGTM! Package metadata follows conda conventions.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM! Build configuration properly handles version pinning.
The use of pin_subpackage
with max_pin="x"
ensures proper ABI compatibility as mentioned in the PR objectives.
1-48
: Verify bioconda channel suitability.
As per PR objectives, packages not directly related to biological sciences should be submitted to conda-forge instead. Please confirm that ntstat is specifically designed for biological sequence analysis.
#!/bin/bash
# Check repository documentation for biological relevance
rg -i "genom|bio|dna|rna|sequence" README.md || echo "No explicit mentions of biological terms found in README"
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
7-9
: Verify package name case sensitivity.
The GitHub repository name (ntStat) differs in case from the package name (ntstat). While this works, it might cause confusion. Consider documenting this in the recipe or updating for consistency.
✅ Verification successful
Let me try to verify this with a different approach by checking both the package name and repository existence.
Case sensitivity difference is valid and working correctly
The verification shows that:
- The package directory exists with correct lowercase name (
recipes/ntstat/
) - The GitHub URL redirects properly (HTTP 302), indicating the repository is accessible
- The repository URL is consistently used in the recipe's metadata (both in source and homepage)
- The package works as intended with tests passing (
ntstat --version
,ntstat count --help
)
The case difference between the package name (ntstat) and repository name (ntStat) is a common practice where the conda package follows lowercase convention while preserving the original repository name in URLs. This doesn't require any changes or additional documentation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify if the GitHub repository exists and is accessible
gh repo view bcgsc/ntStat >/dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Repository exists and is accessible"
else
echo "Repository might have case sensitivity issues or access problems"
fi
Length of output: 2075
Script:
#!/bin/bash
# Check if the package directory exists and its contents
echo "=== Package Directory ==="
ls -la recipes/ntstat/
# Check the actual URL accessibility
echo -e "\n=== URL Check ==="
curl -sI "https://github.com/bcgsc/ntStat/archive/refs/tags/v1.0.0.tar.gz" | head -n 1
# Search for any references to this package in the repository
echo -e "\n=== Package References ==="
rg -i "ntstat|ntStat" --type yaml
Length of output: 1707
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
recipes/ntstat/meta.yaml (2)
40-43
: Consider adding more comprehensive tests.While the basic command-line tests are good, consider adding:
- Tests for other main commands besides 'count'
- Input/output validation tests
- Version number validation
Would you like assistance in adding more test cases?
45-49
: Consider enhancing the package summary.While the current summary is accurate, it could be more descriptive. Consider expanding it to include key features or use cases of the toolkit.
- summary: 'a toolkit for statistical analysis of k-mer frequency and depth' + summary: 'A comprehensive toolkit for analyzing and visualizing k-mer frequency distributions and sequencing depth patterns in genomic data'
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
recipes/ntstat/meta.yaml
(1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (4)
recipes/ntstat/meta.yaml (4)
1-6
: LGTM: Package metadata is well-structured.
The version variable and package metadata are correctly defined using conda packaging conventions.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM: Build settings are properly configured.
The build number and run_exports are correctly set up to maintain ABI compatibility.
16-38
: Verify btllib compatibility across dependencies.
The dependencies are well-structured with appropriate platform-specific conditionals. Let's verify btllib version compatibility between host and run requirements.
✅ Verification successful
btllib version constraints are compatible across dependencies
The version constraint btllib >=1.7.2
used in both host and run requirements of ntstat is consistent and compatible with other recipes in the codebase:
- It's the highest minimum version among all recipes
- Other recipes use lower minimum versions (1.7.1, 1.6.2, 1.5.0)
- The constraint is consistently applied in both host and run sections
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if btllib version constraint is consistent in other recipes
# Expected: Find any potential version conflicts or inconsistencies
# Search for btllib version constraints in other recipes
echo "Searching for btllib version constraints in other recipes:"
rg "btllib.*[0-9]" recipes/
Length of output: 620
7-9
: Verify source archive accessibility.
The source URL and checksum are properly configured. Let's verify the source archive is accessible.
✅ Verification successful
Source configuration is valid and accessible
The source URL is properly configured and the tag v1.0.0
exists in the repository. While we see a 302 redirect, this is normal for GitHub's archive downloads. The tag verification confirms the source is available.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the source archive is accessible and matches the checksum
# Expected: HTTP 200 status code and matching SHA256
URL="https://github.com/bcgsc/ntStat/archive/refs/tags/v1.0.0.tar.gz"
EXPECTED_SHA="262dc58ab6f781a6816a5d4fc129659c7fa2c3364871320eafb45bfa822fab6a"
# Check if URL is accessible
curl -sI "$URL" | head -n1
# Note: We don't download and verify SHA256 as it would consume unnecessary bandwidth
Length of output: 247
Script:
#!/bin/bash
# Follow the redirect to check the final status
curl -sLI "https://github.com/bcgsc/ntStat/archive/refs/tags/v1.0.0.tar.gz" | head -n1
# Also verify if the tag exists in the repository
gh api repos/bcgsc/ntStat/git/refs/tags/v1.0.0 --jq '.ref'
Length of output: 175
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
recipes/ntstat/meta.yaml (2)
40-43
: Consider enhancing test coverageWhile the current tests verify basic installation, consider adding:
- Tests for other subcommands if available
- Basic functionality tests with sample data
45-49
: Consider enhancing the package summaryThe current summary is brief. Consider expanding it to include:
- The specific types of statistical analyses provided
- The intended use cases or target audience
- Key features that distinguish it from similar tools
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
recipes/ntstat/build.sh
(1 hunks)recipes/ntstat/meta.yaml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- recipes/ntstat/build.sh
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (4)
recipes/ntstat/meta.yaml (4)
11-14
: LGTM! Build configuration is well structured
The build number and pinning strategy are appropriate for a new package.
16-31
: Consider adding version constraints for critical dependencies
While btllib has a version constraint, other dependencies like cpp-argparse, cpp-tabulate, and pybind11 might benefit from version pinning to ensure build reproducibility.
#!/bin/bash
# Check if these dependencies have version constraints in other recipes
rg -l "cpp-argparse.*>=" recipes/
rg -l "cpp-tabulate.*>=" recipes/
rg -l "pybind11.*>=" recipes/
3-9
: Verify package naming consistency
The package name in the recipe is lowercase ntstat
, but the GitHub repository uses ntStat
. While this works, it's recommended to maintain consistent naming across the ecosystem to avoid confusion.
#!/bin/bash
# Verify if the package name is consistently used in documentation and scripts
rg -i "ntstat|ntStat" README.md || echo "README.md not found or package name not mentioned"
23-23
: Verify the necessity of meson-python
The build.sh script mentioned in the summary uses meson directly. Please verify if meson-python is required for the build process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
recipes/ntstat/meta.yaml (2)
16-23
: Consider removing duplicate Python requirement.Python appears in both build and host sections. For Python packages, it's typically only needed in host section unless specifically required during the build process.
build: - {{ compiler('cxx') }} - meson - meson-python - cmake - - python
41-44
: Consider adding more comprehensive tests.While the current tests verify basic functionality, consider adding:
- Test for actual k-mer analysis functionality
- Input/output file handling tests
- Error case handling tests
test: commands: - ntstat --version - ntstat count --help + - ntstat count --help | grep "input" # Verify expected options + - ntstat count --help | grep "output" + # Add more functional tests if possible
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
recipes/ntstat/meta.yaml
(1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (5)
recipes/ntstat/meta.yaml (5)
1-6
: LGTM! Package metadata is well-structured.
The package name and version are properly defined using Conda conventions.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM! Build configuration follows best practices.
The run_exports section correctly pins the package version, allowing updates within the major version.
46-50
: LGTM! Package metadata is complete and accurate.
The about section provides all necessary information and follows conda-forge guidelines.
7-9
: Verify source URL accessibility and checksum.
The source configuration looks correct, but let's verify the URL and checksum.
24-39
: Verify dependency versions and availability.
The dependencies look appropriate, but let's verify their availability in the bioconda channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
recipes/ntstat/meta.yaml (2)
41-44
: Consider adding more comprehensive test commands.While the basic smoke tests are good, consider adding:
- Tests for other main commands besides 'count'
- Input/output file handling tests
- Error case handling tests
Example additions:
test: commands: - ntstat --version - ntstat count --help - ntstat analyze --help # if this command exists - ntstat --help # test main help menu - command -v ntstat # verify installation in PATH
46-50
: Consider enhancing package metadata.The current metadata is good but could be more comprehensive. Consider adding:
- A more detailed description
- Documentation URL
- Developer information
Example additions:
about: home: https://github.com/bcgsc/ntStat license: GPL-3.0-or-later license_file: LICENSE summary: 'a toolkit for statistical analysis of k-mer frequency and depth' description: | ntStat is a comprehensive toolkit that provides statistical analysis capabilities for k-mer frequency and depth data, useful in genomics and bioinformatics workflows. doc_url: https://github.com/bcgsc/ntStat/wiki # if available dev_url: https://github.com/bcgsc/ntStat
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
recipes/ntstat/build.sh
(1 hunks)recipes/ntstat/meta.yaml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- recipes/ntstat/build.sh
🧰 Additional context used
🪛 yamllint
recipes/ntstat/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (4)
recipes/ntstat/meta.yaml (4)
1-6
: LGTM! Package metadata is well-structured.
The version templating and package naming follow conda conventions.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM! Build configuration is appropriate.
The run_exports section correctly pins the package version, allowing updates to minor and patch versions while maintaining major version compatibility.
16-39
: Verify availability of all dependencies in conda channels.
Let's ensure all dependencies are available in the bioconda or conda-forge channels with compatible versions.
#!/bin/bash
# Description: Verify dependency availability in conda channels
# Test: Check if all dependencies exist in allowed channels
# Expected: All dependencies should be available
# Check each dependency
for pkg in btllib cpp-argparse cpp-tabulate indicators pybind11 ntcard matplotlib-base numpy scipy tabulate; do
echo "Checking $pkg..."
conda search -c conda-forge -c bioconda $pkg
done
7-9
: Verify the source repository's authenticity.
The source URL points to a GitHub repository under the bcgsc organization. Let's verify if this is the official repository for ntStat.
✅ Verification successful
Source repository verified as legitimate
The source is verified to be from the official BC Cancer Genome Sciences Centre (BCGSC) GitHub organization:
- Repository contains comprehensive documentation and is actively maintained
- BCGSC is a legitimate research organization (created in 2012, 116 public repos)
- Release tarball exists at the specified URL
- Source code is properly licensed under GPL v3
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the authenticity of the source repository
# Test: Check if the repository exists and is maintained by BCGSC
# Expected: Repository should exist and show signs of active maintenance
# Check repository information
gh repo view bcgsc/ntStat --json description,isArchived,defaultBranchRef,createdAt
# Check latest release to verify the version
gh release view --repo bcgsc/ntStat v1.0.0 --json tagName,name,publishedAt
Length of output: 3003
Script:
#!/bin/bash
# Let's try a different approach to verify the repository and source
# 1. Check if README exists and contains relevant information
curl -s https://raw.githubusercontent.com/bcgsc/ntStat/main/README.md
# 2. Verify the specific release tarball exists
curl -sI https://github.com/bcgsc/ntStat/releases/download/v1.0.0/ntStat-1.0.0.tar.gz
# 3. Check if the organization is legitimate by looking at other repos
curl -s https://api.github.com/users/bcgsc
Length of output: 19103
@BiocondaBot please add label |
- meson | ||
- meson-python | ||
- cmake | ||
- python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding python
to the build requirements fixed a version mismatch issue where my Python modules compiled from C++ were being copied to python3.13/site-packages
during build and imported from python3.10
during run (and test). I could not find an alternative fix.
Logs:
Describe your pull request here
Please read the guidelines for Bioconda recipes before opening a pull request (PR).
General instructions
@BiocondaBot please add label
command.@bioconda/core
in a comment.Instructions for avoiding API, ABI, and CLI breakage issues
Conda is able to record and lock (a.k.a. pin) dependency versions used at build time of other recipes.
This way, one can avoid that expectations of a downstream recipe with regards to API, ABI, or CLI are violated by later changes in the recipe.
If not already present in the meta.yaml, make sure to specify
run_exports
(see here for the rationale and comprehensive explanation).Add a
run_exports
section like this:with
...
being one of:{{ pin_subpackage("myrecipe", max_pin="x") }}
{{ pin_subpackage("myrecipe", max_pin="x.x") }}
{{ pin_subpackage("myrecipe", max_pin="x.x") }}
(in such a case, please add a note that shortly mentions your evidence for that){{ pin_subpackage("myrecipe", max_pin="x.x.x") }}
(in such a case, please add a note that shortly mentions your evidence for that){{ pin_subpackage("myrecipe", max_pin=None) }}
while replacing
"myrecipe"
with eithername
if aname|lower
variable is defined in your recipe or with the lowercase name of the package in quotes.Bot commands for PR management
Please use the following BiocondaBot commands:
Everyone has access to the following BiocondaBot commands, which can be given in a comment:
@BiocondaBot please update
@BiocondaBot please add label
please review & merge
label.@BiocondaBot please fetch artifacts
You can use this to test packages locally.
Note that the
@BiocondaBot please merge
command is now depreciated. Please just squash and merge instead.Also, the bot watches for comments from non-members that include
@bioconda/<team>
and will automatically re-post them to notify the addressed<team>
.