-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow setting of custom alignment properties #346
Conversation
WalkthroughThe pull request introduces a new configuration feature for custom alignment properties across multiple configuration files. A new section Changes
Sequence DiagramsequenceDiagram
participant Config as Configuration
participant Common as Common Functions
participant Calling as Calling Rule
Config->>Common: Check if custom alignment activated
Common-->>Config: Return activation status
Config->>Common: Request alignment properties
Common->>Common: Retrieve properties from TSV
Common-->>Calling: Return alignment properties path
Calling->>Calling: Use dynamic alignment properties
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (7)
.test/config-sra/config.yaml (2)
117-120
: LGTM! Consider adding documentation for the new feature.The new
custom_alignment_properties
section is well-structured with sensible defaults. However, documentation about its purpose and usage would be helpful.Consider adding comments explaining:
- The purpose of this feature
- The expected format of the TSV file
- The significance of the "panel" column
🧰 Tools
🪛 yamllint (1.35.1)
[error] 120-120: no new line character at the end of file
(new-line-at-end-of-file)
115-120
: Fix formatting issues.There are two formatting issues to address:
- Remove trailing spaces on line 116
- Add a newline at the end of the file
generate_excel: true - + custom_alignment_properties: activate: false column: "panel" tsv: "config_alignment_properties.tsv" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 116-116: trailing spaces
(trailing-spaces)
[error] 120-120: no new line character at the end of file
(new-line-at-end-of-file)
.test/config_primers/config.yaml (1)
120-123
: LGTM! Configuration is consistent.The
custom_alignment_properties
section matches the structure seen in other configuration files, maintaining consistency across the codebase.Add a newline at the end of the file:
activate: false column: "panel" tsv: "config_alignment_properties.tsv" +
🧰 Tools
🪛 yamllint (1.35.1)
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-giab/config.yaml (1)
161-164
: Configuration structure looks good, fix formattingThe custom_alignment_properties configuration is well-structured with sensible defaults. The
activate: false
ensures backward compatibility.Add a newline at the end of the file to fix the YAML formatting:
column: "panel" tsv: "config_alignment_properties.tsv" +
🧰 Tools
🪛 yamllint (1.35.1)
[error] 164-164: no new line character at the end of file
(new-line-at-end-of-file)
workflow/schemas/config.schema.yaml (1)
350-359
: Schema structure looks good, consider validation improvementsThe schema correctly defines the custom_alignment_properties structure. Consider adding:
- Pattern validation for the tsv path
- Enum or pattern validation for the column field if there are expected values
Add validation patterns to the schema:
tsv: type: string + pattern: "^[^/].*\\.tsv$" # Must end with .tsv and not start with / column: type: string + pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$" # Valid column name patternconfig/config.yaml (1)
293-293
: Add newline at end of file.Add a newline character at the end of the file to follow YAML best practices.
custom_alignment_properties: activate: false column: "panel" - tsv: "config_alignment_properties.tsv" + tsv: "config_alignment_properties.tsv" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 293-293: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-chm-eval/config.yaml (1)
191-191
: Add newline at end of file.Add a newline character at the end of the file to follow YAML best practices.
custom_alignment_properties: activate: false column: "panel" - tsv: "config_alignment_properties.tsv" + tsv: "config_alignment_properties.tsv" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
.test/config-chm-eval/config.yaml
(1 hunks).test/config-giab/config.yaml
(1 hunks).test/config-no-candidate-filtering/config.yaml
(1 hunks).test/config-simple/config.yaml
(1 hunks).test/config-sra/config.yaml
(1 hunks).test/config-target-regions/config.yaml
(1 hunks).test/config-target-regions/config_multiple_beds.yaml
(1 hunks).test/config_primers/config.yaml
(1 hunks)config/config.yaml
(1 hunks)workflow/rules/calling.smk
(1 hunks)workflow/rules/common.smk
(2 hunks)workflow/schemas/config.schema.yaml
(2 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.test/config-simple/config.yaml
[error] 134-134: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config.yaml
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-no-candidate-filtering/config.yaml
[error] 130-130: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-giab/config.yaml
[error] 164-164: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config_multiple_beds.yaml
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
config/config.yaml
[error] 293-293: no new line character at the end of file
(new-line-at-end-of-file)
.test/config_primers/config.yaml
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-sra/config.yaml
[error] 116-116: trailing spaces
(trailing-spaces)
[error] 120-120: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-chm-eval/config.yaml
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (8)
.test/config-target-regions/config.yaml (1)
130-133
: LGTM! Verify interaction with target regions.
The configuration is consistent with other files. Since this is a target regions configuration, ensure that the custom alignment properties work correctly with the target regions functionality.
Let's verify the interaction between custom alignment properties and target regions:
Add a newline at the end of the file:
activate: false
column: "panel"
tsv: "config_alignment_properties.tsv"
+
✅ Verification successful
Custom alignment properties are properly integrated with target regions
Based on the codebase analysis, the custom alignment properties feature is well-integrated into the variant calling pipeline and works independently from target regions functionality:
- The alignment properties are used by varlociraptor for variant calling through
get_alignment_props()
function - Target regions are handled separately in the
get_target_regions
rule - The two features serve different purposes and don't interfere with each other:
- Target regions define areas of interest for variant calling
- Custom alignment properties provide sample-specific alignment characteristics for variant calling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential interactions between custom alignment properties and target regions
# Look for references to target regions in alignment-related code
rg -l 'target.*region.*align|align.*target.*region' --type py
# Check if the alignment properties TSV contains any target region related columns
if [ -f "config_alignment_properties.tsv" ]; then
head -n 1 config_alignment_properties.tsv
fi
Length of output: 108
Script:
#!/bin/bash
# Let's try a broader search to understand the relationship between target regions and alignment
# Search for files containing both "target region" and "alignment" in any context
rg -l "target.*region" --type py | xargs rg -l "align"
# Look for alignment-related configuration or utility files
fd "align" -t f
# Check for any target region configuration files
fd "target.*region" -t f
# Look for alignment-related functions or classes
ast-grep --pattern 'class $_ {
$$$
align($$$) {
$$$
}
$$$
}'
ast-grep --pattern 'def align($$$) {
$$$
}'
Length of output: 388
Script:
#!/bin/bash
# Let's examine the content of files that contain both target regions and alignment
# Check workflow/rules/common.smk for alignment and target region interactions
rg "align|target.*region" -C 5 workflow/rules/common.smk
# Check workflow/rules/mapping.smk for alignment and target region interactions
rg "align|target.*region" -C 5 workflow/rules/mapping.smk
# Look for any configuration files that might define how alignment and target regions interact
fd "config" -t f -e yaml -e yml | xargs rg "align|target.*region" -C 2
Length of output: 13107
🧰 Tools
🪛 yamllint (1.35.1)
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config_multiple_beds.yaml (1)
128-131
: Add documentation for the new configuration section
Please add comments explaining:
- The purpose of the
custom_alignment_properties
feature - The expected format and content of the TSV file
- The significance of the
panel
column
Verify the TSV file existence and format
The configuration references an external file config_alignment_properties.tsv
. Please ensure:
- The file exists in the expected location
- The file contains the required
panel
column
Add newline at end of file
Add a newline character at the end of the file to comply with YAML best practices.
tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-no-candidate-filtering/config.yaml (1)
127-130
: Configuration is consistent with other files
The custom_alignment_properties
section matches the structure in other configuration files. Please apply the same documentation and validation improvements mentioned in the previous file's review.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 130-130: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-simple/config.yaml (1)
131-134
: Configuration is consistent with other files
The custom_alignment_properties
section maintains consistency across all configuration files.
Consider documenting feature in workflow documentation
Since this appears to be a global feature being added across multiple configuration files, consider:
- Adding documentation in the workflow's README or documentation
- Including example TSV files in the test configurations
- Documenting any migration steps for existing workflows
🧰 Tools
🪛 yamllint (1.35.1)
[error] 134-134: no new line character at the end of file
(new-line-at-end-of-file)
workflow/rules/calling.smk (1)
50-50
: Verify get_alignment_props() implementation and backward compatibility
The change from static path to dynamic function makes alignment properties more configurable, but we should ensure:
- The get_alignment_props() function exists and returns the correct path format
- Existing workflows aren't broken by this change
✅ Verification successful
Function implementation is correct and maintains backward compatibility
The get_alignment_props()
function is properly implemented with:
- Default behavior returning the same path format as before:
results/alignment-properties/{group}/{sample}.json
- Optional custom alignment properties support through configuration
- Proper error handling for duplicate sample specifications
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify get_alignment_props function implementation and usage
# Check if get_alignment_props is defined in common.smk
echo "Checking for get_alignment_props definition:"
rg -A 5 "def get_alignment_props" workflow/rules/common.smk
# Check return format matches expected path pattern
echo "Checking alignment properties path pattern usage:"
rg -A 2 "results/alignment-properties/\{group\}/\{sample\}\.json"
Length of output: 1046
Script:
#!/bin/bash
# Let's verify the complete implementation and configuration handling
# Check the complete get_alignment_props function implementation
echo "Complete get_alignment_props implementation:"
rg -B 2 -A 15 "def get_alignment_props" workflow/rules/common.smk
# Check for custom_alignment_properties configuration usage
echo -e "\nCustom alignment properties configuration:"
rg "custom_alignment_properties" workflow/rules/common.smk
# Check for extract_unique_sample_column_value function
echo -e "\nRelated extract_unique_sample_column_value function:"
rg -A 5 "def extract_unique_sample_column_value" workflow/rules/common.smk
Length of output: 1631
config/config.yaml (1)
285-293
: LGTM! Well-documented configuration section.
The new custom_alignment_properties
section is clearly documented and follows the existing configuration structure. The comments effectively explain the purpose and usage of each parameter.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 293-293: no new line character at the end of file
(new-line-at-end-of-file)
workflow/rules/common.smk (1)
114-127
: LGTM! Well-structured DataFrame initialization.
The initialization of custom_alignment_props
follows the established pattern in the codebase and properly handles the optional configuration case.
.test/config-chm-eval/config.yaml (1)
188-191
: LGTM! Consistent configuration structure.
The custom_alignment_properties
section maintains consistency with the main configuration file.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (10)
.test/config-sra/config.yaml (1)
116-116
: Fix YAML formatting issuesThere are two formatting issues:
- Trailing spaces on line 116
- Missing newline at end of file
Apply this diff to fix the formatting:
- generate_excel: true + generate_excel: true custom_alignment_properties: activate: false column: "panel" - tsv: "" \ No newline at end of file + tsv: "" +Also applies to: 120-120
🧰 Tools
🪛 yamllint (1.35.1)
[error] 116-116: trailing spaces
(trailing-spaces)
.test/config_primers/config.yaml (1)
123-123
: Fix missing newline at end of fileAdd a newline character at the end of the file.
Apply this diff:
- tsv: "" \ No newline at end of file + tsv: "" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config.yaml (2)
133-133
: Fix missing newline at end of fileAdd a newline character at the end of the file.
Apply this diff:
- tsv: "" \ No newline at end of file + tsv: "" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
130-133
: Consider adding test cases for the new featureSince this is a test configuration file, consider adding test cases that:
- Test the feature when activated with valid TSV
- Test error handling with invalid TSV path
- Test error handling with missing column
- Test with empty TSV file
Would you like help creating these test cases?
🧰 Tools
🪛 yamllint (1.35.1)
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config_multiple_beds.yaml (2)
128-131
: Add documentation for the new custom_alignment_properties sectionPlease add comments explaining:
- The purpose of this new feature
- The meaning and expected values for each parameter
- Example usage scenarios
Example documentation format:
# Custom alignment properties configuration # Allows specifying per-sample alignment properties via a TSV file custom_alignment_properties: # Whether to use custom alignment properties (default: false) activate: false # Column name in the TSV file that identifies the sample column: "panel" # Path to TSV file containing alignment properties tsv: ""🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
131-131
: Add newline at end of fileAdd a newline character at the end of the file to comply with YAML best practices.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-no-candidate-filtering/config.yaml (1)
127-130
: Consider documenting the test configuration templateSince this configuration appears in multiple test files with identical structure and values, consider:
- Adding a comment explaining this is a test configuration
- Creating a base template file that can be extended by test-specific configs
Example approach:
# Base template in .test/templates/base_config.yaml custom_alignment_properties: &default_alignment_props activate: false column: "panel" tsv: "" # In test-specific configs custom_alignment_properties: <<: *default_alignment_props # Inherit from base template # Override specific values if needed🧰 Tools
🪛 yamllint (1.35.1)
[error] 130-130: no new line character at the end of file
(new-line-at-end-of-file)
config/config.yaml (2)
286-289
: Documentation enhancement needed for TSV formatWhile the documentation explains the purpose well, it would be helpful to include:
- The expected column headers in the TSV file
- An example of the TSV content
- The format of the alignment properties JSON files
Would you like me to help draft the enhanced documentation?
290-293
: Add newline at end of fileThe YAML file is missing a newline at the end, which is a POSIX requirement.
custom_alignment_properties: activate: false column: "panel" - tsv: "config/alignment_properties.tsv" + tsv: "config/alignment_properties.tsv" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 293-293: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-chm-eval/config.yaml (1)
187-191
: Add newline at end of fileThe YAML file is missing a newline at the end.
custom_alignment_properties: activate: false column: "panel" - tsv: "" + tsv: "" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
config/alignment_properties.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (11)
.test/config-chm-eval/config.yaml
(1 hunks).test/config-giab/config.yaml
(1 hunks).test/config-no-candidate-filtering/config.yaml
(1 hunks).test/config-simple/config.yaml
(1 hunks).test/config-sra/config.yaml
(1 hunks).test/config-target-regions/config.yaml
(1 hunks).test/config-target-regions/config_multiple_beds.yaml
(1 hunks).test/config_primers/config.yaml
(1 hunks)config/config.yaml
(1 hunks)workflow/rules/calling.smk
(1 hunks)workflow/rules/common.smk
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/calling.smk
🧰 Additional context used
🪛 yamllint (1.35.1)
.test/config-chm-eval/config.yaml
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-giab/config.yaml
[error] 164-164: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-no-candidate-filtering/config.yaml
[error] 130-130: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-simple/config.yaml
[error] 134-134: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-sra/config.yaml
[error] 116-116: trailing spaces
(trailing-spaces)
[error] 120-120: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config.yaml
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config_multiple_beds.yaml
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
.test/config_primers/config.yaml
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
config/config.yaml
[error] 293-293: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (8)
.test/config-sra/config.yaml (1)
117-120
: Documentation needed for new configuration section
The new custom_alignment_properties
section introduces functionality for custom alignment properties, but lacks documentation explaining:
- Purpose and use cases for this feature
- Required format of the TSV file
- Expected columns in the TSV file
- Impact of the
panel
column setting
Would you like me to help create a documentation template for this feature?
🧰 Tools
🪛 yamllint (1.35.1)
[error] 120-120: no new line character at the end of file
(new-line-at-end-of-file)
.test/config_primers/config.yaml (1)
120-123
: Documentation and validation needed
Same concerns about documentation and validation as in config-sra/config.yaml.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-target-regions/config.yaml (1)
130-133
: Documentation and validation needed
Same concerns about documentation and validation as in config-sra/config.yaml.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 133-133: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-simple/config.yaml (1)
131-134
: Verify implementation details for custom alignment properties
The configuration structure is consistent across files, but we should verify:
- The expected format of the TSV file
- How these properties are validated and used
✅ Verification successful
Let's gather more information about how these properties are used in the workflow file.
Custom alignment properties implementation is properly validated and used
The implementation shows proper handling of the custom alignment properties:
- TSV file is read with proper validation (dtype checking, tab separator)
- Required columns "name" and "path" are enforced
- Fallback path is provided when custom properties are not activated
- Column value extraction is handled safely with null checks
The configuration structure aligns well with its implementation:
activate
: Controls the feature viais_activated()
column
: Used to extract sample column valuetsv
: Path to TSV file with proper format (name, path columns)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of custom alignment properties
echo "=== Searching for TSV parsing and validation ==="
rg -l "custom_alignment_properties"
echo "=== Checking for schema validation ==="
fd "config.schema.yaml" | xargs cat
echo "=== Looking for related test files ==="
fd -e tsv "alignment_properties"
Length of output: 411
Script:
#!/bin/bash
# Check the implementation in common.smk
echo "=== Examining custom alignment properties usage ==="
rg -A 10 "custom_alignment_properties" workflow/rules/common.smk
# Look for any TSV-related code
echo "=== Searching for TSV handling ==="
rg -l "\.tsv"
# Check for any alignment-related functions
echo "=== Looking for alignment-related code ==="
rg -A 5 "alignment" workflow/rules/common.smk
Length of output: 2679
🧰 Tools
🪛 yamllint (1.35.1)
[error] 134-134: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-chm-eval/config.yaml (1)
188-191
: Verify if empty TSV path is intended for tests
The empty TSV path might cause issues if the feature is activated. Please verify if this is intentional for testing purposes.
✅ Verification successful
Empty TSV path in test configuration appears safe
The empty TSV path in the test configuration is safe because:
- The feature is explicitly deactivated (
activate: false
) - Our attempts to find any test cases that activate this feature yielded no results
- This is a test configuration file, where empty values are commonly used as placeholders
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if any test cases activate the feature with empty TSV
rg -l "custom_alignment_properties:(\n|\s)+activate:\s*true" .test/
Length of output: 261
Script:
#!/bin/bash
# Check if any test cases activate the feature with empty TSV using multiline mode
rg -U "custom_alignment_properties:\s*\n\s*activate:\s*true" .test/
# Also check the main config for comparison
rg -U "custom_alignment_properties:\s*\n\s*activate:\s*true" config/
Length of output: 140
Script:
#!/bin/bash
# Check all test configurations for custom_alignment_properties section
rg -A 4 "custom_alignment_properties:" .test/
Length of output: 45
Script:
#!/bin/bash
# List all yaml files in test directory
fd -e yaml -e yml . .test/
# Show content of any yaml files found
fd -e yaml -e yml . .test/ -x cat {}
Length of output: 69
🧰 Tools
🪛 yamllint (1.35.1)
[error] 191-191: no new line character at the end of file
(new-line-at-end-of-file)
.test/config-giab/config.yaml (1)
161-164
: Same issues as in other test config
This file has the same issues as identified in .test/config-chm-eval/config.yaml
:
- Empty TSV path needs verification
- Missing newline at end of file
🧰 Tools
🪛 yamllint (1.35.1)
[error] 164-164: no new line character at the end of file
(new-line-at-end-of-file)
workflow/rules/common.smk (2)
115-120
: LGTM! Well-structured utility function for feature flag checks.
The implementation is clean, safe, and follows best practices:
- Uses dict.get() for safe key access
- Handles nested config traversal correctly
- Clear and descriptive function name
1612-1623
: 🛠️ Refactor suggestion
Remove debug print and add error handling.
The function implementation looks good but has a few areas for improvement:
- Add error handling for missing configuration
- Validate alignment property column exists
def get_alignment_props():
def inner(wildcards):
if is_activated("custom_alignment_properties"):
alignment_prop_column = config["custom_alignment_properties"]["column"]
+ if not alignment_prop_column:
+ raise ValueError("custom_alignment_properties.column must be set")
prop_name = extract_unique_sample_column_value(
wildcards.sample, alignment_prop_column
)
if pd.notna(prop_name):
+ if prop_name not in custom_alignment_props.index:
+ raise ValueError(f"Property '{prop_name}' not found in alignment properties TSV")
return custom_alignment_props.loc[prop_name, "path"]
return "results/alignment-properties/{group}/{sample}.json"
return inner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/rules/common.smk (1)
122-135
: Consider adding validation for the TSV structure.While the configuration schema ensures file presence, consider validating the structure of the alignment properties TSV file.
Consider adding validation for:
- Required columns
- Data types
- File format
custom_alignment_props = ( ( pd.read_csv( config["custom_alignment_properties"]["tsv"], sep="\t", dtype={"name": str, "path": str}, comment="#", ) + .assign(path=lambda df: df["path"].apply(os.path.expanduser)) .set_index(["name"], drop=False) .sort_index() ) if is_activated("custom_alignment_properties") else None )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
workflow/rules/calling.smk
(1 hunks)workflow/rules/common.smk
(2 hunks)workflow/rules/mapping.smk
(1 hunks)workflow/rules/ref.smk
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/calling.smk
🧰 Additional context used
📓 Learnings (1)
workflow/rules/common.smk (1)
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#346
File: workflow/rules/common.smk:122-135
Timestamp: 2024-12-13T08:46:33.627Z
Learning: In `workflow/rules/common.smk`, explicit error handling for reading TSV files (e.g., using `pd.read_csv`) is unnecessary when the files are specified in the configuration, as the configuration schema ensures their presence and correctness.
🔇 Additional comments (6)
workflow/rules/ref.smk (3)
167-167
: Consider the implications of making contig files permanent.The removal of
temp()
directive for contig files combined with caching could improve performance for repeated runs but increases storage requirements.Run this script to check if these files are referenced elsewhere:
Also applies to: 172-172
153-153
: Consider the implications of making pangenome files permanent.The removal of
temp()
directive means the VCF file will be kept permanently. While this improves reproducibility and allows for manual inspection, it may increase disk space usage.Run this script to check the typical size of these files:
Also applies to: 158-158
✅ Verification successful
Keeping pangenome VCF files is justified despite storage costs
The pangenome VCF file contains reference haplotype data that may be needed for result interpretation and validation. While these files can be large, re-downloading them would be time-consuming and bandwidth-intensive, especially since they are used in multiple downstream steps (renaming and indexing).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check size of pangenome VCF files find . -name "*.vcf.gz" -type f -exec du -h {} \;Length of output: 54
Script:
#!/bin/bash # Check for Snakefile and documentation echo "=== Checking for Snakefile and docs ===" fd -e md fd Snakefile echo "=== Checking ref.smk context ===" cat workflow/rules/ref.smkLength of output: 5439
184-184
: Consider the implications of making renamed VCF files permanent.The removal of
temp()
directive for renamed VCF files combined with caching suggests these files are needed for downstream analysis or debugging.Run this script to verify the usage:
Also applies to: 187-187
✅ Verification successful
The permanent storage of renamed VCF files is justified
The renamed VCF files serve as crucial intermediate files in the workflow:
- Used conditionally by the vg autoindex process when chromosome renaming is configured
- Part of the workflow's caching strategy for optimization
- Valuable for debugging chromosome renaming issues
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check references to renamed VCF files rg "renamed.vcf.gz" --type snakemakeLength of output: 70
Script:
#!/bin/bash # Search for renamed VCF files references rg "renamed.vcf.gz" # Search for potential rule definitions in Snakemake files fd -e smk -e snakefile -I Snakefile --exec cat {} # Look for caching configurations rg -i "cache|temp.*directive"Length of output: 1529
Script:
#!/bin/bash # Get context around renamed.vcf.gz references rg -B 5 -A 5 "renamed.vcf.gz" # Look for potential input patterns that might match renamed VCF files rg "input.*vcf" workflow/rules/Length of output: 1167
workflow/rules/mapping.smk (1)
32-32
: Verify system capacity for increased thread count.The thread count increase from 8 to 64 is significant. While this can improve mapping performance, it may:
- Exceed available system resources
- Impact concurrent job execution
- Not scale linearly with thread count
Consider monitoring system performance with this increased thread count. Run this script to check other thread usage in the workflow:
✅ Verification successful
Thread allocation for vg/giraffe is appropriate
The 64-thread allocation for vg/giraffe aligns with the workflow's resource management strategy:
- Other computationally intensive operations also receive high thread counts
- Most standard operations (BWA, samtools) use 8 threads
- Lighter operations use 4 threads or less
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check thread usage across workflow rg "threads:" --type snakemake -A 1Length of output: 69
Script:
#!/bin/bash # Find Snakemake files and search for thread configurations fd -e smk -e snakefile . | xargs rg "threads:" -A 2 -B 2Length of output: 6253
workflow/rules/common.smk (2)
115-120
: LGTM! Well-structured utility function.The
is_activated()
function provides a clean way to check feature activation in nested configurations.
1612-1620
: Consider simplifying the function structure.The function could be simplified by removing the nested structure, as suggested in the past review comments.
Consider this structure:
-def get_alignment_props(wildcards): - if is_activated("custom_alignment_properties"): - alignment_prop_column = config["custom_alignment_properties"]["column"] - prop_name = extract_unique_sample_column_value( - wildcards.sample, alignment_prop_column - ) - if pd.notna(prop_name): - return custom_alignment_props.loc[prop_name, "path"] - return f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json" +def get_alignment_props(wildcards): + if not is_activated("custom_alignment_properties"): + return f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json" + + alignment_prop_column = config["custom_alignment_properties"]["column"] + prop_name = extract_unique_sample_column_value( + wildcards.sample, alignment_prop_column + ) + return ( + custom_alignment_props.loc[prop_name, "path"] + if pd.notna(prop_name) + else f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json" + )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
config/config.yaml (3)
288-291
: Fix typo in documentation comment.The word "estimated" is repeated in the last line.
-# If no property name is set for a sample or custom_alignment_properties is deactivated the alignment properties will be estimated estimated on the sample's read alignments. +# If no property name is set for a sample or custom_alignment_properties is deactivated the alignment properties will be estimated on the sample's read alignments.
292-295
: Document the TSV file format.Please add a comment describing the required columns and format of the alignment properties TSV file (
config/alignment_properties.tsv
). This will help users understand how to structure their data correctly.Example documentation to add:
custom_alignment_properties: activate: false column: "panel" + # TSV file must contain the following columns: + # - property_name: Name of the alignment property (matches values in the sample sheet) + # - path: Path to the JSON file containing the alignment properties tsv: "config/alignment_properties.tsv"🧰 Tools
🪛 yamllint (1.35.1)
[error] 295-295: no new line character at the end of file
(new-line-at-end-of-file)
295-295
: Add newline at end of file.Add a newline character at the end of the file to comply with POSIX standards and prevent potential issues with text processing tools.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 295-295: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
config/config.yaml
(1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
config/config.yaml
[error] 295-295: no new line character at the end of file
(new-line-at-end-of-file)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Formatting
This PR introduces the definition of custom alignment properties estimated by varlociraptor.
In case custom alignment properties are enabled in the config file property names can be assigned to samples in the samplesheet via a user-defined column. The corresponding alignment properties will be looked up from a tsv sheet containing property names and paths to json-files. In case some sample does not contain a property name (cell entry is NaN) the alignment properties will be estimate by varlociraptor for that sample.
Summary by CodeRabbit
Configuration Updates
custom_alignment_properties
section to multiple configuration filesSchema Updates
custom_alignment_properties
objectWorkflow Enhancements