-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: replaced capturing regex terms with non capturing in datagraph t… #162
Conversation
WalkthroughThe pull request involves significant modifications to the Changes
Poem
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/imgtools/modules/datagraph.py (1)
Add unit tests for graph query regex patterns
The regex patterns in
datagraph.py
are critical for determining valid edge relationships between modalities, but currently lack test coverage. These patterns need dedicated unit tests to verify:
- Edge type combinations (0, 1, 2, etc.)
- Modality relationship validation (CT/MR, RTSTRUCT, RTDOSE, etc.)
- Error cases and edge conditions
Example test cases should cover:
- Basic edge type matching
- Complex combinations with multiple look-ahead assertions
- CT/MR agnostic functionality
- Invalid pattern handling
🔗 Analysis chain
Line range hint
343-447
: Consider adding unit tests for regex patterns.The regex patterns are critical for the query functionality. Consider adding unit tests to verify their behavior.
Would you like me to generate comprehensive unit tests for these regex patterns?
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for existing test files that might contain regex pattern tests # Look for test files echo "Searching for test files..." fd -e py "test.*datagraph" # Look for regex pattern tests in test files echo "Searching for regex pattern tests..." rg -A 5 "(?:(?:(?=.*0)|(?=.*5)(?=.*6))|(?:(?=.*1)(?=.*2)))"Length of output: 497
Script:
#!/bin/bash # Search for test files and their contents echo "=== Finding test files ===" fd -e py "test.*" -x echo "=== File: {}" \; -x cat "{}" echo -e "\n=== Searching for pattern/regex related tests ===" rg -i "test.*pattern|test.*regex|test.*graph_query" -A 5 echo -e "\n=== Searching for graph_query method tests ===" rg "def test.*graph" -A 5Length of output: 51194
Script:
#!/bin/bash # Search for datagraph test files and graph_query tests echo "=== Searching for datagraph test files ===" fd -t f -e py "datagraph" echo -e "\n=== Searching for graph_query tests ===" rg -l "def test.*graph_query|test.*datagraph" echo -e "\n=== Looking for graph_query method usage ===" rg "graph_query" -A 5 -B 5Length of output: 3715
🧹 Nitpick comments (2)
src/imgtools/modules/datagraph.py (2)
Line range hint
343-447
: Consider adding regex pattern documentation.While the regex patterns are working correctly, they are complex and could benefit from detailed documentation explaining each pattern's purpose and components.
Add comments above each regex pattern explaining its components. For example:
+ # Pattern explanation: + # (?:(?:(?=.*0)|(?=.*5)(?=.*6))|(?:(?=.*1)(?=.*2))) + # - (?:...) - non-capturing group + # - (?=.*0) - lookahead for edge type 0 + # - | - OR operator + # - (?=.*5)(?=.*6) - lookahead for both edge types 5 and 6 + # - | - OR operator + # - (?=.*1)(?=.*2) - lookahead for both edge types 1 and 2
Line range hint
343-447
: Consider extracting regex patterns to constants.The regex patterns are complex and used in multiple places. Consider extracting them to named constants at the class level for better maintainability.
class DataGraph: + # Regex patterns for different query combinations + REGEX_CT_RTSTRUCT_RTDOSE = '(?:(?:(?=.*0)|(?=.*5)(?=.*6))|(?:(?=.*1)(?=.*2)))' + REGEX_RTDOSE_CT = '(?:(?=.*1)|(?:(?:(?=.*0)|(?=.*5)(?=.*6))(?=.*2)))' + REGEX_RTSTRUCT_CT = '(?:(?=.*2)|(?:(?:(?=.*0)|(?=.*5)(?=.*6))(?=.*1)))' + REGEX_CT_MR_RTSTRUCT_RTDOSE = '(?:(?=.*1)|(?=.*0)|(?=.*5)(?=.*6))(?=.*2)' + REGEX_CT_MR_RTSTRUCT_RTDOSE_PT = '(?:(?=.*1)|(?=.*0)|(?=.*5)(?=.*6))(?=.*2)(?=.*3)(?=.*4)' + REGEX_CT_MR_RTDOSE_PT = '(?=.*4)(?:(?=.*1)|(?:(?=.*2)(?:(?=.*0)|(?=.*5)(?=.*6))))' def parser(self, query_string: str) -> pd.DataFrame: # ... - regex_term = '(?:(?:(?=.*0)|(?=.*5)(?=.*6))|(?:(?=.*1)(?=.*2)))' + regex_term = self.REGEX_CT_RTSTRUCT_RTDOSE # ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/imgtools/modules/datagraph.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Unit-Tests (macos-14, py312)
- GitHub Check: Unit-Tests (macos-14, py311)
- GitHub Check: Unit-Tests (macos-14, py310)
- GitHub Check: Unit-Tests (macos-latest, py311)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (6)
src/imgtools/modules/datagraph.py (6)
343-343
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the CT-RTSTRUCT and CT-RTDOSE case.
350-350
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the RTDOSE->CT case.
354-354
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the RTSTRUCT->CT case.
417-417
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the CT/MR,RTSTRUCT,RTDOSE case.
427-427
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the CT/MR,RTSTRUCT,RTDOSE,PT case.
447-447
: LGTM! Non-capturing groups improve performance.The regex pattern has been correctly updated to use non-capturing groups (
?:
) for the CT/MR,RTDOSE,PT case.
Might be better to replace all regex with conditional statements. So instead of: regex_term = '(?:(?=.*1)|(?=.*0)|(?=.*5)(?=.*6))(?=.*2)'
relevant_study_id = self.df_new.loc[
self.df_new.edge_type.str.contains(f'(?:{regex_term})', regex=True),
'study_x',
].unique() Use conditional statements: valid_condition = (
(self.df_new['edge_type'].str.contains('1') |
self.df_new['edge_type'].str.contains('0') |
(self.df_new['edge_type'].str.contains('5') & self.df_new['edge_type'].str.contains('6'))) &
self.df_new['edge_type'].str.contains('2')
)
relevant_study_id = self.df_new.loc[valid_condition, 'study_x'].unique() |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #162 +/- ##
=======================================
Coverage 44.25% 44.25%
=======================================
Files 47 47
Lines 3606 3606
=======================================
Hits 1596 1596
Misses 2010 2010 ☔ View full report in Codecov by Sentry. |
Id be okay with the conditional statements if you can implement it |
Did a rough implementation that passes tests. Will clean up and push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/imgtools/modules/datagraph.py (1)
Line range hint
342-472
: Convert lambda functions to regular functions for better maintainability.The current implementation uses multiple lambda functions for edge conditions, which makes the code harder to read and maintain. Converting these to regular functions would improve readability, allow for documentation, and make debugging easier.
Here's a suggested refactoring approach:
+def _check_rtstruct_rtdose_connection(edge_type: str) -> bool: + """Check if there's a connection between RTSTRUCT and RTDOSE.""" + return '0' in edge_type or ('5' in edge_type and '6' in edge_type) + +def _check_edge_condition_rtstruct_rtdose(row: pd.Series) -> bool: + """Check edge condition for RTSTRUCT-RTDOSE connection.""" + return ( + _check_rtstruct_rtdose_connection(row['edge_type']) or + ('1' in row['edge_type'] and '2' in row['edge_type']) + ) + +def _check_edge_condition_ct_rtdose(row: pd.Series) -> bool: + """Check edge condition for CT-RTDOSE connection.""" + return ( + '1' in row['edge_type'] or ( + _check_rtstruct_rtdose_connection(row['edge_type']) and + '2' in row['edge_type'] + ) + ) + +def _check_edge_condition_ct_rtstruct(row: pd.Series) -> bool: + """Check edge condition for CT-RTSTRUCT connection.""" + return ( + '2' in row['edge_type'] or ( + _check_rtstruct_rtdose_connection(row['edge_type']) and + '1' in row['edge_type'] + ) + ) + +def _check_edge_condition_ct_rtstruct_rtdose(row: pd.Series) -> bool: + """Check edge condition for CT-RTSTRUCT-RTDOSE connection.""" + return ( + '2' in row['edge_type'] and + ('1' in row['edge_type'] or _check_rtstruct_rtdose_connection(row['edge_type'])) + ) + +def _check_edge_condition_ct_rtstruct_rtdose_pt(row: pd.Series) -> bool: + """Check edge condition for CT-RTSTRUCT-RTDOSE-PT connection.""" + return ( + ('1' in row['edge_type'] or _check_rtstruct_rtdose_connection(row['edge_type'])) and + '2' in row['edge_type'] and + '3' in row['edge_type'] and + '4' in row['edge_type'] + ) - edge_condition = lambda row: ( - ('0' in row['edge_type'] or ('5' in row['edge_type'] and '6' in row['edge_type'])) or - ('1' in row['edge_type'] and '2' in row['edge_type']) - ) + edge_condition = _check_edge_condition_rtstruct_rtdose # Apply similar changes for other edge conditionsThis refactoring:
- Improves code readability with descriptive function names
- Allows for documentation through docstrings
- Makes the code easier to test and debug
- Follows Python style guidelines
- Reduces code duplication through helper functions
🧰 Tools
🪛 Ruff (0.8.2)
343-346: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
353-358: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
362-367: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
🧹 Nitpick comments (1)
src/imgtools/modules/datagraph.py (1)
Line range hint
508-512
: Update method documentation to reflect the new parameter type.The method signature has been updated to use
edge_condition: Callable
instead ofregex_term: str
, but the docstring still references the old parameter. Please update the documentation to reflect this change.""" Based on the regex forms the final dataframe. You can query the edge table based on the regex to get the subgraph in which the queried edges will be present. Parameters ---------- - regex_term: str - To search the string in edge_type column of self.df_new which is aggregate of all the edges in a single study + edge_condition: Callable + A function that takes a row from self.df_new and returns a boolean indicating whether the edge conditions are met
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (1)
src/imgtools/modules/datagraph.py
(8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/imgtools/modules/datagraph.py
343-346: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
353-358: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
362-367: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
430-433: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
443-448: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
458-458: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
468-472: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Unit-Tests (macos-14, py312)
- GitHub Check: Unit-Tests (macos-14, py311)
- GitHub Check: Unit-Tests (macos-14, py310)
- GitHub Check: Unit-Tests (macos-latest, py311)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (1)
src/imgtools/modules/datagraph.py (1)
3-3
: LGTM! Type hint addition is appropriate.The addition of
Callable
to the imports aligns well with the refactoring from regex patterns to callable edge conditions.
@JoshuaSiraj is this ready? |
Yeah, have a ook through please |
Actually I forgot to fix string doc. One sec |
@jjjermiah Good now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/imgtools/modules/datagraph.py (1)
Line range hint
342-472
: Replace lambda expressions with proper function definitions.The current implementation uses multiple lambda expressions for edge conditions, which violates Python style guidelines and reduces code readability. Consider refactoring these into proper function definitions.
Here's a suggested refactoring pattern:
-edge_condition = lambda row: ( - ('0' in row['edge_type'] or ('5' in row['edge_type'] and '6' in row['edge_type'])) or - ('1' in row['edge_type'] and '2' in row['edge_type']) -) +def check_rtstruct_rtdose_condition(row: pd.Series) -> bool: + """Check if the row contains valid RTSTRUCT-RTDOSE edge combinations. + + Valid combinations are: + 1. Direct connection (edge type 0) + 2. Connection via RTPLAN (edge types 5 and 6) + 3. Connection via CT (edge types 1 and 2) + """ + has_direct = '0' in row['edge_type'] + has_plan_path = '5' in row['edge_type'] and '6' in row['edge_type'] + has_ct_path = '1' in row['edge_type'] and '2' in row['edge_type'] + return has_direct or has_plan_path or has_ct_pathBenefits:
- Improved readability with descriptive function names
- Documentation through docstrings
- Type hints for better IDE support
- Easier to test individual conditions
- Follows Python style guidelines
🧰 Tools
🪛 Ruff (0.8.2)
343-346: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
353-358: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
362-367: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
🧹 Nitpick comments (1)
src/imgtools/modules/datagraph.py (1)
Line range hint
342-472
: Consider introducing an EdgeConditionFactory for better organization.The edge conditions could be better organized using a factory pattern or similar structural pattern to:
- Centralize all edge condition definitions
- Make conditions more reusable
- Simplify testing
- Improve maintainability
Example approach:
from dataclasses import dataclass from typing import Callable, Dict @dataclass class EdgeConditionFactory: """Factory for creating edge condition checkers.""" @staticmethod def create_condition(condition_type: str) -> Callable[[pd.Series], bool]: conditions: Dict[str, Callable] = { 'RTSTRUCT_RTDOSE': check_rtstruct_rtdose_condition, 'CT_RTDOSE': check_ct_rtdose_condition, # ... other conditions } return conditions[condition_type]This would allow you to:
- Centralize all edge conditions in one place
- Easily add new conditions
- Reuse conditions across different queries
- Simplify testing by focusing on individual conditions
🧰 Tools
🪛 Ruff (0.8.2)
343-346: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
353-358: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
362-367: Do not assign a
lambda
expression, use adef
Rewrite
edge_condition
as adef
(E731)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/imgtools/modules/datagraph.py
(9 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/imgtools/modules/datagraph.py
343-346: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
353-358: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
362-367: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
430-433: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
443-448: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
458-458: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
468-472: Do not assign a lambda
expression, use a def
Rewrite edge_condition
as a def
(E731)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Unit-Tests (macos-14, py312)
- GitHub Check: Unit-Tests (macos-14, py311)
- GitHub Check: Unit-Tests (macos-14, py310)
- GitHub Check: Unit-Tests (macos-latest, py311)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (2)
src/imgtools/modules/datagraph.py (2)
3-3
: LGTM! Type annotation addition is appropriate.The addition of
Callable
to the imports supports the refactoring from regex patterns to callable functions.
Line range hint
508-524
: LGTM! Method signature change is well-documented.The change from
regex_term
toedge_condition
is appropriate and maintains good documentation:
- Type hint correctly specifies
Callable
- Docstring updated to reflect the parameter change
- Change aligns with the overall refactoring strategy
…o deter warning
Summary by CodeRabbit