-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: begin indexing database, update find-dicoms cli, add cli documentation #148
Conversation
chore: update .gitignore to exclude dicom_index.sqlite chore(deps): add sqlitedict and sqlalchemy dependencies refactor(dicom): improve read_tags function and simplify logging
…pdate session method signature
…operations and indexing
…ing configuration (#146) Refactor logging code for improved maintainability by temporarily removing JSON logging functionality. Update the .gitignore to exclude .dcm and .sqlite files. Introduce an environment variable for debug logging and set the default log level based on this variable. - **New Features** - Introduced a new `LoggingManager` class for structured logging management. - Added support for JSON logging configuration (currently commented out). - Added a new environment variable for logging level configuration in development. - New entries in `.gitignore` to ignore `.dcm` and `.sqlite` files. - Enhanced CI/CD pipeline with new jobs for building and publishing documentation. - **Improvements** - Enhanced logging control by utilizing environment variables for log levels. - Updated logging directory structure and default log filename for better organization. - Improved management of logging levels based on verbosity settings. - **Bug Fixes** - Improved handling of logging level discrepancies with user-specified settings. - **Documentation** - Updated docstrings and comments for clarity on new logging setup and features.
…ation deployment step
chore: update readme chore: format and clean
…tions feat: enhance DICOM indexing with improved series handling and logging save
Warning Rate limit exceeded@jjjermiah has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 23 minutes and 3 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThis pull request introduces several changes across multiple files, primarily focusing on updates to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (25)
docs/.pages (1)
3-3
: Consider replacing ellipsis with explicit entriesThe
...
might be unclear to other contributors. Consider listing all navigation entries explicitly for better maintainability.docs/database_report.md (2)
8-24
: Add blank lines around the Modality Summary tableTo improve markdown formatting and comply with markdown standards, add blank lines before and after the table.
Apply this diff:
## Modality Summary + | Modality | Count | |:-----------|--------:| | MR | 152 | ... | NM | 1 | | CR | 11 | +🧰 Tools
🪛 Markdownlint (0.35.0)
9-9: null
Tables should be surrounded by blank lines(MD058, blanks-around-tables)
25-127
: Add blank lines around the Data Summary tableTo improve markdown formatting and comply with markdown standards, add blank lines before and after the table.
The Data Summary table provides comprehensive information about patient records and their associated studies. The structure and content are well-organized and informative.
Apply this diff:
## Data Summary + | Patient ID | Number of Studies | Number of Series | Unique Modalities | |:-------------|--------------------:|-------------------:|:---------------------------------------------| | TCGA-06-0184 | 12 | 183 | {'SEG', 'MR'} | ... | TCGA-CV-7245 | 2 | 7 | {'RTPLAN', 'CT', 'RTSTRUCT', 'RTDOSE'} | +🧰 Tools
🪛 Markdownlint (0.35.0)
26-26: null
Tables should be surrounded by blank lines(MD058, blanks-around-tables)
src/imgtools/dicom/index/database/database.py (2)
41-72
: Refactor repetitive insert methods to reduce code duplicationThe methods
_insert_patient
,_insert_study
,_insert_series
, and_insert_image
share a similar pattern of checking for an existing record and creating a new one if not found. To enhance maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider creating a generic method that handles this logic.Apply this refactor:
def _get_or_create(self, session: Session, model: Type[T], defaults: Dict[str, Any], **kwargs) -> T: instance = session.query(model).filter_by(**kwargs).first() if instance: return instance else: instance = model(**defaults) session.add(instance) return instanceThen, update the insert methods as follows:
-def _insert_patient(self, session: Session, metadata: Dict[str, str]) -> Patient: - patient = session.query(Patient).filter_by(PatientID=metadata['PatientID']).first() - if not patient: - patient = Patient.from_metadata(metadata) - session.add(patient) - return patient +def _insert_patient(self, session: Session, metadata: Dict[str, str]) -> Patient: + return self._get_or_create( + session, + Patient, + defaults=Patient.from_metadata(metadata).__dict__, + PatientID=metadata['PatientID'] + )Repeat similar changes for the
_insert_study
,_insert_series
, and_insert_image
methods.
Line range hint
288-320
: Handle exceptions when reading DICOM files gracefullyIn the
Series.from_metadata
method, exceptions raised during DICOM file reading will halt the indexing process. To improve robustness, consider logging the error and skipping the problematic file instead of re-raising the exception.Apply this change:
case 'RTSTRUCT': try: ds = dcmread(file_path, stop_before_pixels=True) logger.info(f'Reading RTSTRUCT file: {file_path}') roi_names = [roi.ROIName for roi in ds.StructureSetROISequence] rfrs = ds.ReferencedFrameOfReferenceSequence[0] ref_series_seq = ( rfrs.RTReferencedStudySequence[0] .RTReferencedSeriesSequence[0] .SeriesInstanceUID ) frame_of_reference = rfrs.FrameOfReferenceUID base_cls._RTSTRUCT = { 'ROINames': ",".join(roi_names), 'RTReferencedSeriesUID': ref_series_seq, 'FrameOfReferenceUID': frame_of_reference, } - except Exception as e: + except AttributeError as e: logger.exception(f'Error reading RTSTRUCT file: {file_path}') - raise e + # Skip adding RTSTRUCT data but proceed with indexing case 'MR': try: ds = dcmread(file_path, stop_before_pixels=True) logger.info(f'Reading MR file: {file_path}') _mr = { 'RepetitionTime': ds.RepetitionTime, 'EchoTime': ds.EchoTime, 'SliceThickness': ds.SliceThickness, 'ScanningSequence': ds.ScanningSequence, 'MagneticFieldStrength': ds.MagneticFieldStrength, 'ImagedNucleus': ds.ImagedNucleus, } base_cls._MR = {k: str(v) for k, v in _mr.items()} - except Exception as e: + except AttributeError as e: logger.exception(f'Error reading MR file: {file_path}') - raise e + # Skip adding MR data but proceed with indexingBy catching specific exceptions like
AttributeError
, you prevent the entire indexing process from failing due to a single corrupt or malformed file.src/imgtools/dicom/index/models/models.py (3)
14-15
: Optimize imports by checking for typingThe
TYPE_CHECKING
constant is used to prevent certain imports at runtime. Ensure that the import frompathlib
is correctly placed inside theif TYPE_CHECKING
block.if TYPE_CHECKING: - from pathlib import Path + from pathlib import PathSince
Path
is used in type annotations at runtime (e.g., infrom_metadata
methods), the import should remain outside theif TYPE_CHECKING
block to preventNameError
.
27-39
: Simplify__repr__
method implementationThe custom
__repr__
method can be simplified for clarity and to avoid potential issues with handling of different data types.Apply this change:
def __repr_method__(self) -> str: attrs = { - key: (len(value) if isinstance(value, list) else value) - for key, value in vars(self).items() - if not key.startswith('_') and isinstance(value, (str, int, float, list)) + key: value + for key, value in vars(self).items() + if not key.startswith('_') } class_name = self.__class__.__name__ fields = ', '.join(f'{key}={value}' for key, value in attrs.items()) - return f'<{class_name}(\n {fields}\n)>' + return f'<{class_name}({fields})>'This change provides a cleaner representation and avoids unexpected behavior with data types.
331-337
: Add type hint forROINames
property and fix formattingThe
ROINames
property lacks a return type hint and has a minor formatting issue with the comment.Apply this change:
@property - def ROINames(self) -> Optional[str]: # noqa: N802 + def ROINames(self) -> Optional[str]: # noqa: N802 return self._RTSTRUCT.get('ROINames', None) if self._RTSTRUCT else Nonesrc/imgtools/dicom/index/__init__.py (1)
1-3
: Ensure consistent import order and formattingFor better readability and consistency, consider grouping imports and adding blank lines where appropriate.
Apply this change:
from .database import DatabaseHandler, DICOMDatabaseInterface, DICOMIndexer, DICOMInsertMixin +__all__ = ['DatabaseHandler', 'DICOMDatabaseInterface', 'DICOMIndexer', 'DICOMInsertMixin']
This places the
__all__
declaration right after the imports, without unnecessary blank lines.src/imgtools/dicom/index/database/__init__.py (1)
1-4
: Remove unnecessary# noqa
comment and ensure PEP 8 complianceThe
# noqa
comment may be unnecessary if there are no linting issues. Also, ensure that imports are correctly formatted according to PEP 8.Apply this change:
-from .database_handler import DatabaseHandler # noqa +from .database_handler import DatabaseHandler from .database import DICOMDatabaseInterface, DICOMIndexer, DICOMInsertMixin __all__ = ['DatabaseHandler', 'DICOMDatabaseInterface', 'DICOMIndexer', 'DICOMInsertMixin']src/imgtools/dicom/index/database/database_handler.py (2)
32-34
: Consider adding database versioning and migration support.The current implementation lacks a database versioning and migration strategy, which could make future schema updates challenging.
Consider integrating Alembic for database migrations:
- Initialize Alembic in your project
- Track schema versions
- Create migration scripts for schema changes
Example integration:
from alembic import command from alembic.config import Config def setup_database(self): alembic_cfg = Config("alembic.ini") command.upgrade(alembic_cfg, "head")
37-55
: Enhance session management with connection pooling and timeouts.The current session management could be improved with connection pooling and timeout configurations for better resource management.
@contextmanager def session(self) -> Generator[Session, None, None]: session = self.Session() try: + # Set session timeout + session.execute("SET statement_timeout = '30s'") yield session session.commit() # Commit the transaction except Exception: session.rollback() # Rollback on exception raise finally: session.close()Also consider adding connection pooling configuration:
def __init__(self, db_path: Path, force_delete: bool = False) -> None: self.engine = create_engine( f'sqlite:///{db_path}', future=True, pool_size=5, max_overflow=10, pool_timeout=30 )src/imgtools/cli/__init__.py (1)
70-73
: Consider adding validation for quiet and verbose flags combination.The decorator should validate that quiet and verbose flags aren't used together.
def decorator(func: FC) -> FC: + def validate_options(ctx: click.Context) -> None: + if ctx.params.get('quiet') and ctx.params.get('verbose', 0) > 0: + raise click.UsageError("Cannot use both --quiet and --verbose options together") + func = click.option(*param_decls, **kwargs)(func) func = click.option(*quiet_decl, is_flag=True, help='Suppress all logging except errors, overrides verbosity options.')(func) + # Add validation before command execution + original_callback = func.callback + def wrapped_callback(*args, **kwargs): + ctx = click.get_current_context() + validate_options(ctx) + return original_callback(*args, **kwargs) + func.callback = wrapped_callback return funcsrc/imgtools/dicom/index/__main__.py (3)
1-12
: Consider using type hints for DEFAULT_DB_DIR and DEFAULT_DB_NAMEThe constants should have type hints for better code maintainability and IDE support.
-DEFAULT_DB_DIR = Path('.imgtools') -DEFAULT_DB_NAME = 'imgtools.db' +DEFAULT_DB_DIR: Path = Path('.imgtools') +DEFAULT_DB_NAME: str = 'imgtools.db'
72-77
: Consider making database path configurableThe database path is currently hardcoded. Consider making it configurable through CLI options or environment variables.
+@click.option( + '--db-path', + type=click.Path( + file_okay=True, + dir_okay=False, + writable=True, + path_type=Path, + ), + help='Custom database path', + default=None, +) def index(...): - db_path = DEFAULT_DB_DIR / DEFAULT_DB_NAME + db_path = Path(db_path) if db_path else DEFAULT_DB_DIR / DEFAULT_DB_NAME
85-87
: Add progress feedback for large indexing operationsFor better user experience, consider adding a progress bar when indexing large numbers of files.
+ with click.progressbar( + dicom_files, + label='Indexing DICOM files', + item_show_func=lambda p: str(p) if p else '' + ) as files: - indexer.build_index_from_files(dicom_files) + indexer.build_index_from_files(files)src/imgtools/logging/json_logging.py (1)
24-33
: Add file size limit validationConsider adding validation for available disk space before creating log files to prevent disk space issues.
try: log_dir = self.base_dir / LOG_DIR_NAME log_dir.mkdir(parents=True, exist_ok=True) json_log_file = log_dir / DEFAULT_LOG_FILENAME + + # Check available disk space + if json_log_file.exists(): + stat = os.statvfs(log_dir) + available_space = stat.f_frsize * stat.f_bavail + if available_space < 1024 * 1024 * 100: # 100MB minimum + raise RuntimeError(f"Insufficient disk space in {log_dir}") + if json_log_file.exists() and not os.access(json_log_file, os.W_OK):src/imgtools/cli/dicomfind.py (1)
107-109
: Improve clarity of warning messageThe warning message about search input matching could be clearer and provide examples.
- warningmsg += f' Search input "{search_input}" did not match any DICOM files.' - warningmsg += ' Note: ALL search inputs must match to return a result.' + warningmsg += ( + f'\nSearch patterns {search_input} did not match any DICOM files.\n' + 'Note: ALL patterns must match for a file to be included.\n' + 'Example: "patient1" "series2" will only match files containing both terms.' + )pixi.toml (2)
57-57
: Consider less restrictive version constraints for sqlalchemy-stubsThe current version constraint
>=0.4,<0.5
is quite restrictive. Consider using a more flexible range to avoid potential dependency conflicts and make future updates easier.-sqlalchemy-stubs = ">=0.4,<0.5" +sqlalchemy-stubs = ">=0.4,<1.0"
118-132
: Review task organization and naming conventionThe underscore prefix for tasks (
_ruff-check
,_ruff-format
,_type-check
) suggests these are internal/private tasks. However:
- The naming is inconsistent (using both hyphen and underscore)
- The relationship between these tasks and the main
qc
task could be clearer in the documentationConsider standardizing the task naming:
-_ruff-check -_ruff-format -_type-check +_ruff_check +_ruff_format +_type_checktests/dicom/test_dicom_utils.py (3)
65-68
: Enhance limit testing with edge casesConsider adding test cases for:
- Zero limit
- Negative limit
- Limit larger than available files
def test_limit_edge_cases(self, temp_dir_with_files): # Zero limit result = find_dicoms(temp_dir_with_files, recursive=True, check_header=False, limit=0) assert len(result) == 0 # Negative limit result = find_dicoms(temp_dir_with_files, recursive=True, check_header=False, limit=-1) assert len(result) == 0 # or raise ValueError, depending on implementation # Limit larger than available files result = find_dicoms(temp_dir_with_files, recursive=True, check_header=False, limit=100) assert len(result) == 4 # total number of test files
70-75
: Expand search input testingConsider adding test cases for:
- Multiple search patterns
- Regex patterns
- Case sensitivity
def test_search_with_complex_patterns(self, temp_dir_with_files): # Create test files (temp_dir_with_files / "SCAN_001.dcm").touch() (temp_dir_with_files / "scan_002.dcm").touch() (temp_dir_with_files / "CT_scan_003.dcm").touch() # Multiple patterns result = find_dicoms(temp_dir_with_files, search_input=["SCAN", "CT"]) assert len(result) == 3 # Regex pattern result = find_dicoms(temp_dir_with_files, search_input=[r"scan_\d{3}"]) assert len(result) == 1
77-83
: Strengthen combined options testThe current test could be enhanced by:
- Verifying the order of results (if limit is applied)
- Testing with multiple search patterns
- Adding assertions for excluded files
def test_combined_options_enhanced(self, temp_dir_with_files, mocker): mocker.patch("imgtools.dicom.utils.is_dicom", return_value=True) # Create various test files (temp_dir_with_files / "scan1.dcm").touch() (temp_dir_with_files / "scan2.dcm").touch() (temp_dir_with_files / "other.dcm").touch() result = find_dicoms( temp_dir_with_files, recursive=True, check_header=True, extension="dcm", limit=1, search_input=["scan"] ) # Verify limit and search assert len(result) == 1 assert result[0].name == "scan1.dcm" # First match # Verify excluded files all_files = set(find_dicoms(temp_dir_with_files, recursive=True)) excluded = all_files - set(result) assert any("other.dcm" in str(f) for f in excluded)src/imgtools/logging/logging_config.py (1)
146-146
: Remove unnecessary blank lineThis blank line doesn't serve any purpose and can be removed for better code consistency.
src/imgtools/dicom/utils.py (1)
42-43
: Consider optimizing search input filteringThe implementation is correct, but for large directories, the current approach of checking all search terms against the full path string could be optimized.
Consider using a more efficient approach:
-and (not search_input or all(term in str(file.as_posix()) for term in search_input)) +and (not search_input or all(term.lower() in str(file.as_posix()).lower() for term in search_input))This change would:
- Make the search case-insensitive
- Convert the path to string only once
Also applies to: 143-147
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (26)
.gitignore
(1 hunks)README.md
(0 hunks)config/ruff.toml
(1 hunks)docs/.pages
(1 hunks)docs/cli/.pages
(1 hunks)docs/cli/imgtools.md
(1 hunks)docs/database_report.md
(1 hunks)mkdocs.yml
(1 hunks)pixi.toml
(5 hunks)pyproject.toml
(2 hunks)src/imgtools/cli/__init__.py
(1 hunks)src/imgtools/cli/__main__.py
(2 hunks)src/imgtools/cli/dicomfind.py
(3 hunks)src/imgtools/dicom/index/__init__.py
(1 hunks)src/imgtools/dicom/index/__main__.py
(1 hunks)src/imgtools/dicom/index/database/__init__.py
(1 hunks)src/imgtools/dicom/index/database/database.py
(1 hunks)src/imgtools/dicom/index/database/database_handler.py
(1 hunks)src/imgtools/dicom/index/models/__init__.py
(1 hunks)src/imgtools/dicom/index/models/models.py
(1 hunks)src/imgtools/dicom/sort/utils.py
(3 hunks)src/imgtools/dicom/utils.py
(8 hunks)src/imgtools/logging/__init__.py
(2 hunks)src/imgtools/logging/json_logging.py
(1 hunks)src/imgtools/logging/logging_config.py
(5 hunks)tests/dicom/test_dicom_utils.py
(1 hunks)
💤 Files with no reviewable changes (1)
- README.md
✅ Files skipped from review due to trivial changes (3)
- docs/cli/imgtools.md
- docs/cli/.pages
- src/imgtools/dicom/index/models/init.py
🧰 Additional context used
🪛 Markdownlint (0.35.0)
docs/database_report.md
9-9: null
Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
26-26: null
Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🔇 Additional comments (17)
docs/.pages (1)
4-4
: LGTM! Navigation structure aligns with new CLI documentation
The addition of the CLI Reference section is well-organized and consistent with the PR objectives.
mkdocs.yml (1)
8-8
: LGTM! Extensions support CLI documentation
The addition of attr_list
and mkdocs-click
extensions enhances markdown capabilities and properly supports CLI documentation generation.
Let's verify the mkdocs-click configuration:
Also applies to: 12-12
pyproject.toml (2)
39-40
: LGTM! Database dependencies align with indexing feature.
The addition of sqlitedict
and sqlalchemy
dependencies with appropriate version constraints aligns well with the PR's objective of implementing database indexing functionality.
Let's verify these are the latest stable versions:
27-27
: Consider maintaining an upper version bound for pydicom.
Removing the upper version constraint (<3
) could lead to unexpected behavior if pydicom introduces breaking changes in future major versions.
Let's check the pydicom release history and compatibility notes:
.gitignore (1)
225-227
: LGTM! Database file patterns are appropriate.
The addition of **/*.db
and **/*.sqlite
patterns will prevent committing database files, which is a good practice for version control.
Let's verify no critical database files exist in the repository that should be tracked:
✅ Verification successful
No database files found in the repository, gitignore patterns are safe to add
The verification confirms there are no existing database files (.db
or .sqlite
) in the repository that would be accidentally ignored by these new patterns. This makes the additions safe and appropriate for preventing future database files from being committed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any existing database files that might need tracking
echo "Checking for database files:"
fd -e db -e sqlite
Length of output: 87
src/imgtools/dicom/index/database/database.py (1)
98-110
: Optimize retrieval of existing file paths
Fetching all file paths into a list could be memory-intensive for large databases. Consider using a set for faster lookup or processing the files in batches.
[performance]
Apply this optimization:
- @property
- def existing_files(self) -> List[Path]:
- with self.db_handler.session() as session:
- return [Path(file.FilePath) for file in session.query(Image.FilePath).all()]
+ @property
+ def existing_files(self) -> Set[Path]:
+ with self.db_handler.session() as session:
+ return {Path(file.FilePath) for file in session.query(Image.FilePath)}
Also, adjust the usage in build_index_from_files
method to accommodate the set.
existing_files = self.existing_files
- remaining_files = list(set(files) - set(existing_files))
+ remaining_files = list(set(files) - existing_files)
src/imgtools/dicom/index/models/models.py (1)
261-264
:
Use Optional[Dict[str, str]] = None
as default
Using mutable default arguments like dictionaries can lead to unintended side effects due to shared references. It's safer to set the default to None
and assign a new dictionary inside the method.
Apply this change:
Modality: str = field(init=True)
- _RTSTRUCT: Optional[Dict[str, str]] = field(default_factory=dict)
- _MR: Optional[Dict[str, str]] = field(default_factory=dict)
+ _RTSTRUCT: Optional[Dict[str, str]] = field(default=None)
+ _MR: Optional[Dict[str, str]] = field(default=None)
In the code where these attributes are used, ensure to handle the None
case appropriately.
Likely invalid or redundant comment.
src/imgtools/cli/__main__.py (2)
29-29
: Confirm the renaming of the find_dicoms
command
The dicom_finder
command has been replaced with find_dicoms
. Verify that all references to dicom_finder
have been updated accordingly throughout the codebase.
Run the following script to ensure no lingering references exist:
✅ Verification successful
Let me verify if this is just a changelog entry or if there are actual code references by searching for both terms.
No issues found with the command renaming
The only reference to dicom_finder
is in the CHANGELOG.md file, which is expected as it documents the historical changes. The codebase consistently uses find_dicoms
in all Python files, including imports, function definitions, and command registration. The command appears to be properly renamed and all references have been updated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for references to `dicom_finder`
# Test: Find all occurrences of `dicom_finder`
rg "dicom_finder"
Length of output: 105
Script:
#!/bin/bash
# Search for both old and new command names in Python files
echo "=== Searching for find_dicoms in Python files ==="
rg "find_dicoms" -t py -A 2
echo -e "\n=== Searching for dicom_finder in Python files ==="
rg "dicom_finder" -t py -A 2
# Check the command definition
echo -e "\n=== Checking the command implementation ==="
fd "find_dicoms.py|dicom_finder.py" --exec cat {}
Length of output: 7594
4-5
:
Resolve import conflicts and verify function availability
The set_log_verbosity
function is being imported, but it's also being removed from this module. Ensure that it is correctly defined in imgtools.cli
and that all references are updated accordingly.
Run the following script to verify the availability of set_log_verbosity
:
src/imgtools/logging/__init__.py (2)
14-14
: LGTM: Improved readability with consistent spacing.
The added blank line improves code readability by separating logical sections.
34-39
: LGTM: Improved condition clarity for log level checks.
The new condition using not in (level, DEFAULT_LOG_LEVEL)
is a more elegant way to express the same logic. It's both more readable and maintainable.
config/ruff.toml (1)
8-8
:
Reconsider excluding CLI files from linting
Excluding CLI files from linting could lead to inconsistent code quality. If there are specific linting issues in the CLI code, consider:
- Fixing the issues to maintain consistent code quality
- Adding specific rule exceptions where absolutely necessary
- Creating a timeline for bringing CLI code up to the project's linting standards
Let's check the extent of linting issues in CLI files:
src/imgtools/logging/logging_config.py (1)
169-169
: LGTM! Return type annotation added
Good addition of return type annotation for better type safety and code clarity.
src/imgtools/dicom/sort/utils.py (2)
113-113
: LGTM! Default parameter addition improves flexibility
The addition of the default
parameter enhances the function's flexibility by allowing customization of missing tag values.
183-183
: Verify RTSTRUCT functionality
The RTSTRUCT-specific logic has been removed. Please confirm that this change doesn't break any existing RTSTRUCT file handling.
Also applies to: 186-194
✅ Verification successful
Let me check the RTSTRUCT-related test cases and implementation to better understand the impact of this change.
Let me check how the RTSTRUCT files are handled in the loaders and common I/O modules since they might have special handling that could be affected by this change.
Let me check the actual RTSTRUCT file reading implementation since we see it's handled by read_dicom_rtstruct
.
Based on the evidence gathered, I can now provide a final response:
RTSTRUCT handling remains intact despite commented code
The removed line was just a commented-out code that wasn't being used. The RTSTRUCT functionality is handled properly through dedicated readers (read_dicom_rtstruct
) and proper modality mappings in the I/O layer. The documentation note about RTSTRUCT's InstanceNumber defaulting to '1' is still valid as it's handled at a different layer.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for RTSTRUCT-related code and tests
rg -l "RTSTRUCT" --type py
# Search for any tests specifically handling RTSTRUCT cases
rg -l "test.*rtstruct" --type py -i
Length of output: 564
Script:
#!/bin/bash
# Check test cases for RTSTRUCT handling
rg "RTSTRUCT" tests/test_modalities.py -B 2 -A 2
# Check the implementation in utils.py
rg "RTSTRUCT" src/imgtools/dicom/sort/utils.py -B 2 -A 2
# Check sorter base implementation
rg "RTSTRUCT" src/imgtools/dicom/sort/sorter_base.py -B 2 -A 2
Length of output: 1858
Script:
#!/bin/bash
# Check RTSTRUCT handling in loaders and common I/O
rg "RTSTRUCT" src/imgtools/io/loaders.py -B 2 -A 2
rg "RTSTRUCT" src/imgtools/io/common.py -B 2 -A 2
# Check if there are any special instance number handling for RTSTRUCT
rg "InstanceNumber" src/imgtools/io/ -B 2 -A 2
Length of output: 841
Script:
#!/bin/bash
# Find and check the RTSTRUCT reader implementation
ast-grep --pattern 'def read_dicom_rtstruct'
# Check for any InstanceNumber usage in RTSTRUCT context
rg -B 3 -A 3 "InstanceNumber.*RTSTRUCT|RTSTRUCT.*InstanceNumber" --type py
Length of output: 898
src/imgtools/dicom/utils.py (2)
104-127
: LGTM! Comprehensive examples added
Excellent addition of clear, comprehensive examples that cover all use cases of the function, including the new limit and search_input parameters.
138-141
: LGTM! Proper logging of new parameters
Good practice to include the new parameters in debug logging for better observability.
except Exception as e: | ||
logger.exception(f'Error: {e}') | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid catching broad exceptions without handling
Catching a general Exception
without specific handling can make debugging difficult and may suppress important errors. It's better to catch specific exceptions or re-raise the original exception to preserve the traceback.
Apply this change:
try:
with (
self.db_handler.session() as session,
tqdm.tqdm(
total=len(remaining_files),
initial=len(existing_files),
desc='Indexing DICOM files',
) as tqdm_files,
):
for file_path in remaining_files:
metadata = _extract_metadata(file_path, self.mytags)
_ = self._insert_patient(session, metadata)
_ = self._insert_study(session, metadata)
series = self._insert_series(session, metadata, file_path)
image = self._insert_image(session, metadata, file_path)
series.images.append(image)
tqdm_files.update(1)
- except Exception as e:
- logger.exception(f'Error: {e}')
- raise e
+ except (SomeSpecificException) as e:
+ logger.exception(f'An error occurred during indexing: {e}')
+ raise
Replace SomeSpecificException
with the actual exceptions you expect (e.g., InvalidDicomError
, SQLAlchemyError
).
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worry about this in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjjermiah, understood. Let me know if you'd like me to open a GitHub issue to track this for a future PR.
src/imgtools/cli/__main__.py
Outdated
def cli(verbose: int, quiet: bool) -> None: | ||
"""A collection of tools for working with medical imaging data.""" | ||
pass | ||
|
||
|
||
cli.add_command(dicomsort) | ||
cli.add_command(dicom_finder, 'find-dicoms') | ||
cli.add_command(find_dicoms, 'find-dicoms') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update decorator usage after altering function signature
After adding the quiet
parameter to the cli
function, ensure that all decorators are appropriately adjusted. The @set_log_verbosity()
decorator may need to accommodate the new parameter.
Apply this change:
@click.group(
no_args_is_help=True,
)
-@set_log_verbosity()
@click.version_option(
version=__version__,
package_name='med-imagetools',
prog_name='imgtools',
message='%(package)s:%(prog)s:%(version)s',
)
@click.help_option(
'-h',
'--help',
)
+@set_log_verbosity()
def cli(verbose: int, quiet: bool) -> None:
"""A collection of tools for working with medical imaging data."""
pass
Ensure that set_log_verbosity
correctly handles the quiet
parameter.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def cli(verbose: int, quiet: bool) -> None: | |
"""A collection of tools for working with medical imaging data.""" | |
pass | |
cli.add_command(dicomsort) | |
cli.add_command(dicom_finder, 'find-dicoms') | |
cli.add_command(find_dicoms, 'find-dicoms') | |
@click.group( | |
no_args_is_help=True, | |
) | |
@click.version_option( | |
version=__version__, | |
package_name='med-imagetools', | |
prog_name='imgtools', | |
message='%(package)s:%(prog)s:%(version)s', | |
) | |
@click.help_option( | |
'-h', | |
'--help', | |
) | |
@set_log_verbosity() | |
def cli(verbose: int, quiet: bool) -> None: | |
"""A collection of tools for working with medical imaging data.""" | |
pass | |
cli.add_command(dicomsort) | |
cli.add_command(find_dicoms, 'find-dicoms') |
if force_delete and db_path.exists(): | ||
logger.debug('Deleting existing database file.', db_path=db_path) | ||
db_path.unlink() # Delete the existing database file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for file operations.
The database file deletion should include error handling for permission issues and file locks.
if force_delete and db_path.exists():
logger.debug('Deleting existing database file.', db_path=db_path)
- db_path.unlink() # Delete the existing database file
+ try:
+ db_path.unlink() # Delete the existing database file
+ except PermissionError as e:
+ logger.error('Permission denied when deleting database file', error=str(e))
+ raise
+ except OSError as e:
+ logger.error('Failed to delete database file', error=str(e))
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if force_delete and db_path.exists(): | |
logger.debug('Deleting existing database file.', db_path=db_path) | |
db_path.unlink() # Delete the existing database file | |
if force_delete and db_path.exists(): | |
logger.debug('Deleting existing database file.', db_path=db_path) | |
try: | |
db_path.unlink() # Delete the existing database file | |
except PermissionError as e: | |
logger.error('Permission denied when deleting database file', error=str(e)) | |
raise | |
except OSError as e: | |
logger.error('Failed to delete database file', error=str(e)) | |
raise |
src/imgtools/cli/__init__.py
Outdated
def callback(ctx: click.Context, param: click.Parameter, value: int) -> None: | ||
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'} | ||
level = levels.get(value, 'DEBUG') # Default to DEBUG if verbosity is high | ||
|
||
# Check if `--quiet` is passed | ||
if ctx.params.get('quiet', False): | ||
logger.setLevel(ERROR) | ||
return | ||
|
||
levelvalue = getLevelName(level) | ||
env_level = logger.level | ||
|
||
# Override environment variable if verbosity level is passed | ||
if levelvalue > env_level and value != 0: | ||
logger.warning( | ||
f'Environment variable {logger.name.upper()}_LOG_LEVEL is {getLevelName(env_level)} but ' | ||
f'you are setting it to {getLevelName(levelvalue)}' | ||
) | ||
logger.setLevel(levelvalue) | ||
else: | ||
logger.setLevel(min(levelvalue, env_level)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify callback logic and add complete type hints.
The callback function could be simplified and made more type-safe.
-def callback(ctx: click.Context, param: click.Parameter, value: int) -> None:
+def callback(ctx: click.Context, param: click.Parameter, verbosity: int) -> None:
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
- level = levels.get(value, 'DEBUG') # Default to DEBUG if verbosity is high
+ level = levels.get(min(verbosity, max(levels.keys())), 'DEBUG')
# Check if `--quiet` is passed
if ctx.params.get('quiet', False):
logger.setLevel(ERROR)
return
levelvalue = getLevelName(level)
env_level = logger.level
# Override environment variable if verbosity level is passed
- if levelvalue > env_level and value != 0:
+ if levelvalue > env_level and verbosity != 0:
logger.warning(
f'Environment variable {logger.name.upper()}_LOG_LEVEL is {getLevelName(env_level)} but '
f'you are setting it to {getLevelName(levelvalue)}'
)
logger.setLevel(levelvalue)
else:
logger.setLevel(min(levelvalue, env_level))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def callback(ctx: click.Context, param: click.Parameter, value: int) -> None: | |
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'} | |
level = levels.get(value, 'DEBUG') # Default to DEBUG if verbosity is high | |
# Check if `--quiet` is passed | |
if ctx.params.get('quiet', False): | |
logger.setLevel(ERROR) | |
return | |
levelvalue = getLevelName(level) | |
env_level = logger.level | |
# Override environment variable if verbosity level is passed | |
if levelvalue > env_level and value != 0: | |
logger.warning( | |
f'Environment variable {logger.name.upper()}_LOG_LEVEL is {getLevelName(env_level)} but ' | |
f'you are setting it to {getLevelName(levelvalue)}' | |
) | |
logger.setLevel(levelvalue) | |
else: | |
logger.setLevel(min(levelvalue, env_level)) | |
def callback(ctx: click.Context, param: click.Parameter, verbosity: int) -> None: | |
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'} | |
level = levels.get(min(verbosity, max(levels.keys())), 'DEBUG') | |
# Check if `--quiet` is passed | |
if ctx.params.get('quiet', False): | |
logger.setLevel(ERROR) | |
return | |
levelvalue = getLevelName(level) | |
env_level = logger.level | |
# Override environment variable if verbosity level is passed | |
if levelvalue > env_level and verbosity != 0: | |
logger.warning( | |
f'Environment variable {logger.name.upper()}_LOG_LEVEL is {getLevelName(env_level)} but ' | |
f'you are setting it to {getLevelName(levelvalue)}' | |
) | |
logger.setLevel(levelvalue) | |
else: | |
logger.setLevel(min(levelvalue, env_level)) |
src/imgtools/cli/dicomfind.py
Outdated
dicom_files = find_dicoms_util( | ||
directory=path, | ||
check_header=check_header, | ||
recursive=True, | ||
extension=extension, | ||
limit=limit, # Pass limit parameter | ||
search_input=search_input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add validation for search input patterns
Consider validating regex patterns before passing them to find_dicoms_util to provide better error messages.
+ # Validate regex patterns
+ for pattern in search_input:
+ try:
+ re.compile(pattern)
+ except re.error as e:
+ logger.error(f'Invalid regex pattern "{pattern}": {str(e)}')
+ return
+
dicom_files = find_dicoms_util(
directory=path,
check_header=check_header,
recursive=True,
extension=extension,
limit=limit,
search_input=search_input,
)
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjjermiah, understood. I've noted this information for future reviews.
✏️ Learnings added
Learnt from: jjjermiah
PR: bhklab/med-imagetools#148
File: src/imgtools/cli/dicomfind.py:96-0
Timestamp: 2024-12-03T16:32:37.035Z
Learning: In the `find_dicoms` function in `src/imgtools/cli/dicomfind.py`, validation of regex patterns before passing them to `find_dicoms_util` is not required.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #148 +/- ##
==========================================
- Coverage 47.19% 43.46% -3.74%
==========================================
Files 41 48 +7
Lines 3348 3640 +292
==========================================
+ Hits 1580 1582 +2
- Misses 1768 2058 +290 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
Release Notes
New Features
imgtools
and a beta featureDICOMSort
.find_dicoms
function to support limiting results and filtering by search input.Documentation
Bug Fixes
Chores