-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory optimization for image chunk preparation #8581
Conversation
WalkthroughThe changes in this pull request primarily involve the refactoring of image loading functionalities across three files: Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant TC as Task Creator
participant MC as Media Cache
participant U as Utils
C->>TC: Request to create task
TC->>MC: Call load_images(preload=True)
MC->>U: Load images from paths
U-->>MC: Return loaded images
MC-->>TC: Provide images for task
TC-->>C: Task created successfully
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
cvat/apps/engine/utils.py (3)
366-369
: LGTM! Consider improving type hints.The renaming from
preload_image
toload_image
is consistent with the changes in other parts of the codebase. The function's implementation remains correct and unchanged.Consider using more specific type hints for improved clarity:
def load_image(image: tuple[str, str, str]) -> tuple[Image.Image, str, str]:
371-375
: LGTM! Consider adding type hints and docstring.The changes to the
load_images
function improve its flexibility and potential memory efficiency. The newpreload
parameter allows for both eager and lazy loading of images.Consider the following improvements:
- Add type hints for better clarity:
def load_images(images: Iterable[tuple[str, str, str]], preload: bool = False) -> Union[List[tuple[Image.Image, str, str]], Iterable[tuple[Image.Image, str, str]]]:
- Add a docstring to explain the function's behavior and the purpose of the
preload
parameter:def load_images(images: Iterable[tuple[str, str, str]], preload: bool = False) -> Union[List[tuple[Image.Image, str, str]], Iterable[tuple[Image.Image, str, str]]]: """ Load images from the given iterable of image data. Args: images: An iterable of tuples containing image data (file path, format, mime type). preload: If True, eagerly load all images into memory. If False, return a lazy-loading iterable. Returns: If preload is True, returns a list of tuples (Image, format, mime type). If preload is False, returns an iterable of tuples (Image, format, mime type). """These changes will improve the function's documentation and make it easier for other developers to understand and use the function correctly.
366-375
: Overall assessment: Changes align with PR objectives and improve code flexibility.The modifications to
load_image
andload_images
functions are consistent with the PR's goal of optimizing memory usage during image chunk preparation. The newpreload
parameter inload_images
allows for more efficient memory management when handling large datasets.These changes appear to be correct and do not introduce any security issues. The suggestions for improved type hinting and documentation will further enhance the code quality and maintainability.
Consider reviewing the impact of these changes on the overall image loading process in the application. Ensure that other parts of the codebase that use these functions are updated accordingly, especially regarding the new lazy-loading behavior when
preload=False
.cvat/apps/engine/cache.py (2)
361-361
: Approve memory optimization in frame loadingThe change in this line implements the same memory optimization as in the
_read_raw_images
method, usingload_images
withpreload=False
. This should reduce memory usage when reading frames.For consistency, consider extracting the
preload=False
parameter into a constant or configuration value, as it's used in multiple places.
Line range hint
1-724
: Focused optimization with potential for future improvementsThe changes in this file are focused on optimizing memory usage in image and frame loading processes. This targeted approach is good for minimizing the risk of introducing bugs in unrelated functionality.
For future improvements, consider:
- Applying similar memory optimizations to other methods that handle large amounts of data.
- Reviewing the
prepare_chunk
andprepare_context_images_chunk
methods for potential memory optimizations.- Implementing a more comprehensive caching strategy that balances memory usage and performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- cvat/apps/engine/cache.py (3 hunks)
- cvat/apps/engine/task.py (2 hunks)
- cvat/apps/engine/utils.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (3)
cvat/apps/engine/cache.py (2)
57-57
: Update import statement to reflect new function nameThe import statement has been updated to use
load_images
instead ofpreload_images
. This change aligns with the refactoring of the image loading functionality.
324-331
: Approve memory optimization in image loadingThe changes in this segment implement the memory optimization mentioned in the PR objectives. By using
load_images
withpreload=False
, the method avoids loading all images into memory simultaneously, which should significantly reduce memory usage for large datasets.The checksum verification logic has been updated accordingly to work with the new image loading mechanism.
cvat/apps/engine/task.py (1)
41-41
: Importload_images
for optimized image loadingThe addition of
load_images
to the imports enables efficient image loading later in the code, which is essential for memory optimization during chunk preparation.
cvat/apps/engine/task.py
Outdated
@@ -1537,7 +1537,7 @@ def save_chunks( | |||
MEDIA_TYPES['archive']['extractor'], | |||
)) | |||
): | |||
chunk_data = preload_images(chunk_data) | |||
chunk_data = load_images(images=chunk_data, preload=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for load_images
to ensure robustness
The call to load_images
may raise exceptions if image loading fails due to corrupted files or I/O errors. It is recommended to add error handling to manage potential exceptions and maintain the robustness of the application.
Apply this diff to handle possible exceptions:
def save_chunks(
executor: concurrent.futures.ThreadPoolExecutor,
db_segment: models.Segment,
chunk_idx: int,
chunk_frame_ids: Sequence[int]
):
chunk_data = [media_iterator[frame_idx] for frame_idx in chunk_frame_ids]
+ try:
if (
db_task.dimension == models.DimensionType.DIM_2D and
isinstance(media_extractor, (
MEDIA_TYPES['image']['extractor'],
MEDIA_TYPES['zip']['extractor'],
MEDIA_TYPES['pdf']['extractor'],
MEDIA_TYPES['archive']['extractor'],
))
):
chunk_data = load_images(images=chunk_data, preload=True)
+ except Exception as e:
+ slogger.glob.error(f"Failed to load images for chunk {chunk_idx}: {e}")
+ raise
# TODO: extract into a class
fs_original = executor.submit(
original_chunk_writer.save_as_chunk,
images=chunk_data,
chunk_path=db_data.get_original_segment_chunk_path(
chunk_idx, segment_id=db_segment.id
),
)
Committable suggestion was skipped due to low confidence.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8581 +/- ##
===========================================
+ Coverage 74.24% 74.28% +0.04%
===========================================
Files 403 403
Lines 43287 43284 -3
Branches 3914 3914
===========================================
+ Hits 32137 32155 +18
+ Misses 11150 11129 -21
|
Add a changelog entry, please. |
Quality Gate passedIssues Measures |
There is no need to load all images into memory during chunk preparation
Motivation and context
How has this been tested?
Checklist
develop
branch(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation