Reduce import memory consumption #8223
Walkthrough
The recent changes enhance the efficiency and clarity of the codebase by improving data handling during shape importation, optimizing deletion processes, and introducing a new
Sequence Diagram(s)
sequenceDiagram
participant A as User
participant B as TaskManager
participant C as AnnotationHandler
A->>B: Request delete(data)
B->>B: Check if data is empty
B->>C: handle_annotations_change(annotations, action, should_deepcopy)
C-->>B: Update annotations
B-->>A: Confirm deletion
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (1)
cvat/apps/events/handlers.py (1)
376-377: Function calls to handle_annotations_change need to be updated to match the new signature.
The following occurrences do not include the should_deepcopy argument:
cvat/apps/dataset_manager/task.py: handle_annotations_change(self.db_job, self.data, "create")
cvat/apps/dataset_manager/task.py: handle_annotations_change(self.db_job, deleted_data, "delete")
cvat/apps/dataset_manager/task.py: handle_annotations_change(self.db_job, self.data, "create")
cvat/apps/dataset_manager/task.py: handle_annotations_change(self.db_job, self.data, "update")
cvat/apps/dataset_manager/task.py: handle_annotations_change(self.db_job, deleted_data, "delete", should_deepcopy=False)
Please update these calls to include the should_deepcopy argument as appropriate.
Analysis chain
LGTM! But verify the function usage in the codebase.
The code changes are approved.
However, ensure that all function calls to handle_annotations_change match the new signature.
Let's correct the rg command to use the appropriate file type for Python and re-run the search to verify the function usage.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `handle_annotations_change` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'handle_annotations_change'
Length of output: 80
Script:
#!/bin/bash
# Description: Verify all function calls to `handle_annotations_change` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 'handle_annotations_change'
Length of output: 3017
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- cvat/apps/dataset_manager/bindings.py (2 hunks)
- cvat/apps/dataset_manager/formats/cvat.py (2 hunks)
- cvat/apps/dataset_manager/task.py (2 hunks)
- cvat/apps/engine/models.py (3 hunks)
- cvat/apps/events/handlers.py (1 hunks)
Additional comments not posted (13)
cvat/apps/dataset_manager/task.py (2)
480-484: LGTM! But verify the method usage in the codebase.
The code changes are approved.
However, ensure that all method calls to delete handle the updated control flow correctly.
Line range hint 570-570: LGTM! But verify the function usage in the codebase.
The code changes are approved.
However, ensure that all function calls to import_task_annotations handle the updated process correctly.
cvat/apps/engine/models.py (2)
189-289: LGTM!
The LazyList class is well-implemented and enhances memory efficiency through lazy evaluation of elements.
Line range hint 290-316: LGTM!
The changes in the AbstractArrayField class enhance data handling efficiency by incorporating the LazyList for lazy evaluation of elements.
cvat/apps/dataset_manager/formats/cvat.py (7)
1226-1229: Ensure consistent float conversion for box points
The conversion to float for box points is correctly implemented. This ensures numerical accuracy during calculations.
1231-1234: Ensure consistent float conversion for ellipse points
The conversion to float for ellipse points is correctly implemented. This ensures numerical accuracy during calculations.
1236-1252: Ensure consistent float conversion for cuboid points
The conversion to float for cuboid points is correctly implemented. This ensures numerical accuracy during calculations.
1285-1288: Ensure consistent float conversion for box points
The conversion to float for box points is correctly implemented. This ensures numerical accuracy during calculations.
1290-1293: Ensure consistent float conversion for ellipse points
The conversion to float for ellipse points is correctly implemented. This ensures numerical accuracy during calculations.
1295-1299: Ensure consistent float conversion for mask points
The conversion to float for mask points is correctly implemented. This ensures numerical accuracy during calculations.
1301-1317: Ensure consistent float conversion for cuboid points
The conversion to float for cuboid points is correctly implemented. This ensures numerical accuracy during calculations.
cvat/apps/dataset_manager/bindings.py (2)
Line range hint 438-438: Verify the downstream impact of removing float conversion for track points.
The line shape['points'] = list(map(float, shape['points'])) has been removed. Ensure that downstream processes handle the points correctly without this conversion.
Line range hint 394-394: Verify the downstream impact of removing float conversion for shape points.
The line _shape['points'] = list(map(float, _shape['points'])) has been removed. Ensure that downstream processes handle the points correctly without this conversion.
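The comments above describe moving the float conversion out of the bindings and into the format loader, so that points are converted once, when they are read. A minimal sketch of that idea follows. The helper names `parse_box` and `parse_points`, and the attribute layout they expect, are assumptions for illustration and are not taken from the PR.

```python
def parse_box(attrib):
    """Convert box corner attributes (strings read from a file) to floats at load time.

    The attribute names are assumed for this sketch.
    """
    return [float(attrib[name]) for name in ("xtl", "ytl", "xbr", "ybr")]


def parse_points(points_attr):
    """Parse an 'x1,y1;x2,y2;...' string into a flat list of floats."""
    return [
        float(coord)
        for pair in points_attr.split(";")
        for coord in pair.split(",")
    ]


box = parse_box({"xtl": "10", "ytl": "20.5", "xbr": "30", "ybr": "40"})
polygon = parse_points("1,2;3.5,4")
print(box)
print(polygon)
```

With the conversion done at parse time, a later pass such as `list(map(float, shape['points']))` would only rebuild a list that already holds floats.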
cvat/apps/events/handlers.py
Outdated
@@ -373,8 +373,8 @@ def handle_delete(scope, instance, store_in_deletion_cache=False, **kwargs):
 user_email=uemail,
 )

-def handle_annotations_change(instance, annotations, action, **kwargs):
-    _annotations = deepcopy(annotations)
+def handle_annotations_change(instance, annotations, action, should_deepcopy=True, **kwargs):
I think this breaks encapsulation of the function if a parameter with such a name has to be specified by the client code. I can see several ways to improve it:
- rename it to "annotations_mutable", or add a keyword parameter "mutable_annotations" (mutually exclusive with "annotations")
- infer it automatically inside the function if the operation is "delete". The problem here is that this code is not responsible for this logic. The same problem applies to passing should_deepcopy=False in the delete() call: to be fully correct, it looks like it must be forwarded all the way from the place where the class instance is created.
I think the function may be just refactored to avoid modifying its arguments (I mean even without deepcopy)
So now it still will be copying, but without calling deepcopy. Do we really need to send all these annotations, can it be replaced by some summary? Can we modify the recipient side to get only required fields?
> So now it still will be copying, but without calling deepcopy
The main difference here is that we're only copying the outer layers of the data, while nested fields remain the same objects. I'd say this has negligible impact on memory/performance, especially compared to the previous implementation.
> Do we really need to send all these annotations, can it be replaced by some summary? Can we modify the recipient side to get only required fields?
I think it's certainly worth a discussion, but I don't have enough context to answer these questions yet.
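To make the distinction in this reply concrete, here is a minimal sketch of copying only the outer layer. It is not the PR's code: the helper name `filter_shape_shallow`, the kept keys, and the sample data are hypothetical.

```python
from copy import deepcopy


def filter_shape_shallow(shape, keep=("id", "frame", "attributes")):
    """Build a new outer dict with only the `keep` keys; nested values are shared."""
    return {key: shape[key] for key in keep if key in shape}


shape = {
    "id": 1,
    "frame": 0,
    "attributes": [{"spec_id": 7, "value": "car"}],
    "points": [0.0, 1.0, 2.0, 3.0],
}

shallow = filter_shape_shallow(shape)
deep = deepcopy(shape)

# The shallow result reuses the nested list; deepcopy duplicates it.
print(shallow["attributes"] is shape["attributes"])
print(deep["attributes"] is shape["attributes"])
# The caller's dict is not modified by the filtering.
print("points" in shape)
```

Because the function returns a new dict instead of deleting keys from its argument, the caller's data stays intact without a deep copy. The trade-off is that the nested objects are shared, so the receiving side must treat them as read-only.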
cvat/apps/events/handlers.py
Outdated
-def handle_annotations_change(instance, annotations, action, **kwargs):
-    _annotations = deepcopy(annotations)
+def handle_annotations_change(instance, annotations, action, should_deepcopy=True, **kwargs):
+    _annotations = deepcopy(annotations) if should_deepcopy else annotations
In my opinion, we are trying to put too much information into the event in this function; maybe that is worth discussing.
Handle list modifications and throw away used up string.
cvat/apps/events/handlers.py
Outdated
return data
shapes = data.get("shates", None)
shates?
cvat/apps/events/handlers.py
Outdated
track["shapes"].append(filter_shape_data(track_shape))
tracks_by_type[track_shapes[0]["type"]].append(track)
for track in annotations.get("tracks", []):
filtered_track = filter_data(track)
Maybe it would be conceptually safer to split the logic into several functions:
filter_track
filter_tag
filter_shape
since these 3 data structures are different in concept, and to leave one more common filter_data that copies the common properties (id, attributes, frame).
But that is all about refactoring.
Agreed. I've added filter_track and filter_shape; adding filter_tag didn't make sense since it would be identical to filter_data:
def filter_tag(tag):
    return filter_data(tag)
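The split agreed on in this exchange could look roughly like the sketch below. The function names come from the thread, but the bodies and field names (`COMMON_FIELDS`, `type`, `shapes`) are assumptions for illustration and do not reproduce the merged code.

```python
# Assumed common properties, following the reviewer's suggestion (id, attributes, frame).
COMMON_FIELDS = ("id", "frame", "attributes")


def filter_data(data):
    """Copy only the common fields into a new dict; the input is not modified."""
    return {key: data[key] for key in COMMON_FIELDS if key in data}


def filter_shape(shape):
    """Common fields plus the shape type (assumed to be the only extra field kept)."""
    filtered = filter_data(shape)
    filtered["type"] = shape.get("type")
    return filtered


def filter_track(track):
    """Common fields plus the filtered shapes that belong to the track."""
    filtered = filter_data(track)
    filtered["shapes"] = [filter_shape(s) for s in track.get("shapes", [])]
    return filtered


track = {
    "id": 5,
    "frame": 0,
    "attributes": [],
    "label_id": 3,
    "shapes": [
        {"id": 9, "frame": 0, "type": "rectangle", "points": [0, 0, 1, 1], "attributes": []},
    ],
}
filtered_track = filter_track(track)
print(filtered_track)
```

Tags would go through `filter_data` directly, which matches the point above that a separate `filter_tag` would add nothing.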
Quality Gate passed
Closing in favour of 3 separate PRs:
### Motivation and context

Decided to split changes in this PR: #8223

1. Annotations import (#8226)
2. Array fields optimization (#8229)
3. Logging function optimization (this PR)

### How has this been tested?

### Checklist

- [x] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved data integrity by ensuring annotations change handling occurs every time an item is deleted.
- **New Features**
  - Enhanced data filtering logic for annotations, allowing for more efficient processing without unnecessary deep copies.
  - Introduced a new function for streamlined filtering of track data, simplifying data management.
- **Refactor**
  - Consolidated data filtering into reusable functions for better maintainability and clarity.
### Motivation and context

Decided to split changes in this PR: #8223

1. Annotations import (this PR)
2. Array fields optimization (#8229)
3. Logging function optimization (#8228)

### How has this been tested?

### Checklist

- [x] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.

## Summary by CodeRabbit

- **New Features**
  - Improved data type consistency for shape points during the annotation loading process, ensuring all coordinates are treated as numeric values.
  - Enhanced clarity and reliability in loading various shape types (box, ellipse, cuboid, mask) by converting relevant attributes to floats.
- **Bug Fixes**
  - Removed redundant database initialization calls during annotation import, potentially improving the efficiency of the import process.
### Motivation and context

Decided to split changes in this PR: #8223

1. Annotations import (#8226)
2. Array fields optimization (this PR)
3. Logging function optimization (#8228)

### How has this been tested?

### Checklist

- [x] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.

## Summary by CodeRabbit

- **New Features**
  - Introduced the `LazyList` class for efficient, on-demand parsing of list elements from strings.
  - Added support for custom transformations through a converter function.
  - Enhanced lazy evaluation with new decorators for improved performance on list operations.
- **Tests**
  - Implemented a comprehensive test suite for the `LazyList` class, validating core functionalities and ensuring robustness.
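
The summary above mentions on-demand parsing of list elements from a string with an optional converter. The following is a rough sketch of that general idea only. `LazyParsedList` is a hypothetical stand-in, not the `LazyList` class from the PR, and unlike an element-wise lazy list it parses the whole string on first access.

```python
from collections.abc import Sequence


class LazyParsedList(Sequence):
    """Hypothetical stand-in: keeps the raw string and parses it on first access."""

    def __init__(self, string, separator=",", converter=float):
        self._string = string
        self._separator = separator
        self._converter = converter
        self._items = None  # nothing parsed yet

    def _parse(self):
        if self._items is None:
            parts = self._string.split(self._separator) if self._string else []
            self._items = [self._converter(part) for part in parts]
            self._string = None  # drop the source string once it has been consumed
        return self._items

    @property
    def parsed(self):
        """True once the string has been converted into items."""
        return self._items is not None

    def __getitem__(self, index):
        return self._parse()[index]

    def __len__(self):
        return len(self._parse())


points = LazyParsedList("1.5,2,3.25")
was_parsed_before_access = points.parsed
first = points[0]
print(was_parsed_before_access, first, list(points))
```

Rows that are loaded but never inspected keep only the compact string, which is the kind of saving a lazy array field is after.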
Motivation and context
memray stats for develop
Total time: 00:10:23
Peak memory usage: 4.2 GiB
memray stats for this branch
Total time: 00:05:55
Peak memory usage: 1.3 GiB
Check out flamegraphs.zip for full report.
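The figures above were produced with memray. For a quick local comparison of two code paths, a similar peak measurement can be sketched with the standard library's `tracemalloc`, used here as a lightweight substitute and not as the tool behind the numbers above. The helper `peak_bytes` and the sample data are assumptions for illustration.

```python
import tracemalloc
from copy import deepcopy


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation size in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def shallow_copy_items(data):
    """Copy only the outer dict of every item; nested lists stay shared."""
    return [dict(item) for item in data]


shapes = [{"id": i, "points": [float(i)] * 8} for i in range(10_000)]

deep_peak = peak_bytes(deepcopy, shapes)
shallow_peak = peak_bytes(shallow_copy_items, shapes)
print(f"deepcopy peak: {deep_peak} bytes, shallow copy peak: {shallow_peak} bytes")
```

Absolute numbers depend on the Python version and platform, so only the relative difference between the two paths is meaningful.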
Not yet entirely sure changes in 8965996 are sufficient. Was operating on the assumption that other importers are converting points to floats on import. Can confirm that this is the case for datumaro format.
How has this been tested?
Checklist
develop branch
(cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
LazyList class for efficient data handling and lazy evaluation of elements.
Improvements
Bug Fixes