-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mask_path
to fo.Detection
labels
#4693
Add mask_path
to fo.Detection
labels
#4693
Conversation
WalkthroughThe recent updates to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Detection
participant Disk
User->>Detection: Create Detection with mask_path
Detection->>Disk: Import mask from mask_path
Detection->>User: Mask available for use
User->>Detection: Request mask
Detection->>Detection: Get mask via get_mask()
Detection->>User: Provide mask
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
Early access features: disabledWe are currently testing the following features in early access:
Note:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (4)
- fiftyone/core/labels.py (2 hunks)
- fiftyone/utils/coco.py (2 hunks)
- fiftyone/utils/cvat.py (1 hunks)
- fiftyone/utils/eta.py (1 hunks)
Additional comments not posted (12)
fiftyone/utils/eta.py (1)
599-599
: Use ofget_mask()
enhances encapsulation.The change from accessing
detection.mask
directly to usingdetection.get_mask()
improves encapsulation and potentially adds validation or processing. Ensure thatget_mask()
is correctly implemented to handle all necessary cases.fiftyone/core/labels.py (8)
404-405
: Introduction ofmask_path
enhances flexibility.The addition of
mask_path
allows masks to be stored on disk, reducing database load. Ensure that paths are handled securely to prevent path traversal vulnerabilities.
412-413
: Declaration of_MEDIA_FIELD
improves media handling.Setting
_MEDIA_FIELD
to"mask_path"
indicates a shift towards handling media files, aligning with the newmask_path
feature.
421-424
:has_mask
property improves mask presence checks.This property efficiently checks for the presence of a mask, whether in memory or on disk. Ensure it is used consistently throughout the class.
426-438
:get_mask
method centralizes mask retrieval.Using
get_mask
to access the mask encapsulates the logic for retrieving masks from memory or disk. Verify that_read_mask
handles file reading securely and efficiently.
440-454
:import_mask
method facilitates mask importation.This method allows importing masks from disk into memory, with an option to clear
mask_path
. Ensure_read_mask
is robust against file-related errors and thatmask_path
is cleared only when intended.
455-472
:export_mask
method supports mask exportation.The method exports masks to disk, updating attributes as needed. Ensure that the file operations handle errors gracefully and that path updates are consistent.
473-514
:transform_mask
method enables mask transformation.This method provides flexibility in transforming masks, with options to save changes. Ensure that
_transform_mask
handles transformations correctly and that file operations are secure.
570-571
:to_segmentation
method utilizesget_mask
.Updating this method to use
get_mask
ensures consistent mask retrieval. Ensure that error handling is robust for cases where masks are not available.fiftyone/utils/coco.py (2)
1302-1302
: Good use of encapsulation withget_mask()
.The change to use
label.get_mask()
instead of direct attribute access improves encapsulation and maintainability. Ensure that theget_mask
method is correctly implemented to handle all necessary logic for mask retrieval.
2113-2113
: Improved maintainability withget_mask()
.Using
detection.get_mask()
instead of direct attribute access centralizes mask retrieval logic, which is beneficial for maintainability. Verify that theget_mask
method is implemented to handle all relevant cases.fiftyone/utils/cvat.py (1)
6403-6403
: Good encapsulation practice withget_mask()
.The use of
det.get_mask()
instead of directly accessingdet.mask
improves encapsulation and maintainability by potentially incorporating additional logic within the getter method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Laurent2916! 👋
Looks like you're interested in working with detection tasks where the object masks are stored as images on disk rather than as numpy arrays in the database.
Out of curiosity, how big are your instance masks in your use cases? In terms of resolution, number of objects per image, etc.
From a data modeling precedent standpoint, we have another closely related proposal in #4500. In that case, the objective is also to work with detection datasets defined by on-disk images rather than in-database arrays, but there is an additional assumption that the objects are strictly non-overlapping and thus it is proposed to be achieved by adding a single full-resolution mask at the Detections.mask|mask_path
level.
There's a quite detailed discussion there about how in the author's opinion, object detections represented in that way should be distinct from the existing Segmentation
label type, which is identical from a data storage standpoint (single per-image mask) but has different semantics in terms of the pixel values representing instances rather than classes.
I'm pointing out #4500 because, while we could in principle have all three of these:
Segmentation.mask_path
Detections.mask_path
Detection.mask_path
the added representational flexibility incurs extra complexity in making the rest of fiftyone
work (in-App visualization, import/export, additional complexities to support cloud-backed datasets in FiftyOne Teams, etc).
confidence (None): a confidence in ``[0, 1]`` for the detection | ||
index (None): an index for the object | ||
attributes ({}): a dict mapping attribute names to :class:`Attribute` | ||
instances | ||
""" | ||
|
||
_MEDIA_FIELD = "mask_path" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to Voxel51, this would be the first instance of a media field that could be embedded in a list (Detections
), which means these methods will need to be updated:
fiftyone/fiftyone/utils/data/exporters.py
Line 2345 in 1824ddb
def _export_media_field(self, d, field_name, key=None): fiftyone/fiftyone/utils/data/exporters.py
Line 2041 in 1824ddb
def _export_media_field(self, d, field_name, key=None):
There could be more such places too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (3)
- fiftyone/core/labels.py (2 hunks)
- fiftyone/utils/coco.py (2 hunks)
- fiftyone/utils/cvat.py (1 hunks)
Additional comments not posted (8)
fiftyone/core/labels.py (5)
421-424
: LGTM!The code changes are approved.
426-438
: LGTM!The code changes are approved.
440-453
: LGTM!The code changes are approved.
455-471
: LGTM!The code changes are approved.
527-528
: LGTM!The code changes are approved.
fiftyone/utils/coco.py (2)
1302-1302
: Addressed previous comment.The code has been updated to use
label.has_mask()
as suggested in the previous comment for better performance.
2113-2113
: LGTM!The code changes are approved. The check ensures that the detection is included only if segmentations are not being loaded or if the detection has a mask.
fiftyone/utils/cvat.py (1)
6403-6403
: Usehas_mask()
method to check for detection masks.Good update to use the
has_mask()
method here instead of directly checking themask
attribute. This encapsulates the logic for determining mask existence.
Hi @brimoor! Thanks for the suggestions, I've applied them. I'm working with ~1000x1000px masks on average. I have about 30k images for now, with each image containing 2-4 masks. This means that currently, masks are responsible for a good chunk of the disk usage of my mongo db. The masks I have aren't "non-overlapping", they are inferred from various segmentation models and thus overlap most of the time (they are also non-binary/grayscale, even though the UI doesn't support displaying this). So using |
mask_path
to fo.Detection
labels
Hi @brimoor, when you get a chance, could you please take a look again at this PR and let me know your thoughts ? |
Hi @Laurent2916, we'll continue this in #5120 |
5d2dedc
to
887fb57
Compare
7e2e35e
into
voxel51:feat/on-disk-detection-mask-path
What changes are proposed in this pull request?
Fix #4486
How is this patch tested? If it is not, please explain why.
I genuinely couldn't figure out how to run the tests, I followed the instructions but kept getting errors when using
pytest
, so I don't know if I have inadvertently broken something.I wrote a quick snippet to test the feature instead:
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
Allows
Detection
s to store masks on disk, this allows to offload some disk usage from the mongo database to somewhere else. See also #4486What areas of FiftyOne does this PR affect?
fiftyone
Python library changesSummary by CodeRabbit
New Features
Bug Fixes