Support on-disk instance segmentations in SDK #5256

brimoor · 2024-12-11T06:34:13Z

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.labels as foul

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types="segmentations",
    classes=["cat", "dog"],
    label_field="instances",
    max_samples=10,
    only_matching=True,
)

foul.export_segmentations(
    dataset,
    "instances",
    "/tmp/instances/masks",
)

assert dataset.count("instances.detections.mask") == 0
assert dataset.count("instances.detections.mask_path") > 0

dataset.export(
    export_dir="/tmp/instances/fod",
    dataset_type=fo.types.FiftyOneDataset,
    # export_media=False,
)

dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/instances/fod",
    dataset_type=fo.types.FiftyOneDataset,
)

assert dataset2.count("instances.detections.mask") == 0
assert dataset2.count("instances.detections.mask_path") > 0

foul.import_segmentations(dataset2, "instances")

assert dataset2.count("instances.detections.mask") > 0
assert dataset2.count("instances.detections.mask_path") == 0

Summary by CodeRabbit

New Features
- Enhanced documentation for FiftyOne datasets, including new sections and clearer examples.
- Improved functionality for dataset exporters and importers, supporting additional label types.
- Added new test methods for validating instance segmentation functionality.
Bug Fixes
- Clarified requirements for segmentation processes and improved error handling in related methods.
Documentation
- Expanded and refined sections on instance segmentation, dataset deletion, and dynamic attributes for better user understanding.
Tests
- Introduced new unit tests for instance segmentation to ensure robust functionality.

coderabbitai · 2024-12-11T06:34:22Z

Walkthrough

The pull request introduces significant updates to the FiftyOne documentation and core functionalities. Key changes include enhancements to the using_datasets.rst documentation, focusing on instance segmentation, dataset deletion, and dynamic attributes. Modifications in the SampleCollection class improve field handling, while the Detection class clarifies mask usage. Additionally, the dataset exporters and importers have been refactored for better data manipulation. New tests for instance segmentation functionality have also been added, ensuring robust validation of these features.

Changes

| File | Change Summary |
| --- | --- |
| `docs/source/user_guide/using_datasets.rst` | Added new sections and examples, refined instance segmentation details, expanded dataset deletion section, updated dynamic attributes guidance, and improved overall structure. |
| `fiftyone/core/collections.py` | Enhanced field handling in `SampleCollection`, updated methods for media and label field processing. |
| `fiftyone/core/labels.py` | Updated docstring for `mask_path`, modified error messages in `to_segmentation` method to clarify requirements. |
| `fiftyone/utils/data/exporters.py` | Refactored media field export logic, improved media export path handling, and refined error handling. |
| `fiftyone/utils/data/importers.py` | Integrated `pydash` for nested data handling, added `_parse_nested_media_field` function for better media field parsing. |
| `fiftyone/utils/labels.py` | Enhanced segmentation and detection utilities, updated validation logic in export/import functions to support new label types. |
| `tests/unittests/import_export_tests.py` | Added new tests for instance segmentation functionality, validating export and import processes for segmentation datasets. |

Possibly related PRs

Add mask_path to fo.Detection labels #4693: This PR adds the mask_path attribute to Detection labels, which is directly related to the main PR's updates on how masks can be stored and referenced in the documentation.
add support for detection.mask_path #5120: This PR introduces support for detection.mask_path, enhancing the handling of masks in the Detection class, which aligns with the main PR's focus on improving documentation regarding mask management.
quickstart 3d dataset #4406: This PR includes updates to the using_datasets.rst file, which is the same file modified in the main PR, indicating a connection in the context of dataset usage documentation.

Suggested labels

feature, documentation

Suggested reviewers

swheaton
benjaminpkane

Poem

In the fields of data bright,
A rabbit hops with pure delight.
With masks and segments, clear and neat,
Our datasets now can’t be beat!
So let’s export and import with cheer,
For FiftyOne’s changes bring us near! 🐇✨

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)

fiftyone/utils/labels.py (3)
214-225: Refactor suggestion: Combine if branches to reduce redundancy

The if statements for fol.Segmentation and fol.Detection types share the same logic when label.mask is not None. You can combine these branches using a tuple in the isinstance check to simplify the code.

Refactored code:
if isinstance(label, (fol.Segmentation, fol.Detection)):
    if label.mask is not None:
        outpath = filename_maker.get_output_path(
            image.filepath, output_ext=".png"
        )
        label.export_mask(outpath, update=update)
elif isinstance(label, fol.Detections):
    for detection in label.detections:
        if detection.mask is not None:
            outpath = filename_maker.get_output_path(
                image.filepath, output_ext=".png"
            )
            detection.export_mask(outpath, update=update)
elif isinstance(label, fol.Heatmap):
    if label.map is not None:
        outpath = filename_maker.get_output_path(
            image.filepath, output_ext=".png"
        )
        label.export_map(outpath, update=update)
🧰 Tools

🪛 Ruff (0.8.0)

214-225: Combine if branches using logical or operator

Combine if branches

(SIM114)

288-299: Refactor suggestion: Combine if branches to simplify the code

In the import_segmentations function, the logic for fol.Segmentation and fol.Detection types when label.mask_path is not None is identical. Combining these branches can reduce code duplication.

Refactored code:
if isinstance(label, (fol.Segmentation, fol.Detection)):
    if label.mask_path is not None:
        del_path = label.mask_path if delete_images else None
        label.import_mask(update=update)
        if del_path:
            etau.delete_file(del_path)
elif isinstance(label, fol.Detections):
    for detection in label.detections:
        if detection.mask_path is not None:
            del_path = (
                detection.mask_path if delete_images else None
            )
            detection.import_mask(update=update)
            if del_path:
                etau.delete_file(del_path)
elif isinstance(label, fol.Heatmap):
    if label.map_path is not None:
        del_path = label.map_path if delete_images else None
        label.import_map(update=update)
        if del_path:
            etau.delete_file(del_path)
🧰 Tools

🪛 Ruff (0.8.0)

288-299: Combine if branches using logical or operator

Combine if branches

(SIM114)

309-310: Simplify nested if statements

You can combine the nested if statements into a single elif condition using the and operator to make the code more concise.

Refactored code:
elif isinstance(label, fol.Heatmap) and label.map_path is not None:
    del_path = label.map_path if delete_images else None
    label.import_map(update=update)
    if del_path:
        etau.delete_file(del_path)
🧰 Tools

🪛 Ruff (0.8.0)

309-310: Use a single if statement instead of nested if statements

(SIM102)
fiftyone/utils/data/exporters.py (2)
2047-2054: Refactor: Use a ternary operator to simplify _value assignment

You can replace the if-else block with a ternary operator to make the code more concise and improve readability.

Refactored code:
for _d in value:
    _value = _d.get(key, None) if key is not None else _d
    if _value is None:
        continue
    outpath, _ = media_exporter.export(_value)
    if not self.abs_paths:
        outpath = fou.safe_relpath(
            outpath, self.export_dir, default=outpath
        )
    if key is not None:
        _d[key] = outpath
    else:
        pydash.set_(d, field_name, outpath)
🧰 Tools

🪛 Ruff (0.8.0)

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

2355-2358: Refactor: Simplify _value assignment with a ternary operator

Similarly, in this part of the code, you can use a ternary operator to simplify the assignment of _value.

Refactored code:
for _d in value:
    _value = _d.get(key, None) if key is not None else _d
    if _value is None:
        continue
    if self.export_media is not False:
        _, uuid = media_exporter.export(_value)
        outpath = os.path.join("fields", field_name, uuid)
    elif self.rel_dir is not None:
        outpath = fou.safe_relpath(
            _value, self.rel_dir, default=_value
        )
    else:
        continue
    if key is not None:
        _d[key] = outpath
    else:
        pydash.set_(d, field_name, outpath)
🧰 Tools

🪛 Ruff (0.8.0)

2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)
fiftyone/utils/data/importers.py (1)
2168-2170: Consider combining nested if statements.

The nested if statements can be combined for better readability.
-        if not os.path.isabs(value):
-            pydash.set_(sd, field_name, os.path.join(rel_dir, value))
+        if etau.is_str(value) and not os.path.isabs(value):
+            pydash.set_(sd, field_name, os.path.join(rel_dir, value))
🧰 Tools

🪛 Ruff (0.8.0)

2168-2169: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)
docs/source/user_guide/using_datasets.rst (1)
Line range hint 2545-2613: Clarify mask value interpretation for disk-stored masks

The documentation should be more explicit about how mask values are interpreted for disk-stored masks. While it states that "0 indicates background", it would be helpful to clarify whether any non-zero value indicates the object or if specific positive values have special meaning.
-where a pixel value of 0 indicates the
-background (rendered as transparent in the App), and any other 
-value indicates the object.
+where a pixel value of 0 indicates the
+background (rendered as transparent in the App), and any positive
+value (1-255) indicates the object. For 16-bit masks, values 
+1-65535 indicate the object.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 81db334 and 1594990.

📒 Files selected for processing (7)

docs/source/user_guide/using_datasets.rst (3 hunks)
fiftyone/core/collections.py (1 hunks)
fiftyone/core/labels.py (2 hunks)
fiftyone/utils/data/exporters.py (3 hunks)
fiftyone/utils/data/importers.py (2 hunks)
fiftyone/utils/labels.py (10 hunks)
tests/unittests/import_export_tests.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.0)

fiftyone/utils/data/importers.py

2168-2169: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

fiftyone/utils/data/exporters.py

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

fiftyone/utils/labels.py

214-225: Combine if branches using logical or operator

Combine if branches

(SIM114)

233-234: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

288-299: Combine if branches using logical or operator

Combine if branches

(SIM114)

309-310: Use a single if statement instead of nested if statements

(SIM102)

🔇 Additional comments (9)

fiftyone/core/collections.py (1)

10684-10686: Improve list field handling by extracting field type

The code now handles nested list fields by repeatedly unwrapping them until it reaches the underlying element field. This matters for correctly identifying the element type of nested list fields.

while isinstance(field, fof.ListField):
    field = field.field
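
For readers following along, here is a minimal, self-contained sketch of the unwrapping loop above. The `Field`, `StringField`, and `ListField` classes are hypothetical stand-ins defined only for illustration, not the `fiftyone.core.fields` classes, and `unwrap_list_field` is a name invented for this example.

```python
class Field:
    """Hypothetical stand-in for a schema field type."""


class StringField(Field):
    """Hypothetical leaf field."""


class ListField(Field):
    """Hypothetical list field that wraps an element field."""

    def __init__(self, field=None):
        self.field = field


def unwrap_list_field(field):
    """Returns the innermost element field of a possibly nested list field."""
    # Each pass strips one level of list nesting; the loop ends at the
    # first field that is not a ListField (or at None if undeclared)
    while isinstance(field, ListField):
        field = field.field

    return field


nested = ListField(ListField(StringField()))
print(type(unwrap_list_field(nested)).__name__)  # StringField
```

A loop is used rather than a single `field.field` access so that a list of lists resolves to its leaf type in one call.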

fiftyone/core/labels.py (2)

412-413: LGTM! Clear and accurate docstring update.

The docstring now provides precise guidance on the expected format of instance segmentation masks.

536-537: LGTM! More accurate error message.

The error message now correctly indicates that either mask or mask_path must be populated.

fiftyone/utils/data/importers.py (2)

17-17: LGTM! Added pydash for robust nested data handling.

Using pydash provides safer access to nested data structures.

2173-2191: LGTM! Well-structured helper function for nested media fields.

The new _parse_nested_media_field function cleanly encapsulates the logic for handling nested media fields.

tests/unittests/import_export_tests.py (1)

2221-2329: LGTM! Comprehensive test coverage for instance segmentation.

The new test methods thoroughly validate both in-database and on-disk storage of instance segmentation masks, including proper verification of mask paths and data integrity.

docs/source/user_guide/using_datasets.rst (3)

2620-2621: LGTM!

The documentation clearly introduces the custom attributes capability for instance segmentations.

Line range hint 2623-2661: LGTM!

The code example effectively demonstrates how to add and use custom attributes on Detection objects, with clear initialization and output examples.

Line range hint 2663-2665: LGTM!

The note provides valuable UX information about viewing custom attributes in the App with appropriate cross-referencing.

sashankaryal

LGTM! Thanks for the follow-up work - I missed a lot. 🥲

sashankaryal · 2024-12-11T18:33:36Z

fiftyone/core/labels.py

+            on disk, which should be a single-channel PNG image where any
+            non-zero values represent the instance's extent


The App technically doesn't mind multi-channel masks either, but it makes sense to state imperatively that masks are single-channel, for clarity.

If the App runs into multi-channel PNGs for masks, it uses just the first channel.

Ah I see. I mostly copied this verbatim from the user guide documentation you had added:

We recommend storing masks as single-channel PNG images, where a pixel value of 0 indicates the background (rendered as transparent in the App), and any other value indicates the object.

But I was not certain whether single-channel was indeed a recommendation or a hard requirement, so the version here came out sounding more imperative just to be safe.

Feel free to clarify both instances of this documentation if you want!
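
To make the interpretation discussed in this thread concrete, here is a rough sketch of how decoded mask pixels could be reduced to a boolean instance mask. It assumes only what is stated above: 0 is background, any non-zero value belongs to the instance, and for multi-channel images only the first channel is consulted. The `to_binary_mask` helper and the nested-list pixel representation are hypothetical; this is not the App's or the SDK's actual code.

```python
def to_binary_mask(pixels):
    """Converts decoded mask pixels to a boolean instance mask.

    `pixels` is a list of rows. Each pixel is either an int
    (single-channel) or a sequence of ints (multi-channel), in which
    case only the first channel is read.
    """

    def first_channel(pixel):
        return pixel if isinstance(pixel, int) else pixel[0]

    # Any non-zero value marks the instance; 0 is background
    return [[first_channel(p) != 0 for p in row] for row in pixels]


single = [[0, 255], [7, 0]]
rgba = [[(0, 9, 9, 255), (3, 0, 0, 255)]]

print(to_binary_mask(single))  # [[False, True], [True, False]]
print(to_binary_mask(rgba))  # [[False, True]]
```

Note that in the RGBA example the first pixel is treated as background even though its other channels are non-zero, which is why recommending single-channel masks avoids surprises.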

benjaminpkane

LGTM

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

fiftyone/utils/data/exporters.py (1)
2034-2065: Consider refactoring media field export logic

The media field export logic is duplicated between LegacyFiftyOneDatasetExporter and FiftyOneDatasetExporter. Consider extracting this into a shared utility function to improve maintainability.

The value assignment logic can be simplified using ternary operators:
-            if key is not None:
-                _value = _d.get(key, None)
-            else:
-                _value = _d
+            _value = _d.get(key, None) if key is not None else _d
Also applies to: 2342-2378

🧰 Tools

🪛 Ruff (0.8.2)

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)
fiftyone/core/collections.py (1)
10701-10710: Consider using ternary operator for more concise code

The blacklist initialization logic can be simplified using a ternary operator.
-        if etau.is_container(blacklist):
-            blacklist = set(blacklist)
-        else:
-            blacklist = {blacklist}
+        blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
🧰 Tools

🪛 Ruff (0.8.2)

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)
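
As a side note on why the container check matters regardless of which form is used: calling `set()` directly on a string splits it into characters. The sketch below illustrates this with a hypothetical `is_container` stand-in (the real `etau.is_container` may accept more types) and a hypothetical `normalize_blacklist` helper; the `None` handling is an assumption, not something shown in the diff.

```python
def is_container(value):
    """Hypothetical stand-in for a container check that excludes strings."""
    return isinstance(value, (list, tuple, set, frozenset))


def normalize_blacklist(blacklist):
    """Returns the blacklist as a set of field names."""
    if blacklist is None:
        # Assumption for this sketch: no blacklist means nothing is excluded
        return set()

    # A bare string must be wrapped, not iterated
    return set(blacklist) if is_container(blacklist) else {blacklist}


print(normalize_blacklist("filepath"))  # {'filepath'}
print(len(set("filepath")))  # 8 distinct characters, not one field name
```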

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1594990 and 68eb682.

📒 Files selected for processing (2)

fiftyone/core/collections.py (4 hunks)
fiftyone/utils/data/exporters.py (5 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/utils/data/exporters.py

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

fiftyone/core/collections.py

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)

🔇 Additional comments (2)

fiftyone/utils/data/exporters.py (1)

1897-1897: LGTM: Consistent exclusion of filepath from media fields

The addition of the "filepath" blacklist parameter ensures consistent handling of media fields by explicitly excluding the filepath field, which is handled separately.

Also applies to: 2205-2205

fiftyone/core/collections.py (1)

10675-10676: LGTM! Good defensive programming

Explicitly adding 'filepath' to app_media_fields even though it should already be there is a good defensive programming practice.

The base branch was changed.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

fiftyone/core/collections.py (1)
10701-10710: Simplify blacklist parameter handling

The blacklist parameter handling code can be simplified using a ternary operator.
-            if etau.is_container(blacklist):
-                blacklist = set(blacklist)
-            else:
-                blacklist = {blacklist}
+            blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
🧰 Tools

🪛 Ruff (0.8.2)

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68eb682 and 8331661.

📒 Files selected for processing (1)

fiftyone/core/collections.py (4 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/core/collections.py

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)

🔇 Additional comments (2)

fiftyone/core/collections.py (2)

10675-10676: LGTM: Ensuring filepath field inclusion

Good practice to ensure the 'filepath' field is always included in the media fields set.

Line range hint 10715-10739: LGTM: Improved media field parsing

The method has been renamed from _resolve_media_field to _parse_media_field which better reflects its functionality. The updated logic properly handles list fields and includes good error handling.

coderabbitai · 2024-12-12T22:12:00Z

fiftyone/core/collections.py

+    def _get_media_fields(self, whitelist=None, blacklist=None, frames=False):
        media_fields = {}


⚠️ Potential issue

Breaking change: Method signature updated

The _get_media_fields method signature has changed from include_filepath to whitelist/blacklist parameters. This is a breaking change that may affect existing code that calls this method.

Consider:
- Adding a deprecation warning for any code using the old signature
- Updating the documentation to highlight this breaking change
- Providing migration guidance for users
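
For illustration only, a deprecation shim along the lines suggested here might look like the sketch below. Everything in it is hypothetical: the standalone `get_media_fields` function, the assumption that `include_filepath=False` maps to blacklisting `"filepath"`, and the dict return value (used only so the resolved arguments are visible). Since the real method is private (leading underscore), such a shim may well be unnecessary.

```python
import warnings


def get_media_fields(
    whitelist=None, blacklist=None, frames=False, include_filepath=None
):
    """Hypothetical shim that accepts the old keyword with a warning."""
    if include_filepath is not None:
        warnings.warn(
            "include_filepath is deprecated; use whitelist/blacklist instead",
            DeprecationWarning,
            stacklevel=2,
        )

        # Assumed mapping: excluding the filepath is a blacklist entry
        if not include_filepath and blacklist is None:
            blacklist = "filepath"

    # Returns the resolved arguments rather than real media fields
    return {"whitelist": whitelist, "blacklist": blacklist, "frames": frames}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = get_media_fields(include_filepath=False)

print(resolved["blacklist"])  # filepath
print(caught[0].category.__name__)  # DeprecationWarning
```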

sashankaryal

let's go! 🚀

support on-disk instance segmentations in SDK

1594990

brimoor requested review from benjaminpkane and sashankaryal December 11, 2024 06:34

coderabbitai bot reviewed Dec 11, 2024

sashankaryal previously approved these changes Dec 11, 2024

benjaminpkane previously approved these changes Dec 11, 2024

handle nested roots

68eb682

coderabbitai bot reviewed Dec 12, 2024

brimoor changed the base branch from develop to release/v1.2.0 December 12, 2024 21:01

handle list fields

8331661

coderabbitai bot reviewed Dec 12, 2024

sashankaryal self-requested a review December 12, 2024 22:53

sashankaryal approved these changes Dec 12, 2024

brimoor merged commit 64cf79b into release/v1.2.0 Dec 13, 2024
14 checks passed

brimoor deleted the on-disk-instances-updates branch December 13, 2024 03:25

coderabbitai bot mentioned this pull request Dec 13, 2024

Merge release/v1.2.0 to develop #5265

Merged

coderabbitai bot mentioned this pull request Dec 21, 2024

Add support for selecting/excluding group slices #5198

Open

1 task
