Support on-disk instance segmentations in SDK #5256

Merged: 3 commits, Dec 13, 2024

@brimoor brimoor commented Dec 11, 2024

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.labels as foul

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types="segmentations",
    classes=["cat", "dog"],
    label_field="instances",
    max_samples=10,
    only_matching=True,
)

foul.export_segmentations(
    dataset,
    "instances",
    "/tmp/instances/masks",
)

assert dataset.count("instances.detections.mask") == 0
assert dataset.count("instances.detections.mask_path") > 0

dataset.export(
    export_dir="/tmp/instances/fod",
    dataset_type=fo.types.FiftyOneDataset,
    # export_media=False,
)

dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/instances/fod",
    dataset_type=fo.types.FiftyOneDataset,
)

assert dataset2.count("instances.detections.mask") == 0
assert dataset2.count("instances.detections.mask_path") > 0

foul.import_segmentations(dataset2, "instances")

assert dataset2.count("instances.detections.mask") > 0
assert dataset2.count("instances.detections.mask_path") == 0

Summary by CodeRabbit

  • New Features

    • Enhanced documentation for FiftyOne datasets, including new sections and clearer examples.
    • Improved functionality for dataset exporters and importers, supporting additional label types.
    • Added new test methods for validating instance segmentation functionality.
  • Bug Fixes

    • Clarified requirements for segmentation processes and improved error handling in related methods.
  • Documentation

    • Expanded and refined sections on instance segmentation, dataset deletion, and dynamic attributes for better user understanding.
  • Tests

    • Introduced new unit tests for instance segmentation to ensure robust functionality.

coderabbitai bot commented Dec 11, 2024

Walkthrough

The pull request introduces significant updates to the FiftyOne documentation and core functionalities. Key changes include enhancements to the using_datasets.rst documentation, focusing on instance segmentation, dataset deletion, and dynamic attributes. Modifications in the SampleCollection class improve field handling, while the Detection class clarifies mask usage. Additionally, the dataset exporters and importers have been refactored for better data manipulation. New tests for instance segmentation functionality have also been added, ensuring robust validation of these features.

Changes

File | Change Summary
docs/source/user_guide/using_datasets.rst | Added new sections and examples, refined instance segmentation details, expanded dataset deletion section, updated dynamic attributes guidance, and improved overall structure.
fiftyone/core/collections.py | Enhanced field handling in SampleCollection, updated methods for media and label field processing.
fiftyone/core/labels.py | Updated docstring for mask_path, modified error messages in to_segmentation method to clarify requirements.
fiftyone/utils/data/exporters.py | Refactored media field export logic, improved media export path handling, and refined error handling.
fiftyone/utils/data/importers.py | Integrated pydash for nested data handling, added _parse_nested_media_field function for better media field parsing.
fiftyone/utils/labels.py | Enhanced segmentation and detection utilities, updated validation logic in export/import functions to support new label types.
tests/unittests/import_export_tests.py | Added new tests for instance segmentation functionality, validating export and import processes for segmentation datasets.

Possibly related PRs

  • Add mask_path to fo.Detection labels #4693: This PR adds the mask_path attribute to Detection labels, which is directly related to the main PR's updates on how masks can be stored and referenced in the documentation.
  • add support for detection.mask_path #5120: This PR introduces support for detection.mask_path, enhancing the handling of masks in the Detection class, which aligns with the main PR's focus on improving documentation regarding mask management.
  • quickstart 3d dataset #4406: This PR includes updates to the using_datasets.rst file, which is the same file modified in the main PR, indicating a connection in the context of dataset usage documentation.

Suggested labels

feature, documentation

Suggested reviewers

  • swheaton
  • benjaminpkane

Poem

In the fields of data bright,
A rabbit hops with pure delight.
With masks and segments, clear and neat,
Our datasets now can’t be beat!
So let’s export and import with cheer,
For FiftyOne’s changes bring us near! 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
fiftyone/utils/labels.py (3)

214-225: Refactor suggestion: Combine if branches to reduce redundancy

The if statements for fol.Segmentation and fol.Detection types share the same logic when label.mask is not None. You can combine these branches using a tuple in the isinstance check to simplify the code.

Refactored code:

if isinstance(label, (fol.Segmentation, fol.Detection)):
    if label.mask is not None:
        outpath = filename_maker.get_output_path(
            image.filepath, output_ext=".png"
        )
        label.export_mask(outpath, update=update)
elif isinstance(label, fol.Detections):
    for detection in label.detections:
        if detection.mask is not None:
            outpath = filename_maker.get_output_path(
                image.filepath, output_ext=".png"
            )
            detection.export_mask(outpath, update=update)
elif isinstance(label, fol.Heatmap):
    if label.map is not None:
        outpath = filename_maker.get_output_path(
            image.filepath, output_ext=".png"
        )
        label.export_map(outpath, update=update)
🧰 Tools
🪛 Ruff (0.8.0)

214-225: Combine if branches using logical or operator

Combine if branches

(SIM114)


288-299: Refactor suggestion: Combine if branches to simplify the code

In the import_segmentations function, the logic for fol.Segmentation and fol.Detection types when label.mask_path is not None is identical. Combining these branches can reduce code duplication.

Refactored code:

if isinstance(label, (fol.Segmentation, fol.Detection)):
    if label.mask_path is not None:
        del_path = label.mask_path if delete_images else None
        label.import_mask(update=update)
        if del_path:
            etau.delete_file(del_path)
elif isinstance(label, fol.Detections):
    for detection in label.detections:
        if detection.mask_path is not None:
            del_path = (
                detection.mask_path if delete_images else None
            )
            detection.import_mask(update=update)
            if del_path:
                etau.delete_file(del_path)
elif isinstance(label, fol.Heatmap):
    if label.map_path is not None:
        del_path = label.map_path if delete_images else None
        label.import_map(update=update)
        if del_path:
            etau.delete_file(del_path)
🧰 Tools
🪛 Ruff (0.8.0)

288-299: Combine if branches using logical or operator

Combine if branches

(SIM114)


309-310: Simplify nested if statements

You can combine the nested if statements into a single elif condition using the and operator to make the code more concise.

Refactored code:

elif isinstance(label, fol.Heatmap) and label.map_path is not None:
    del_path = label.map_path if delete_images else None
    label.import_map(update=update)
    if del_path:
        etau.delete_file(del_path)
🧰 Tools
🪛 Ruff (0.8.0)

309-310: Use a single if statement instead of nested if statements

(SIM102)

fiftyone/utils/data/exporters.py (2)

2047-2054: Refactor: Use a ternary operator to simplify _value assignment

You can replace the if-else block with a ternary operator to make the code more concise and improve readability.

Refactored code:

for _d in value:
    _value = _d.get(key, None) if key is not None else _d
    if _value is None:
        continue
    outpath, _ = media_exporter.export(_value)
    if not self.abs_paths:
        outpath = fou.safe_relpath(
            outpath, self.export_dir, default=outpath
        )
    if key is not None:
        _d[key] = outpath
    else:
        pydash.set_(d, field_name, outpath)
🧰 Tools
🪛 Ruff (0.8.0)

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)


2355-2358: Refactor: Simplify _value assignment with a ternary operator

Similarly, in this part of the code, you can use a ternary operator to simplify the assignment of _value.

Refactored code:

for _d in value:
    _value = _d.get(key, None) if key is not None else _d
    if _value is None:
        continue
    if self.export_media is not False:
        _, uuid = media_exporter.export(_value)
        outpath = os.path.join("fields", field_name, uuid)
    elif self.rel_dir is not None:
        outpath = fou.safe_relpath(
            _value, self.rel_dir, default=_value
        )
    else:
        continue
    if key is not None:
        _d[key] = outpath
    else:
        pydash.set_(d, field_name, outpath)
🧰 Tools
🪛 Ruff (0.8.0)

2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

fiftyone/utils/data/importers.py (1)

2168-2170: Consider combining nested if statements.

The nested if statements can be combined for better readability.

-        if not os.path.isabs(value):
-            pydash.set_(sd, field_name, os.path.join(rel_dir, value))
+        if etau.is_str(value) and not os.path.isabs(value):
+            pydash.set_(sd, field_name, os.path.join(rel_dir, value))
🧰 Tools
🪛 Ruff (0.8.0)

2168-2169: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

docs/source/user_guide/using_datasets.rst (1)

Line range hint 2545-2613: Clarify mask value interpretation for disk-stored masks

The documentation should be more explicit about how mask values are interpreted for disk-stored masks. While it states that "0 indicates background", it would be helpful to clarify whether any non-zero value indicates the object or if specific positive values have special meaning.

-where a pixel value of 0 indicates the
-background (rendered as transparent in the App), and any other 
-value indicates the object.
+where a pixel value of 0 indicates the
+background (rendered as transparent in the App), and any positive
+value (1-255) indicates the object. For 16-bit masks, values 
+1-65535 indicate the object.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 81db334 and 1594990.

📒 Files selected for processing (7)
  • docs/source/user_guide/using_datasets.rst (3 hunks)
  • fiftyone/core/collections.py (1 hunks)
  • fiftyone/core/labels.py (2 hunks)
  • fiftyone/utils/data/exporters.py (3 hunks)
  • fiftyone/utils/data/importers.py (2 hunks)
  • fiftyone/utils/labels.py (10 hunks)
  • tests/unittests/import_export_tests.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
fiftyone/utils/labels.py

233-234: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

🔇 Additional comments (9)
fiftyone/core/collections.py (1)

10684-10686: Improve list field handling by extracting field type

The code now handles nested list fields by repeatedly unwrapping them until it reaches the underlying field type. This matters for correctly identifying the element type of nested list fields.

while isinstance(field, fof.ListField):
    field = field.field
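
The unwrapping loop above can be sketched in isolation as follows. This is only an illustration: `ListField` and `StringField` here are hypothetical stand-ins that mimic the `.field` attribute used in the snippet, not FiftyOne's own field classes, and `unwrap_list_field` is a name made up for this example.

```python
class StringField:
    """Hypothetical leaf field type (a stand-in, not FiftyOne's class)."""


class ListField:
    """Hypothetical list field that stores its element field in `.field`."""

    def __init__(self, field):
        self.field = field


def unwrap_list_field(field):
    """Peel off every ListField layer and return the innermost element field."""
    while isinstance(field, ListField):
        field = field.field
    return field


# A list of lists of strings unwraps to the string field itself
nested = ListField(ListField(StringField()))
leaf = unwrap_list_field(nested)
print(type(leaf).__name__)
```

A loop is used rather than a single `field.field` access so that arbitrarily deep nesting resolves to the same leaf type.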
fiftyone/core/labels.py (2)

412-413: LGTM! Clear and accurate docstring update.

The docstring now provides precise guidance on the expected format of instance segmentation masks.


536-537: LGTM! More accurate error message.

The error message now correctly indicates that either mask or mask_path must be populated.

fiftyone/utils/data/importers.py (2)

17-17: LGTM! Added pydash for robust nested data handling.

Using pydash provides safer access to nested data structures.


2173-2191: LGTM! Well-structured helper function for nested media fields.

The new _parse_nested_media_field function cleanly encapsulates the logic for handling nested media fields.

tests/unittests/import_export_tests.py (1)

2221-2329: LGTM! Comprehensive test coverage for instance segmentation.

The new test methods thoroughly validate both in-database and on-disk storage of instance segmentation masks, including proper verification of mask paths and data integrity.

docs/source/user_guide/using_datasets.rst (3)

2620-2621: LGTM!

The documentation clearly introduces the custom attributes capability for instance segmentations.


Line range hint 2623-2661: LGTM!

The code example effectively demonstrates how to add and use custom attributes on Detection objects, with clear initialization and output examples.


Line range hint 2663-2665: LGTM!

The note provides valuable UX information about viewing custom attributes in the App with appropriate cross-referencing.

sashankaryal
sashankaryal previously approved these changes Dec 11, 2024

@sashankaryal sashankaryal left a comment

LGTM! Thanks for the follow-up work - I missed a lot. 🥲

Comment on lines +412 to +413
on disk, which should be a single-channel PNG image where any
non-zero values represent the instance's extent

The App technically accepts multi-channel masks too, but it makes sense to phrase this as an imperative statement about masks being single-channel, for clarity.

If the App runs into a multi-channel PNG mask, it uses only the first channel.
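
As a rough sketch of the interpretation described in this comment (non-zero pixels mark the instance, and only the first channel of a multi-channel image is read), assuming the PNG has already been decoded into a NumPy array. The helper name `to_instance_mask` is hypothetical, and this is not the App's actual implementation.

```python
import numpy as np


def to_instance_mask(pixels):
    """Return a 2D boolean mask where True marks the instance's extent.

    Assumption: `pixels` is an (H, W) or (H, W, C) array decoded from a PNG.
    For multi-channel input, only the first channel is considered.
    """
    pixels = np.asarray(pixels)
    if pixels.ndim == 3:
        pixels = pixels[:, :, 0]
    elif pixels.ndim != 2:
        raise ValueError("expected a 2D or 3D array, got %dD" % pixels.ndim)

    # 0 is background; any other value belongs to the object
    return pixels != 0


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 1] = (255, 0, 0)  # non-zero in the first channel: object
rgb[1, 0] = (0, 255, 0)  # non-zero only in the second channel: ignored
mask = to_instance_mask(rgb)
```

Writing masks as single-channel images avoids this ambiguity, since a pixel that is non-zero only in a later channel would otherwise be treated as background.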

Contributor Author

Ah I see. I mostly copied this verbatim from the user guide documentation you had added:

We recommend storing masks as single-channel PNG images, where a pixel value of 0 indicates the background (rendered as transparent in the App), and any other value indicates the object.

But I was not certain whether single-channel was indeed a recommendation or a hard requirement, so the version here came out sounding more imperative just to be safe.

Contributor Author

Feel free to clarify both instances of this documentation if you want!

benjaminpkane
benjaminpkane previously approved these changes Dec 11, 2024

@benjaminpkane benjaminpkane left a comment

LGTM

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
fiftyone/utils/data/exporters.py (1)

2034-2065: Consider refactoring media field export logic

  1. The media field export logic is duplicated between LegacyFiftyOneDatasetExporter and FiftyOneDatasetExporter. Consider extracting this into a shared utility function to improve maintainability.

  2. The value assignment logic can be simplified using ternary operators:

-            if key is not None:
-                _value = _d.get(key, None)
-            else:
-                _value = _d
+            _value = _d.get(key, None) if key is not None else _d

Also applies to: 2342-2378
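
One possible shape for the shared helper suggested in point 1, as a simplified sketch only. The names `rewrite_media_paths` and `export_fn` are hypothetical; the sketch works on plain dicts and strings, does not use `pydash` or FiftyOne's media exporter, and assumes the per-exporter differences (relative paths versus `fields/<name>/<uuid>` paths) can be pushed into the callback.

```python
def rewrite_media_paths(items, key, export_fn):
    """Rewrite media paths in-place in a list of dicts or plain path strings.

    `export_fn(path)` returns the new path, or None to leave the entry as-is.
    """
    for idx, item in enumerate(items):
        # Same ternary as suggested in point 2: dict entries are read via `key`
        path = item.get(key, None) if key is not None else item
        if path is None:
            continue

        outpath = export_fn(path)
        if outpath is None:
            continue

        if key is not None:
            item[key] = outpath
        else:
            items[idx] = outpath

    return items


detections = [{"mask_path": "/data/a.png"}, {"mask_path": None}]
rewrite_media_paths(
    detections,
    "mask_path",
    lambda p: "fields/masks/" + p.rsplit("/", 1)[-1],
)
```

Each exporter would then supply only its own `export_fn`, which keeps the iteration and `None` handling in one place.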

🧰 Tools
🪛 Ruff (0.8.2)

2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d instead of if-else-block

Replace if-else-block with _value = _d.get(key, None) if key is not None else _d

(SIM108)

fiftyone/core/collections.py (1)

10701-10710: Consider using ternary operator for more concise code

The blacklist initialization logic can be simplified using a ternary operator.

-        if etau.is_container(blacklist):
-            blacklist = set(blacklist)
-        else:
-            blacklist = {blacklist}
+        blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
🧰 Tools
🪛 Ruff (0.8.2)

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1594990 and 68eb682.

📒 Files selected for processing (2)
  • fiftyone/core/collections.py (4 hunks)
  • fiftyone/utils/data/exporters.py (5 hunks)
🔇 Additional comments (2)
fiftyone/utils/data/exporters.py (1)

1897-1897: LGTM: Consistent exclusion of filepath from media fields

The addition of the "filepath" blacklist parameter ensures consistent handling of media fields by explicitly excluding the filepath field, which is handled separately.

Also applies to: 2205-2205

fiftyone/core/collections.py (1)

10675-10676: LGTM! Good defensive programming

Explicitly adding 'filepath' to app_media_fields even though it should already be there is a good defensive programming practice.

@brimoor brimoor changed the base branch from develop to release/v1.2.0 December 12, 2024 21:01
@brimoor brimoor dismissed stale reviews from benjaminpkane and sashankaryal December 12, 2024 21:01

The base branch was changed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
fiftyone/core/collections.py (1)

10701-10710: Simplify blacklist parameter handling

The blacklist parameter handling code can be simplified using a ternary operator.

-            if etau.is_container(blacklist):
-                blacklist = set(blacklist)
-            else:
-                blacklist = {blacklist}
+            blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
🧰 Tools
🪛 Ruff (0.8.2)

10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist} instead of if-else-block

Replace if-else-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}

(SIM108)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68eb682 and 8331661.

📒 Files selected for processing (1)
  • fiftyone/core/collections.py (4 hunks)
🔇 Additional comments (2)
fiftyone/core/collections.py (2)

10675-10676: LGTM: Ensuring filepath field inclusion

Good practice to ensure the 'filepath' field is always included in the media fields set.


Line range hint 10715-10739: LGTM: Improved media field parsing

The method has been renamed from _resolve_media_field to _parse_media_field which better reflects its functionality. The updated logic properly handles list fields and includes good error handling.

Comment on lines +10665 to 10666
def _get_media_fields(self, whitelist=None, blacklist=None, frames=False):
media_fields = {}

⚠️ Potential issue

Breaking change: Method signature updated

The _get_media_fields method signature has changed from include_filepath to whitelist/blacklist parameters. This is a breaking change that may affect existing code that calls this method.

Consider:

  1. Adding a deprecation warning for any code using the old signature
  2. Updating the documentation to highlight this breaking change
  3. Providing migration guidance for users
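
A minimal sketch of what the deprecation path in item 1 could look like, assuming the old flag was a boolean `include_filepath`. The function `get_media_fields` below is a hypothetical stand-in that filters a plain list of field names; it is not the actual `_get_media_fields` method, and the warning text is illustrative.

```python
import warnings


def _as_set(value):
    """Normalize a single name or an iterable of names to a set."""
    return {value} if isinstance(value, str) else set(value)


def get_media_fields(fields, whitelist=None, blacklist=None, include_filepath=None):
    """Filter `fields` by whitelist/blacklist, accepting the legacy flag."""
    allowed = None if whitelist is None else _as_set(whitelist)
    blocked = set() if blacklist is None else _as_set(blacklist)

    if include_filepath is not None:
        # Legacy callers still work, but are told how to migrate
        warnings.warn(
            "include_filepath is deprecated; use blacklist='filepath' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if not include_filepath:
            blocked.add("filepath")

    return [
        f for f in fields
        if (allowed is None or f in allowed) and f not in blocked
    ]


fields = ["filepath", "instances.detections.mask_path"]
print(get_media_fields(fields, blacklist="filepath"))
```

Old call sites that pass `include_filepath=False` keep their behavior while emitting a `DeprecationWarning`, and new call sites express the same thing with `blacklist="filepath"`.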

@sashankaryal sashankaryal self-requested a review December 12, 2024 22:53

@sashankaryal sashankaryal left a comment

let's go! 🚀

@brimoor brimoor merged commit 64cf79b into release/v1.2.0 Dec 13, 2024
14 checks passed
@brimoor brimoor deleted the on-disk-instances-updates branch December 13, 2024 03:25