-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support on-disk instance segmentations in SDK #5256
Conversation
WalkthroughThe pull request introduces significant updates to the FiftyOne documentation and core functionalities. Key changes include enhancements to the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
fiftyone/utils/labels.py (3)
214-225
: Refactor suggestion: Combineif
branches to reduce redundancyThe
if
statements forfol.Segmentation
andfol.Detection
types share the same logic whenlabel.mask
is notNone
. You can combine these branches using a tuple in theisinstance
check to simplify the code.Refactored code:
if isinstance(label, (fol.Segmentation, fol.Detection)): if label.mask is not None: outpath = filename_maker.get_output_path( image.filepath, output_ext=".png" ) label.export_mask(outpath, update=update) elif isinstance(label, fol.Detections): for detection in label.detections: if detection.mask is not None: outpath = filename_maker.get_output_path( image.filepath, output_ext=".png" ) detection.export_mask(outpath, update=update) elif isinstance(label, fol.Heatmap): if label.map is not None: outpath = filename_maker.get_output_path( image.filepath, output_ext=".png" ) label.export_map(outpath, update=update)🧰 Tools
🪛 Ruff (0.8.0)
214-225: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
288-299
: Refactor suggestion: Combineif
branches to simplify the codeIn the
import_segmentations
function, the logic forfol.Segmentation
andfol.Detection
types whenlabel.mask_path
is notNone
is identical. Combining these branches can reduce code duplication.Refactored code:
if isinstance(label, (fol.Segmentation, fol.Detection)): if label.mask_path is not None: del_path = label.mask_path if delete_images else None label.import_mask(update=update) if del_path: etau.delete_file(del_path) elif isinstance(label, fol.Detections): for detection in label.detections: if detection.mask_path is not None: del_path = ( detection.mask_path if delete_images else None ) detection.import_mask(update=update) if del_path: etau.delete_file(del_path) elif isinstance(label, fol.Heatmap): if label.map_path is not None: del_path = label.map_path if delete_images else None label.import_map(update=update) if del_path: etau.delete_file(del_path)🧰 Tools
🪛 Ruff (0.8.0)
288-299: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
309-310
: Simplify nestedif
statementsYou can combine the nested
if
statements into a singleelif
condition using theand
operator to make the code more concise.Refactored code:
elif isinstance(label, fol.Heatmap) and label.map_path is not None: del_path = label.map_path if delete_images else None label.import_map(update=update) if del_path: etau.delete_file(del_path)🧰 Tools
🪛 Ruff (0.8.0)
309-310: Use a single
if
statement instead of nestedif
statements(SIM102)
fiftyone/utils/data/exporters.py (2)
2047-2054
: Refactor: Use a ternary operator to simplify_value
assignmentYou can replace the
if
-else
block with a ternary operator to make the code more concise and improve readability.Refactored code:
for _d in value: _value = _d.get(key, None) if key is not None else _d if _value is None: continue outpath, _ = media_exporter.export(_value) if not self.abs_paths: outpath = fou.safe_relpath( outpath, self.export_dir, default=outpath ) if key is not None: _d[key] = outpath else: pydash.set_(d, field_name, outpath)🧰 Tools
🪛 Ruff (0.8.0)
2047-2050: Use ternary operator
_value = _d.get(key, None) if key is not None else _d
instead ofif
-else
-blockReplace
if
-else
-block with_value = _d.get(key, None) if key is not None else _d
(SIM108)
2355-2358
: Refactor: Simplify_value
assignment with a ternary operatorSimilarly, in this part of the code, you can use a ternary operator to simplify the assignment of
_value
.Refactored code:
for _d in value: _value = _d.get(key, None) if key is not None else _d if _value is None: continue if self.export_media is not False: _, uuid = media_exporter.export(_value) outpath = os.path.join("fields", field_name, uuid) elif self.rel_dir is not None: outpath = fou.safe_relpath( _value, self.rel_dir, default=_value ) else: continue if key is not None: _d[key] = outpath else: pydash.set_(d, field_name, outpath)🧰 Tools
🪛 Ruff (0.8.0)
2355-2358: Use ternary operator
_value = _d.get(key, None) if key is not None else _d
instead ofif
-else
-blockReplace
if
-else
-block with_value = _d.get(key, None) if key is not None else _d
(SIM108)
fiftyone/utils/data/importers.py (1)
2168-2170
: Consider combining nested if statements.The nested if statements can be combined for better readability.
- if not os.path.isabs(value): - pydash.set_(sd, field_name, os.path.join(rel_dir, value)) + if etau.is_str(value) and not os.path.isabs(value): + pydash.set_(sd, field_name, os.path.join(rel_dir, value))🧰 Tools
🪛 Ruff (0.8.0)
2168-2169: Use a single
if
statement instead of nestedif
statementsCombine
if
statements usingand
(SIM102)
docs/source/user_guide/using_datasets.rst (1)
Line range hint
2545-2613
: Clarify mask value interpretation for disk-stored masksThe documentation should be more explicit about how mask values are interpreted for disk-stored masks. While it states that "0 indicates background", it would be helpful to clarify whether any non-zero value indicates the object or if specific positive values have special meaning.
-where a pixel value of 0 indicates the -background (rendered as transparent in the App), and any other -value indicates the object. +where a pixel value of 0 indicates the +background (rendered as transparent in the App), and any positive +value (1-255) indicates the object. For 16-bit masks, values +1-65535 indicate the object.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (7)
docs/source/user_guide/using_datasets.rst
(3 hunks)fiftyone/core/collections.py
(1 hunks)fiftyone/core/labels.py
(2 hunks)fiftyone/utils/data/exporters.py
(3 hunks)fiftyone/utils/data/importers.py
(2 hunks)fiftyone/utils/labels.py
(10 hunks)tests/unittests/import_export_tests.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
fiftyone/utils/data/importers.py
2168-2169: Use a single if
statement instead of nested if
statements
Combine if
statements using and
(SIM102)
fiftyone/utils/data/exporters.py
2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d
instead of if
-else
-block
Replace if
-else
-block with _value = _d.get(key, None) if key is not None else _d
(SIM108)
2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d
instead of if
-else
-block
Replace if
-else
-block with _value = _d.get(key, None) if key is not None else _d
(SIM108)
fiftyone/utils/labels.py
214-225: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
233-234: Use a single if
statement instead of nested if
statements
Combine if
statements using and
(SIM102)
288-299: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
309-310: Use a single if
statement instead of nested if
statements
(SIM102)
🔇 Additional comments (9)
fiftyone/core/collections.py (1)
10684-10686
: Improve list field handling by extracting field type
The code now properly handles nested list fields by recursively extracting the underlying field type. This is important for correctly identifying the type of elements in nested list fields.
while isinstance(field, fof.ListField):
field = field.field
fiftyone/core/labels.py (2)
412-413
: LGTM! Clear and accurate docstring update.
The docstring now provides precise guidance on the expected format of instance segmentation masks.
536-537
: LGTM! More accurate error message.
The error message now correctly indicates that either mask
or mask_path
must be populated.
fiftyone/utils/data/importers.py (2)
17-17
: LGTM! Added pydash for robust nested data handling.
Using pydash provides safer access to nested data structures.
2173-2191
: LGTM! Well-structured helper function for nested media fields.
The new _parse_nested_media_field
function cleanly encapsulates the logic for handling nested media fields.
tests/unittests/import_export_tests.py (1)
2221-2329
: LGTM! Comprehensive test coverage for instance segmentation.
The new test methods thoroughly validate both in-database and on-disk storage of instance segmentation masks, including proper verification of mask paths and data integrity.
docs/source/user_guide/using_datasets.rst (3)
2620-2621
: LGTM!
The documentation clearly introduces the custom attributes capability for instance segmentations.
Line range hint 2623-2661
: LGTM!
The code example effectively demonstrates how to add and use custom attributes on Detection objects, with clear initialization and output examples.
Line range hint 2663-2665
: LGTM!
The note provides valuable UX information about viewing custom attributes in the App with appropriate cross-referencing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the follow-up work - I missed a lot. 🥲
on disk, which should be a single-channel PNG image where any | ||
non-zero values represent the instance's extent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the app technically doesn't mind multi-channel, too, but it makes sense that we write an imperative statement about masks being single-channel for clarity.
if the app runs into multi-channel pngs for masks, it uses just the first channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I mostly copied this verbatim from the user guide documentation you had added:
We recommend storing masks as single-channel PNG images, where a pixel value of 0 indicates the background (rendered as transparent in the App), and any other value indicates the object.
But I was not certain whether single-channel was indeed a recommendation or a hard requirement, so the version here came out sounding more imperative just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to clarify both instances of this documentation if you want!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
fiftyone/utils/data/exporters.py (1)
2034-2065
: Consider refactoring media field export logic
The media field export logic is duplicated between
LegacyFiftyOneDatasetExporter
andFiftyOneDatasetExporter
. Consider extracting this into a shared utility function to improve maintainability.The value assignment logic can be simplified using ternary operators:
- if key is not None: - _value = _d.get(key, None) - else: - _value = _d + _value = _d.get(key, None) if key is not None else _dAlso applies to: 2342-2378
🧰 Tools
🪛 Ruff (0.8.2)
2047-2050: Use ternary operator
_value = _d.get(key, None) if key is not None else _d
instead ofif
-else
-blockReplace
if
-else
-block with_value = _d.get(key, None) if key is not None else _d
(SIM108)
fiftyone/core/collections.py (1)
10701-10710
: Consider using ternary operator for more concise codeThe blacklist initialization logic can be simplified using a ternary operator.
- if etau.is_container(blacklist): - blacklist = set(blacklist) - else: - blacklist = {blacklist} + blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}🧰 Tools
🪛 Ruff (0.8.2)
10702-10705: Use ternary operator
blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
instead ofif
-else
-blockReplace
if
-else
-block withblacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fiftyone/core/collections.py
(4 hunks)fiftyone/utils/data/exporters.py
(5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/utils/data/exporters.py
2047-2050: Use ternary operator _value = _d.get(key, None) if key is not None else _d
instead of if
-else
-block
Replace if
-else
-block with _value = _d.get(key, None) if key is not None else _d
(SIM108)
2355-2358: Use ternary operator _value = _d.get(key, None) if key is not None else _d
instead of if
-else
-block
Replace if
-else
-block with _value = _d.get(key, None) if key is not None else _d
(SIM108)
fiftyone/core/collections.py
10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
instead of if
-else
-block
Replace if
-else
-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
(SIM108)
🔇 Additional comments (2)
fiftyone/utils/data/exporters.py (1)
1897-1897
: LGTM: Consistent exclusion of filepath from media fields
The addition of the "filepath" blacklist parameter ensures consistent handling of media fields by explicitly excluding the filepath field, which is handled separately.
Also applies to: 2205-2205
fiftyone/core/collections.py (1)
10675-10676
: LGTM! Good defensive programming
Explicitly adding 'filepath' to app_media_fields even though it should already be there is a good defensive programming practice.
The base branch was changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
fiftyone/core/collections.py (1)
10701-10710
: Simplify blacklist parameter handlingThe blacklist parameter handling code can be simplified using a ternary operator.
- if etau.is_container(blacklist): - blacklist = set(blacklist) - else: - blacklist = {blacklist} + blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}🧰 Tools
🪛 Ruff (0.8.2)
10702-10705: Use ternary operator
blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
instead ofif
-else
-blockReplace
if
-else
-block withblacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/core/collections.py
(4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/core/collections.py
10702-10705: Use ternary operator blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
instead of if
-else
-block
Replace if
-else
-block with blacklist = set(blacklist) if etau.is_container(blacklist) else {blacklist}
(SIM108)
🔇 Additional comments (2)
fiftyone/core/collections.py (2)
10675-10676
: LGTM: Ensuring filepath field inclusion
Good practice to ensure the 'filepath' field is always included in the media fields set.
Line range hint 10715-10739
: LGTM: Improved media field parsing
The method has been renamed from _resolve_media_field
to _parse_media_field
which better reflects its functionality. The updated logic properly handles list fields and includes good error handling.
def _get_media_fields(self, whitelist=None, blacklist=None, frames=False): | ||
media_fields = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking change: Method signature updated
The _get_media_fields
method signature has changed from include_filepath
to whitelist/blacklist
parameters. This is a breaking change that may affect existing code that calls this method.
Consider:
- Adding a deprecation warning for any code using the old signature
- Updating the documentation to highlight this breaking change
- Providing migration guidance for users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's go! 🚀
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests