Adding SAM2 to the Model Zoo! #7

Closed

@korbit-ai korbit-ai bot commented Aug 21, 2024

What changes are proposed in this pull request?

Adds Segment Anything 2 (SAM2) to the FiftyOne Model Zoo.

How is this patch tested? If it is not, please explain why.

Tested manually with the following configurations:

  1. Images: prompted with bounding boxes, prompted with keypoints, and unprompted (automatic segmentation)
  2. Videos: prompted with bounding boxes and prompted with keypoints

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

Added SAM2 to the FiftyOne Model Zoo, with inference support for both images and videos.

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Box prompt for Images

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=25, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="ground_truth",
)

session = fo.launch_app(dataset)

Keypoint prompt for Images

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
dataset = dataset.filter_labels("ground_truth", F("label") == "person")


# Generate some keypoints (stored in the `gt_keypoints` field used below)
model = foz.load_zoo_model("keypoint-rcnn-resnet50-fpn-coco-torch")
dataset.default_skeleton = model.skeleton
dataset.apply_model(model, label_field="gt")

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with keypoints
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="gt_keypoints",
)

session = fo.launch_app(dataset)

Automatic segmentation for Images

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=5, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Automatic segmentation
dataset.apply_model(model, label_field="auto")

session = fo.launch_app(dataset)

Prompting for Videos

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video", max_samples=2)

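# Only retain detections on the first frame of each video
# (FiftyOne frame numbers are 1-based, so frame 1 is kept)
for sample in dataset:
    for frame_idx in sample.frames:
        frame = sample.frames[frame_idx]
        if frame_idx >= 2:
            frame.detections = None
    # Save once per sample, after all of its frames have been edited
    sample.save()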

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-video-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="frames.detections",  # You can also pass in a keypoint field here
)

session = fo.launch_app(dataset)

Summary by CodeRabbit

  • New Features

    • Introduced advanced image and video segmentation capabilities using the Segment Anything 2 (SAM2) model.
    • Added tools for users to apply segmentation through various prompting methods including bounding boxes and keypoints.
  • Improvements

    • Enhanced the functionality and maintainability of segmentation methods with a simplified control flow.
    • Improved handling for extracting labels in keypoint processing, enhancing robustness.
  • Bug Fixes

    • Refined dataset handling for video and non-video data, ensuring appropriate model application based on dataset characteristics.
  • Chores

    • Updated linting configuration to accommodate OpenCV functionalities and prevent false positive warnings.

Description by Korbit AI

What change is being made?

Add support for Segment Anything Model 2 (SAM2) to the FiftyOne Model Zoo, including both image and video segmentation models.

Why are these changes being made?

These changes integrate the latest SAM2 models from Meta AI into the FiftyOne ecosystem, enabling users to leverage state-of-the-art segmentation capabilities for both images and videos. This addition enhances the model zoo's offerings and provides users with more advanced tools for their segmentation tasks.

Author

korbit-ai bot commented Aug 21, 2024

Clone of the PR voxel51/fiftyone#4671

Author

korbit-ai bot commented Aug 21, 2024

My review is in progress 📖 - I will have feedback for you in a few minutes!

Contributor

coderabbitai bot commented Aug 21, 2024

Walkthrough

The changes introduce a new file, fiftyone/utils/sam2.py, which includes classes and functions for integrating the Segment Anything 2 (SAM2) model into the FiftyOne framework. Two primary classes are defined for image and video segmentation, alongside several utility functions for processing inputs and managing video frames. This addition enhances the capabilities of the FiftyOne Model Zoo for advanced segmentation tasks.

Changes

Files | Change Summary
fiftyone/utils/sam2.py | Introduced SegmentAnything2ImageModel, SegmentAnything2VideoModel, and their respective configuration classes. Added functions for input conversion and video frame loading.

Poem

🐰 In the fields of segmentation bright,
New tools hop in with pure delight.
Images and videos, all in play,
SAM2 leads the fun today!
Quick, quick, to the models we go,
Where bounding boxes and keypoints flow! 🌼


Author

@korbit-ai korbit-ai bot left a comment

I have reviewed your code and found 1 potential issue.

Comment on lines +717 to +718
except Exception as e:
    raise(e)

category Error Handling

In the load_fiftyone_video_frames function, the code catches a generic Exception and re-raises it unchanged, so the try/except adds nothing. Consider catching specific exception types and raising an error with a more meaningful message, so that failures can be handled appropriately based on their type.

Chat with Korbit by mentioning @korbit-ai, and give a 👍 or 👎 to help Korbit improve your reviews.

@development-korbit-ai-mentor development-korbit-ai-mentor bot left a comment

I have reviewed your code and found 1 potential issue.

Summary of Code Review

Fix for Code Execution:

  • Replace the generic Exception in load_fiftyone_video_frames with a more specific exception like ValueError or RuntimeError.

Fix for Code Health:

  • Add a clear error message or log a descriptive error message when catching exceptions in load_fiftyone_video_frames.

Comment on lines +713 to +718
try:
    images = torch.zeros(
        num_frames, 3, image_size, image_size, dtype=torch.float32
    )
except Exception as e:
    raise(e)

category Error Handling

In the load_fiftyone_video_frames function, a generic Exception is caught and re-raised unchanged:

try:
    images = torch.zeros(
        num_frames, 3, image_size, image_size, dtype=torch.float32
    )
except Exception as e:
    raise(e)

Instead of raising a generic Exception, consider raising a more specific exception like ValueError or RuntimeError with a clear error message. Alternatively, catch the exception, log a descriptive error message, and handle it appropriately.
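
A minimal sketch of the first option, assuming allocation failures surface as MemoryError or RuntimeError. The helper name `allocate_frame_buffer` and its `allocate` parameter are hypothetical; `allocate` stands in for the `torch.zeros` call so the sketch runs without PyTorch:

```python
def allocate_frame_buffer(num_frames, image_size, allocate):
    """Allocate a (num_frames, 3, image_size, image_size) buffer.

    `allocate` is a hypothetical stand-in for the `torch.zeros` call in
    the PR; it receives the shape tuple and returns the buffer.
    """
    shape = (num_frames, 3, image_size, image_size)
    try:
        return allocate(shape)
    except (MemoryError, RuntimeError) as e:
        # Re-raise with context; `from e` chains the original exception
        raise RuntimeError(
            f"Could not allocate a frame buffer of shape {shape}"
        ) from e
```

Chaining with `from e` keeps the original exception available as `__cause__`, so the added message does not hide the underlying failure.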

Chat with Korbit by mentioning @development-korbit-ai-mentor, and give a 👍 or 👎 to help Korbit improve your reviews.

System configuration at time of review:

- category__name: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__database_operations: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__documentation: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__logging: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__error_handling: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__systems_and_environment: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__objects_and_data_structures: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__tests: {'enabled': False, 'user_values': {'minimum_severity': 'nit'}}
- category__readability_and_maintainability: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__asynchronous_processing: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__design_patterns: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__third_party_libraries: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__performance: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- category__security: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}
- pull_request_review_schedule: {'enabled': True, 'user_values': {'review_schedule': 'non-draft'}}
- maximum_shown_issues: {'enabled': True, 'user_values': {'max_issues': 10}}
- auto_pr_description: {'enabled': True, 'user_values': {}}
- category__functionality: {'enabled': True, 'user_values': {'minimum_severity': 'nit'}}

@coderabbitai full review

Contributor

coderabbitai bot commented Aug 22, 2024

Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Outside diff range, codebase verification and nitpick comments (1)
fiftyone/utils/sam2.py (1)

671-675: Clarify logic for handling sam2_labels.

The logic for handling sam2_labels could be more explicit.

Consider using:

labels = np.array(keypoint.sam2_labels)[valid_rows] if hasattr(keypoint, "sam2_labels") else np.ones(len(scaled_points))
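
The same logic can be spelled out as an explicit branch. This is a sketch, not the PR's code: the function name `point_labels` is hypothetical, `SimpleNamespace` stands in for a FiftyOne keypoint, and it assumes `scaled_points` has already been filtered by `valid_rows`. It uses `getattr` with a default, so an attribute that is present but set to `None` is treated as missing, which differs slightly from `hasattr`:

```python
from types import SimpleNamespace

import numpy as np


def point_labels(keypoint, scaled_points, valid_rows):
    """Return one point label per retained point.

    In SAM-style point prompts, 1 marks a foreground point and 0 a
    background point. Points default to foreground when the keypoint
    carries no `sam2_labels`.
    """
    sam2_labels = getattr(keypoint, "sam2_labels", None)
    if sam2_labels is None:
        return np.ones(len(scaled_points))

    # Keep only the labels whose points survived the validity filter
    return np.array(sam2_labels)[valid_rows]


# Usage: the second point was invalid, so only two labels remain
keypoint = SimpleNamespace(sam2_labels=[1, 0, 1])
valid_rows = np.array([True, False, True])
labels = point_labels(keypoint, np.zeros((2, 2)), valid_rows)
```
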
Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 278d934 and 5a1e762.

Files ignored due to path filters (1)
  • fiftyone/zoo/models/manifest-torch.json is excluded by !**/*.json
Files selected for processing (1)
  • fiftyone/utils/sam2.py (1 hunks)
Additional comments not posted (6)
fiftyone/utils/sam2.py (6)

59-71: LGTM!

The SegmentAnything2VideoModelConfig class is well-implemented and straightforward.


713-718: Raise a specific exception instead of a generic one.

Raising a specific exception type can provide more meaningful error messages and improve error handling.

Consider using a more specific exception like RuntimeError or ValueError with a clear error message.


664-666: LGTM!

The _to_sam_input function is well-implemented and straightforward.


679-688: LGTM!

The _to_sam_box function is well-implemented and straightforward.
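
The body of `_to_sam_box` is not shown in this thread. As an illustration of the kind of conversion such a helper typically performs, the sketch below turns a FiftyOne-style relative `[x, y, width, height]` box into the absolute `[x1, y1, x2, y2]` pixel box that SAM-style box prompts use. The function name `to_abs_xyxy` is hypothetical and this is not the PR's implementation:

```python
def to_abs_xyxy(bounding_box, width, height):
    """Convert a relative [x, y, w, h] box to absolute [x1, y1, x2, y2].

    `bounding_box` holds the top-left corner and size as fractions of the
    image dimensions; `width` and `height` are the image size in pixels.
    """
    x, y, w, h = bounding_box
    return [x * width, y * height, (x + w) * width, (y + h) * height]


# A box covering the middle half of a 200x100 image horizontally
box = to_abs_xyxy([0.25, 0.5, 0.5, 0.25], 200, 100)
```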


690-698: LGTM!

The _mask_to_box function is well-implemented and straightforward.


740-746: LGTM!

The _load_video_frames_monkey_patch function is well-implemented and straightforward.

Comment on lines +56 to +57
if self.points_mask_index and not 0 <= self.points_mask_index <= 2:
    raise ValueError("mask_index must be 0, 1, or 2")

Improve validation logic for points_mask_index.

The current validation logic for points_mask_index could be made more readable by using a clearer conditional expression.

Consider using:

if self.points_mask_index not in {0, 1, 2}:
    raise ValueError("mask_index must be 0, 1, or 2")
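
One behavioral difference between the two checks is worth noting: the original skips validation for falsy values, so `None` passes, whereas the membership test rejects `None`. The sketch below compares them using standalone functions in place of the config's validation. `check_none_aware` is a hypothetical variant, not code from the PR, and it assumes `None` is meant to signify "unset", which the diff does not confirm:

```python
def check_original(points_mask_index):
    # Check as written in the PR: falsy values (None and 0) skip validation
    if points_mask_index and not 0 <= points_mask_index <= 2:
        raise ValueError("mask_index must be 0, 1, or 2")


def check_suggested(points_mask_index):
    # Suggested rewrite: a membership test, which also rejects None
    if points_mask_index not in {0, 1, 2}:
        raise ValueError("mask_index must be 0, 1, or 2")


def check_none_aware(points_mask_index):
    # Hypothetical variant: treat None as "unset", validate everything else
    if points_mask_index is not None and points_mask_index not in {0, 1, 2}:
        raise ValueError("mask_index must be 0, 1, or 2")
```

If `None` is a legal value for this parameter, the third form keeps the readability of the membership test without changing which inputs are accepted.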

Comment on lines +211 to +214
raise ValueError(
    "Unsupported prompt type %s. The supported field types are %s"
    % (type(value), (fol.Detections, fol.Keypoints))
)

Enhance error message for unsupported prompt types.

The error message could be more descriptive to aid debugging.

Consider including the field name in the error message:

raise ValueError(
    f"Unsupported prompt type {type(value)} in field '{field_name}'. Supported types are {fol.Detections, fol.Keypoints}."
)
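
Formatting the classes directly, as in both versions above, prints their full reprs (for example `<class '...'>`). A possible refinement is to print the class names only. In this sketch, `Detections` and `Keypoints` are empty stand-ins for `fol.Detections` and `fol.Keypoints`, and `unsupported_prompt_error` is a hypothetical helper:

```python
class Detections:  # stand-in for fol.Detections
    pass


class Keypoints:  # stand-in for fol.Keypoints
    pass


SUPPORTED_PROMPT_TYPES = (Detections, Keypoints)


def unsupported_prompt_error(value, field_name):
    """Build a ValueError naming the offending type and field."""
    supported = ", ".join(t.__name__ for t in SUPPORTED_PROMPT_TYPES)
    return ValueError(
        f"Unsupported prompt type {type(value).__name__} in field "
        f"'{field_name}'. Supported types are: {supported}."
    )
```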

Comment on lines +225 to +228
raise ValueError(
    "Sample %s is missing a prompt in field '%s'"
    % (sample.id, field_name)
)

Improve error message for missing prompts.

The existing message already includes the sample ID and the field name; an f-string would make it easier to read.

Consider using:

raise ValueError(
    f"Sample {sample.id} is missing a prompt in field '{field_name}'."
)

Comment on lines +468 to +470
raise ValueError(
    "'prompt_field' should be a frame field for segment anything 2 video model"
)

Clarify error message for invalid prompt_field.

The error message should clearly indicate the expected format for prompt_field.

Consider using:

raise ValueError(
    "'prompt_field' must be a frame field and start with 'frames.' for the Segment Anything 2 video model."
)

Comment on lines +473 to +475
raise AttributeError(
    "Missing required argument 'prompt_field' for segment anything 2 video model"
)

Enhance error message for missing prompt_field.

The error message should be more descriptive to aid debugging.

Consider using:

raise AttributeError(
    "The 'prompt_field' argument is required for the Segment Anything 2 video model but is missing."
)
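
Taken together, the two `prompt_field` checks discussed above could look roughly like the sketch below. The helper `validate_video_prompt_field` is hypothetical, and the surrounding method in the PR is not shown in this thread. The exception types mirror the ones in the diff, and the `frames.` prefix rule comes from the suggested error message:

```python
def validate_video_prompt_field(prompt_field):
    """Validate the prompt field for the video model and return it."""
    if prompt_field is None:
        raise AttributeError(
            "The 'prompt_field' argument is required for the Segment "
            "Anything 2 video model but is missing."
        )

    if not prompt_field.startswith("frames."):
        raise ValueError(
            "'prompt_field' must be a frame field and start with "
            "'frames.' for the Segment Anything 2 video model."
        )

    return prompt_field


# Usage, matching the video example in the PR description
field = validate_video_prompt_field("frames.detections")
```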

@furwellness
Owner

/review

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Error Handling
The error handling in the _forward_pass_boxes and _forward_pass_points methods could be improved. Currently, if there are no detections or keypoints, empty tensors are returned without any logging or warning.

Code Duplication
There is significant code duplication between _forward_pass_boxes and _forward_pass_points methods. Consider refactoring to reduce redundancy.

Performance Concern
The load_fiftyone_video_frames function reads frames one by one, which could be slow for large videos. Consider using batch processing or parallel reading if possible.
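
One way to approach the parallel-reading suggestion is a thread pool, sketched below under the assumption that frames can be loaded independently (for example, from per-frame image files). The names `load_frames_parallel` and `load_frame` are hypothetical, and how `load_fiftyone_video_frames` actually obtains frames is not shown in this thread. Threads only help if the loader releases the GIL during I/O or decoding, and sequential reads from a single video stream may not parallelize this way:

```python
from concurrent.futures import ThreadPoolExecutor


def load_frames_parallel(frame_paths, load_frame, max_workers=4):
    """Load frames concurrently while preserving their order.

    `load_frame` is a caller-supplied function that maps one path to one
    decoded frame.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so frame order is preserved
        return list(pool.map(load_frame, frame_paths))


# Usage with a trivial stand-in loader
frames = load_frames_parallel(["a.jpg", "b.jpg"], lambda path: path.upper())
```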

/review

@korbit-ai korbit-ai bot deleted the branch cloned_develop_278d9 August 29, 2024 19:19
@korbit-ai korbit-ai bot closed this Aug 29, 2024