SAHI-like tool for instance segmentation and detection with support of YOLOv8, YOLOv9, FastSAM, and RTDETR #9381

Koldim2001 · 2024-03-28T13:49:01Z

Koldim2001
Mar 28, 2024

Hi there!

I'd like to share with you a project I've recently worked on. Together with a colleague, we've created a repository that serves as a tool with a SAHI-like inference but specifically tailored for instance segmentation tasks.

Our repository allows for segmenting small objects in images by combining mask predictions from various overlapping patches. We support both YOLOv8-seg and FastSAM. Additionally, we have a variant for object detection tasks, and the key distinction from SAHI is the support for all the current models from the Ultralytics team: YOLOv9, YOLOv8, RTDETR, and others.

I'm a huge fan of Ultralytics, so I'd be thrilled to assist you if you're interested in our project. I'm confident that many people would benefit from being able to find a large number of small segments in an image, especially when using standard Ultralytics models.

Here's the link to the project: YOLO-Patch-Based-Inference.

I have already been in touch with Paula Derrenger. She informed me that I might be able to contribute to the documentation for this library at docs.ultralytics.com. Here is the link to the project discussion - https://github.com/orgs/ultralytics/discussions/8734#discussioncomment-8933879

So I would be happy to participate in this if you have no objections. I have always dreamed of helping the Ultralytics team! I am confident that my project enables convenient detection and instance segmentation of small objects in an image, and also provides custom visualization of the inference results of all the main networks available in the Ultralytics library.
For these reasons, my library could become a very useful addition to yours.

Koldim2001 · 2024-03-28T14:07:17Z

Koldim2001
Mar 28, 2024
Author

Example of the proposed text for docs.ultralytics.com:

This library simplifies SAHI-like inference for instance segmentation tasks, enabling the detection of small objects in images. It caters to both object detection and instance segmentation tasks, supporting a wide range of Ultralytics models.

Model Support: The library provides support for various Ultralytics deep learning models, including YOLOv8, YOLOv9, FastSAM, and RTDETR. Users can choose from pre-trained options or use custom-trained models to best suit their task requirements.

The library also offers customizable visualization of inference results for all models, both in the standard approach (a direct network run) and in the patch-based variant.

Installation

You can install the library via pip:

pip install patched_yolo_infer

- See the PyPI page for patched-yolo-infer for more information and documentation.

Note: If CUDA support is available, it's recommended to pre-install PyTorch with CUDA support before installing the library. Otherwise, the CPU version will be installed by default.
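Before installing the library, a quick way to confirm that the pre-installed PyTorch build actually sees a GPU is to call PyTorch's `torch.cuda.is_available()`. The wrapper below is a minimal sketch; `cuda_available` is a hypothetical helper name, and it simply returns False when PyTorch is not installed yet.

```python
def cuda_available() -> bool:
    """Return True if an installed PyTorch build can see a CUDA device."""
    try:
        import torch  # only importable once PyTorch has been installed
    except ImportError:
        return False  # no PyTorch yet, so no CUDA-enabled build either
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```

If this prints False on a machine with an NVIDIA GPU, the CPU-only PyTorch build is probably installed and should be replaced before running `pip install patched_yolo_infer`.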

Notebooks

Interactive notebooks are provided to showcase the functionality of the library. These notebooks cover batch-inference procedures for detection, instance segmentation, custom visualization of inference results, and more. Each notebook is paired with a YouTube tutorial, making the features easy to learn and apply.

Notebook topics (each with an accompanying YouTube tutorial):
- Patch-Based-Inference Example
- Example of utilizing a function to visualize basic Ultralytics model inference results and managing overlapping image crops

Examples:

Detection example:

Instance Segmentation example 1:

Instance Segmentation example 2:

Usage

1. Patch-Based-Inference

To carry out patch-based inference of YOLO models using our library, you follow a sequential procedure. First, you create an instance of the MakeCropsDetectThem class, providing all desired parameters for YOLO inference and for splitting the image into patches.
Next, you pass the resulting object to CombineDetections, which consolidates the predictions from all overlapping crops and then intelligently suppresses duplicates.
The resulting object exposes the final detections for the processed frame.

The output obtained from the process includes several attributes that can be leveraged for further analysis or visualization:

img: This attribute contains the original image on which the inference was performed. It provides context for the detected objects.
confidences: This attribute holds the confidence scores associated with each detected object. These scores indicate the model's confidence level in the accuracy of its predictions.
boxes: These bounding boxes are represented as a list of lists, where each list contains four values: [x_min, y_min, x_max, y_max]. These values correspond to the coordinates of the top-left and bottom-right corners of each bounding box.
masks: If available, this attribute provides segmentation masks corresponding to the detected objects. These masks can be used to precisely delineate object boundaries.
classes_ids: This attribute contains the class IDs assigned to each detected object. These IDs correspond to specific object classes defined during the model training phase.
classes_names: These are the human-readable names corresponding to the class IDs. They provide semantic labels for the detected objects, making the results easier to interpret.

import cv2
from patched_yolo_infer import MakeCropsDetectThem, CombineDetections

# Load the image 
img_path = 'test_image.jpg'
img = cv2.imread(img_path)

element_crops = MakeCropsDetectThem(
    image=img,
    model_path="yolov8m.pt",
    segment=False,
    shape_x=640,
    shape_y=640,
    overlap_x=50,
    overlap_y=50,
    conf=0.5,
    iou=0.7,
    resize_initial_size=True,
)
result = CombineDetections(element_crops, nms_threshold=0.05, match_metric='IOS')  

# Final Results:
img = result.image
confidences = result.filtered_confidences
boxes = result.filtered_boxes
masks = result.filtered_masks
classes_ids = result.filtered_classes_id
classes_names = result.filtered_classes_names

Explanation of possible input arguments:

MakeCropsDetectThem
Class implementing cropping and passing crops through a neural network for detection/segmentation.
Args:

image (np.ndarray): Input image in BGR format.
model_path (str): Path to the YOLO model.
model (ultralytics model): Pre-initialized model object. If provided, the model will be used directly instead of loading from model_path.
imgsz (int): Image size used for YOLO inference.
conf (float): Confidence threshold for YOLO detections.
iou (float): IoU threshold for YOLO non-maximum suppression within a single crop.
classes_list (List[int] or None): List of classes to filter detections. If None, all classes are considered. Defaults to None.
segment (bool): Whether to perform segmentation (YOLOv8-seg).
shape_x (int): Size of the crop in the x-coordinate.
shape_y (int): Size of the crop in the y-coordinate.
overlap_x (float): Percentage of overlap along the x-axis.
overlap_y (float): Percentage of overlap along the y-axis.
show_crops (bool): Whether to visualize the cropping.
resize_initial_size (bool): Whether to resize the results to the original image size (note: this is a slow operation).
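To make the shape_x/shape_y and overlap_x/overlap_y parameters concrete, here is a minimal sketch of how overlapping crop windows can be laid out. It assumes the stride between crop origins is shape × (1 − overlap/100) and that the last crop is shifted back so it sits flush with the image border. This is a common tiling scheme, not necessarily the exact one patched_yolo_infer uses; `crop_starts` and `crop_grid` are hypothetical helper names.

```python
from typing import List, Tuple


def crop_starts(length: int, crop: int, overlap_pct: float) -> List[int]:
    """Start offsets of crops along one axis with the given percentage overlap."""
    if crop >= length:
        return [0]  # a single crop already covers the whole axis
    stride = max(1, int(crop * (1 - overlap_pct / 100)))
    starts = list(range(0, length - crop + 1, stride))
    if starts[-1] + crop < length:
        starts.append(length - crop)  # final crop flush with the image border
    return starts


def crop_grid(width: int, height: int, shape_x: int, shape_y: int,
              overlap_x: float, overlap_y: float) -> List[Tuple[int, int, int, int]]:
    """(x_min, y_min, x_max, y_max) of every crop, row by row."""
    return [
        (x, y, min(x + shape_x, width), min(y + shape_y, height))
        for y in crop_starts(height, shape_y, overlap_y)
        for x in crop_starts(width, shape_x, overlap_x)
    ]


print(len(crop_grid(1920, 1080, 640, 640, 50, 50)))  # → 15 crops (5 columns × 3 rows)
```

Higher overlap values produce more crops (and slower inference), but make it more likely that every small object appears whole in at least one crop.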

CombineDetections
Class implementing combining masks/boxes from multiple crops + NMS (Non-Maximum Suppression).
Args:

element_crops (MakeCropsDetectThem): Object containing crop information.
nms_threshold (float): IoU/IoS threshold for non-maximum suppression.
match_metric (str): Matching metric, either 'IOU' or 'IOS'.
intelligent_sorter (bool): If True, detections are sorted by area and rounded confidence before suppression.
If False, they are sorted by confidence only (standard NMS). Default is False.
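The difference between the two match_metric options matters most at crop borders: a fragment of an object cut by one crop usually lies almost entirely inside the full detection from a neighboring crop, so its IoS is high even when its IoU is modest. The sketch below assumes 'IOS' means intersection over the area of the smaller box (as in SAHI) and implements plain greedy NMS sorted by confidence, i.e. the intelligent_sorter=False behavior described above. It is an illustration, not the library's code, and it does not merge masks; `overlap` and `nms` are hypothetical names.

```python
from typing import List, Sequence

Box = Sequence[float]  # [x_min, y_min, x_max, y_max]


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap(a: Box, b: Box, metric: str = "IOS") -> float:
    """IoU, or IoS (intersection over the smaller box's area)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    if metric == "IOU":
        denom = _area(a) + _area(b) - inter
    elif metric == "IOS":
        denom = min(_area(a), _area(b))
    else:
        raise ValueError("metric must be 'IOU' or 'IOS'")
    return inter / denom if denom > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float,
        metric: str = "IOS") -> List[int]:
    """Greedy NMS: keep boxes in score order unless they overlap a kept box above threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(overlap(boxes[i], boxes[k], metric) <= threshold for k in keep):
            keep.append(i)
    return keep


# A full detection and a half-object fragment from a neighboring crop
boxes = [[0, 0, 100, 100], [50, 0, 100, 100]]
scores = [0.9, 0.8]
print(nms(boxes, scores, 0.5, "IOU"))  # [0, 1]: IoU is exactly 0.5, so the fragment survives
print(nms(boxes, scores, 0.5, "IOS"))  # [0]: the fragment lies fully inside, IoS = 1.0
```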

2. Custom inference visualization:

The visualize_results function visualizes object detection or segmentation results on an image.

Args:

img (numpy.ndarray): The input image in BGR format.
boxes (list): A list of bounding boxes in the format [x_min, y_min, x_max, y_max].
classes_ids (list): A list of class IDs for each detection.
confidences (list): A list of confidence scores corresponding to each bounding box. Default is an empty list.
classes_names (list): A list of class names corresponding to the class IDs. Default is an empty list.
masks (list): A list of masks. Default is an empty list.
segment (bool): Whether to perform instance segmentation. Default is False.
show_boxes (bool): Whether to show bounding boxes. Default is True.
show_class (bool): Whether to show class labels. Default is True.
fill_mask (bool): Whether to fill the segmented regions with color. Default is False.
alpha (float): The transparency of filled masks. Default is 0.3.
color_class_background (tuple): The background BGR color for class labels. Default is (0, 0, 255) (red).
color_class_text (tuple): The text color for class labels. Default is (255, 255, 255) (white).
thickness (int): The thickness of bounding box and text. Default is 4.
font: The font type for class labels. Default is cv2.FONT_HERSHEY_SIMPLEX.
font_scale (float): The scale factor for font size. Default is 1.5.
delta_colors (int): The random seed offset for color variation. Default is 0.
dpi (int): Final visualization size (the plot is larger when dpi is higher). Default is 150.
random_object_colors (bool): If True, colors for each object are selected randomly. Default is False.
show_confidences (bool): If True and show_class=True, confidence scores are shown next to the class labels. Default is False.
axis_off (bool): If True, axes are turned off in the final visualization. Default is True.
show_classes_list (list): If empty, visualize all classes. Otherwise, visualize only classes in the list.
return_image_array (bool): If True, the function returns the image (BGR np.array) instead of displaying it.
Default is False.
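For intuition about fill_mask and alpha, the sketch below shows standard alpha blending of a single binary mask into a BGR image: masked pixels become (1 − alpha) × pixel + alpha × color. It assumes visualize_results fills masks in a similar way; this is a generic illustration, not the library's implementation, and `fill_mask_region` is a hypothetical name.

```python
import numpy as np


def fill_mask_region(img: np.ndarray, mask: np.ndarray, color_bgr, alpha: float = 0.3) -> np.ndarray:
    """Return a copy of img where pixels with mask != 0 are blended toward color_bgr."""
    out = img.astype(np.float32)              # work in float to avoid uint8 overflow
    m = mask.astype(bool)                     # H x W boolean selector
    color = np.array(color_bgr, dtype=np.float32)
    out[m] = (1.0 - alpha) * out[m] + alpha * color  # (N, 3) pixels blended with a (3,) color
    return out.round().astype(np.uint8)


img = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
blended = fill_mask_region(img, mask, (0, 0, 200), alpha=0.5)
print(blended[1, 1], blended[0, 0])
```

A lower alpha keeps more of the underlying image visible, which helps when many small masks overlap.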

Example usage:

from patched_yolo_infer import CombineDetections, visualize_results

# Assuming result is an instance of the CombineDetections class
result = CombineDetections(...)  # pass the MakeCropsDetectThem object and NMS settings as in the example above

# Visualizing the results using the visualize_results function
visualize_results(
    img=result.image,
    confidences=result.filtered_confidences,
    boxes=result.filtered_boxes,
    masks=result.filtered_masks,
    classes_ids=result.filtered_classes_id,
    classes_names=result.filtered_classes_names,
    segment=False,
)


pderrenger Mar 28, 2024
Maintainer

Thanks for sharing your project with the Ultralytics community! It's fantastic to see the innovative tools our community members are creating, especially ones that enhance the utility of YOLO models for specific tasks like instance segmentation and detection in images with small objects. Your patch-based inference method sounds like a valuable addition for handling such nuanced challenges.

The integration of YOLOv8, YOLOv9, FastSAM, RTDETR, and other Ultralytics models into your tool broadens its applicability and makes it an exciting resource for our users. The ability to handle overlapping patches effectively will undoubtedly aid in improving the detection and segmentation outcomes for many.

Given the potential of your project to assist a substantial segment of our community, we'd definitely encourage you to participate in the documentation process at docs.ultralytics.com. Your contribution could provide users with more tools and methods to leverage the power of Ultralytics models efficiently. 🚀

Please feel free to start by creating a pull request with your proposed additions or reach out to us for guidance on how to best integrate your documentation. Sharing your insights, tutorials, and examples will greatly benefit users looking for advanced detection and segmentation techniques.

Lastly, don’t hesitate to keep us posted on any feedback or further developments on your project. Together, we can continue to build a more versatile and powerful set of tools for the AI and computer vision community. Keep up the great work!

Koldim2001 · 2024-03-28T18:13:34Z

Koldim2001
Mar 28, 2024
Author

Great, thank you. I will wait for a response from your team regarding what needs to be done to integrate the tutorial for our library into the Ultralytics documentation. I have posted a preliminary version of the text in the "Discussions" section, so I assume all that remains is to wait for feedback.


pderrenger Mar 29, 2024
Maintainer

Thank you for submitting your proposal to the "Discussions" section! We're glad to hear about your enthusiasm and contributions to the community. The Ultralytics team will review your submission and get back to you with feedback or further steps as soon as possible. Your patience and support are highly appreciated! 🚀

Koldim2001 · 2024-03-28T19:06:05Z

Koldim2001
Mar 28, 2024
Author

Good day @pderrenger. I think I've figured it out and managed to create a markdown file describing the library, modeled on similar existing pages. I have just opened a pull request, #9387; could you please take a look? Thank you in advance for your huge help.


pderrenger Apr 9, 2024
Maintainer

Hello there! 👋 Great job on taking the initiative and creating the markdown file for your library, and thank you for the pull request!

I'll definitely take a look at it and provide any necessary feedback or approval. Your contribution to enhancing the documentation and resources for the community is genuinely appreciated. We strive to support innovative work that leverages Ultralytics models, and your project sounds like a valuable addition.

Stay tuned for updates on your PR! Thanks again for your effort and for being part of the Ultralytics community! 🚀

AdityaPrakash0018 · 2024-04-04T11:11:34Z

AdityaPrakash0018
Apr 4, 2024

Hi @Koldim2001,
Does this code also work on videos? Can I perform detection on videos and show it in real time or save it?


Koldim2001 Apr 4, 2024
Author

@AdityaPrakash0018 Yes, the library can work with videos as well. You can create a video frame generator using cv2.VideoCapture and feed each frame to the detection algorithm. However, the processing speed may not be high enough for real-time applications. The library was initially designed for image processing tasks where accuracy is more important than speed.
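A minimal sketch of the frame-generator approach described above, including saving the output. `frames` wraps any cv2.VideoCapture-like object, and `save_processed_video` writes processed frames with cv2.VideoWriter. Both names are hypothetical, and `process_frame` is an assumed callable you supply, for example one that runs MakeCropsDetectThem and CombineDetections on the frame and returns visualize_results(..., return_image_array=True).

```python
from typing import Any, Callable, Iterator


def frames(cap: Any) -> Iterator[Any]:
    """Yield frames from a cv2.VideoCapture-like object until read() fails, then release it."""
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()


def save_processed_video(src: str, dst: str, process_frame: Callable[[Any], Any]) -> None:
    """Run process_frame on every frame of src and write the BGR results to dst (mp4v codec)."""
    import cv2  # imported here so frames() can be reused without OpenCV

    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # some containers report 0 FPS
    writer = None
    for frame in frames(cap):
        out = process_frame(frame)
        if writer is None:  # output size is taken from the first processed frame
            h, w = out.shape[:2]
            writer = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(out)
    if writer is not None:
        writer.release()
```

Taking the writer size from the first processed frame, rather than from the input video, avoids a silent mismatch if the visualization step returns frames at a different resolution.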

kshitizkhanal7 · 2024-05-08T18:58:34Z

kshitizkhanal7
May 8, 2024

@Koldim2001 How do I export the annotations from the YOLOv8 instance segmentation results to various formats? I would really appreciate any help.


Koldim2001 May 8, 2024
Author

The algorithm outputs an array of filtered binary masks for all detected objects: masks=result.filtered_masks. From these you can visualize the results (we provide a ready-made tool for this) or convert them into any desired format (this requires writing your own postprocessing).

pderrenger May 9, 2024
Maintainer

To export the annotations from the filtered_masks in your YOLOv8 instance segmentation results, you can use a basic Python script to save them in your desired format. If you need to convert these masks to a common format like COCO, here's a simple sample of what your postprocessing might look like in Python:

import numpy as np
import cv2
import json

def masks_to_coco(image_id, masks, categories):
    annotations = []
    for i, mask in enumerate(masks):
        # Convert mask to polygons
        contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            # Be sure contour area is large enough
            if cv2.contourArea(contour) > 1.0:
                segmentation = contour.flatten().tolist()
                bbox = cv2.boundingRect(contour)
                annotation = {
                    "id": len(annotations) + 1,  # COCO expects a unique ID per annotation
                    "image_id": image_id,
                    "category_id": categories[i],
                    "segmentation": [segmentation],
                    "bbox": [int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])],
                    "area": cv2.contourArea(contour),
                    "iscrowd": 0
                }
                annotations.append(annotation)
    return annotations

# Example usage, assuming `result` is a CombineDetections object built with segment=True
image_id = 1
masks = result.filtered_masks  # one binary mask per detected object
categories = result.filtered_classes_id  # class ID of each detected object
coco_annotations = masks_to_coco(image_id, masks, categories)

# Optionally, save to a JSON file
with open('annotations.json', 'w') as f:
    json.dump(coco_annotations, f)

This script converts binary masks to the COCO annotation format by detecting contours and creating bounding boxes and segmentation polygons from them. Adjust categories according to the actual class IDs of your detected objects.
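Since the question asked about various formats, another common target is the YOLO text label format: one line per object with the class ID followed by box center x, center y, width, and height, all normalized to [0, 1]. The sketch below converts the [x_min, y_min, x_max, y_max] pixel boxes described earlier; `to_yolo_lines` is a hypothetical helper, and it exports boxes only (not mask polygons).

```python
from typing import List, Sequence


def to_yolo_lines(boxes: Sequence[Sequence[float]], class_ids: Sequence[int],
                  img_w: int, img_h: int) -> List[str]:
    """Convert [x_min, y_min, x_max, y_max] pixel boxes to normalized YOLO 'class xc yc w h' lines."""
    lines = []
    for (x1, y1, x2, y2), cls in zip(boxes, class_ids):
        xc = (x1 + x2) / 2 / img_w   # box center, normalized by image width
        yc = (y1 + y2) / 2 / img_h   # box center, normalized by image height
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        lines.append(f"{int(cls)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


# Possible usage with the library's outputs (assumes `result` from CombineDetections):
# h, w = result.image.shape[:2]
# lines = to_yolo_lines(result.filtered_boxes, result.filtered_classes_id, w, h)
# with open("test_image.txt", "w") as f:
#     f.write("\n".join(lines) + "\n")
```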

lange4531 · 2024-05-09T16:31:36Z

lange4531
May 9, 2024

@Koldim2001 I like this initiative a lot. I have added my fine-tuned YOLOv9-seg model and OpenCV's video capture to take video as input instead of images, along with the code you have given us. What is the best way to make your code handle video instead of images?


Koldim2001 May 10, 2024
Author

If the problem is that the visualize_results function displays each frame as a matplotlib (plt) plot, this is easy to change. Set return_image_array=True in the function call; instead of plotting with plt, the function will then return the processed frame as a numpy array, which you can display with cv2.imshow(), for example.

lange4531 May 10, 2024

return_image_array=True made all the difference: video works now! Going to experiment with improving the FPS :D

Thank you so much for this!

pderrenger May 10, 2024
Maintainer

Your suggestion to set return_image_array=True to switch from a plt graph to a numpy array output is indeed right on point! 👍 You can then easily use cv2.imshow() to display the frames in real-time inside a video loop. Here's a quick example:

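import cv2
from ultralytics import YOLO
from patched_yolo_infer import MakeCropsDetectThem, CombineDetections, visualize_results

model = YOLO("yolov8m-seg.pt")  # load the model once and reuse it for every frame
cap = cv2.VideoCapture("video.mp4")  # placeholder path; use 0 for a webcam
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    element_crops = MakeCropsDetectThem(image=frame, model=model, segment=True, shape_x=640,
                                        shape_y=640, overlap_x=50, overlap_y=50)
    result = CombineDetections(element_crops, nms_threshold=0.05, match_metric='IOS')
    processed_frame = visualize_results(  # returns the annotated BGR frame as a numpy array
        img=result.image, confidences=result.filtered_confidences, boxes=result.filtered_boxes,
        masks=result.filtered_masks, classes_ids=result.filtered_classes_id,
        classes_names=result.filtered_classes_names, segment=True, return_image_array=True)
    cv2.imshow('Processed Frame', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()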

This way, you integrate direct video processing, leveraging OpenCV's imshow for visualization. If there's anything more specific you need help with, just let me know!
