commit SAM2 and ObjectDetectors nodes

chflame163 · Sep 2, 2024 · 0defae2
1 parent c4368ba
commit 0defae2
Showing 47 changed files with 8,530 additions and 22 deletions.
diff --git a/README.MD b/README.MD
@@ -100,10 +100,13 @@ Solution:
 ### Requests.exceptions.ProxyError: HTTPSConnectionPool(xxxx...)
 When this error has occurred, please check the network environment.
 
-
 ## Update
 <font size="4">**If a dependency package error occurs after updating, please double-click ```repair_dependency.bat``` (for the official ComfyUI Portable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages.** </font><br /> 
 
+* Commit [SAM2Ultra](#SAM2Ultra), [SAM2VideoUltra](#SAM2VideoUltra), [ObjectDetectorFL2](#ObjectDetectorFL2), [ObjectDetectorYOLOWorld](#ObjectDetectorYOLOWorld), [ObjectDetectorYOLO8](#ObjectDetectorYOLO8), [ObjectDetectorMask](#ObjectDetectorMask) and [BBoxJoin](#BBoxJoin) nodes. 
+Download the SAM2 models from [BaiduNetdisk](https://pan.baidu.com/s/1xaQYBA6ktxvAxm310HXweQ?pwd=auki) or [huggingface.co/Kijai/sam2-safetensors](https://huggingface.co/Kijai/sam2-safetensors/tree/main) and copy them to the ```ComfyUI/models/sam2``` folder.
+Download the YOLO-World models from [BaiduNetdisk](https://pan.baidu.com/s/1QpjajeTA37vEAU2OQnbDcQ?pwd=nqsk) or [GoogleDrive](https://drive.google.com/drive/folders/1nrsfq4S-yk9ewJgwrhXAoNVqIFLZ1at7?usp=sharing) and copy them to the ```ComfyUI/models/yolo-world``` folder.
+This update introduces new dependencies; please reinstall the dependency packages.
 * Commit [RandomGenerator](#RandomGenerator) node, used to generate random numbers within a specified range, with outputs of int, float, and boolean, supporting batch generation of different random numbers by image batch.
 * Commit [EVF-SAMUltra](#EVFSAMUltra) node, an implementation of [EVF-SAM](https://github.com/hustvl/EVF-SAM) in ComfyUI. Please download the model files from [BaiduNetdisk](https://pan.baidu.com/s/1EvaxgKcCxUpMbYKzLnEx9w?pwd=69bn) or [huggingface/EVF-SAM2](https://huggingface.co/YxZhang/evf-sam2/tree/main), [huggingface/EVF-SAM](https://huggingface.co/YxZhang/evf-sam/tree/main) to the ```ComfyUI/models/EVF-SAM``` folder (save the models in their respective subdirectories).
 Because new dependency packages have been introduced, please reinstall the dependency packages after upgrading the plugin.
@@ -1588,6 +1591,115 @@ On the basis of SegmentAnythingUltra, the following changes have been made:
 * device: Set whether VitMatte uses CUDA.
 * max_megapixels: Set the maximum size for VitMatte operations.
 
+
+### <a id="table1">SAM2Ultra</a>
+This node is modified from [kijai/ComfyUI-segment-anything-2](https://github.com/kijai/ComfyUI-segment-anything-2). Thanks to [kijai](https://github.com/kijai) for making significant contributions to the ComfyUI community. 
+The SAM2 Ultra node only supports a single image. If you need to process multiple images, please convert the image batch to an image list first. 
+*Download models from [BaiduNetdisk](https://pan.baidu.com/s/1xaQYBA6ktxvAxm310HXweQ?pwd=auki) or [huggingface.co/Kijai/sam2-safetensors](https://huggingface.co/Kijai/sam2-safetensors/tree/main) and copy to ```ComfyUI/models/sam2``` folder.
+
+![image](image/sam2_example.jpg) 
+
+Node Options: 
+![image](image/sam2_ultra_node.jpg) 
+
+* image: The image to segment.
+* bboxes: Input recognition box data.
+* sam2_model: Select the SAM2 model.
+* presicion: The model's precision. Can be selected from fp16, bf16, and fp32.
+* bbox_select: Select the input box data. There are three options: "all" to select all, "first" to select the box with the highest confidence, and "by_index" to specify the index of the box.
+* select_index: This option is valid when bbox_select is 'by_index'. 0 is the first one. Multiple values can be entered, separated by any non-numeric character, including but not limited to commas, periods, semicolons, spaces, letters, or even Chinese characters.
+* cache_model: Whether to cache the model. After caching the model, it will save time for model loading.
+* detail_method: Edge processing method. The options are VITMatte, VITMatte(local), PyMatting, and GuidedFilter. Once the model has been downloaded by the first use of VITMatte, you can use VITMatte(local) afterwards.
+* detail_erode: The range of mask erosion inward from the edge. The larger the value, the larger the range of inward repair.
+* detail_dilate: The range of mask expansion outward from the edge. The larger the value, the wider the range of outward repair.
+* black_point: Edge black sampling threshold.
+* white_point: Edge white sampling threshold.
+* process_detail: Setting this to false skips edge processing to save runtime.
+* device: Set whether VitMatte uses CUDA.
+* max_megapixels: Set the maximum size for VitMatte operations.
+
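The `bbox_select` and `select_index` options described above can be sketched as follows. This is only an illustration, not the node's actual code: the function names `parse_select_index` and `select_bboxes` are hypothetical, and it assumes the boxes arrive as a list already sorted by descending confidence, so that "first" is the highest-confidence box.

```python
import re


def parse_select_index(text):
    """Treat every run of digits as one index; anything else is a separator."""
    return [int(token) for token in re.findall(r"\d+", text)]


def select_bboxes(bboxes, bbox_select="all", select_index="0"):
    """Pick boxes from a list (assumed sorted by descending confidence)."""
    if bbox_select == "all":
        return list(bboxes)
    if bbox_select == "first":
        return list(bboxes[:1])
    if bbox_select == "by_index":
        indices = parse_select_index(select_index)
        # Silently skip indices that fall outside the list (an assumption).
        return [bboxes[i] for i in indices if i < len(bboxes)]
    raise ValueError(f"unknown bbox_select value: {bbox_select!r}")
```

Because any non-numeric character separates values, inputs such as `0,2`, `0 2` and `0;2` all select the same two boxes in this sketch.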
+### <a id="table1">SAM2VideoUltra</a>
+The SAM2 Video Ultra node supports processing multiple image frames or video sequences. Please define the recognition box data in the first frame of the sequence to ensure correct recognition.
+
+
+https://github.com/user-attachments/assets/4726b8bf-9b98-4630-8f54-cb7ed7a3d2c5
+
+
+Node Options: 
+![image](image/sam2_video_ultra_node.jpg) 
+
+* image: The image to segment.
+* bboxes: Input recognition box data.
+* pre_mask: Optional input mask, which will serve as a focus range limitation and help improve recognition accuracy.
+* sam2_model: Select the SAM2 model.
+* presicion: The model's precision. Can be selected from fp16, bf16, and fp32.
+* cache_model: Whether to cache the model. After caching the model, it will save time for model loading.
+* individual_object: When set to True, it will focus on identifying a single object. When set to False, attempts will be made to generate recognition boxes for multiple objects.
+* mask_preview_color: The color used to display non-masked areas in the preview output. 
+* detail_method: Edge processing method. Only the VITMatte method can be used.
+* detail_erode: The range of mask erosion inward from the edge. The larger the value, the larger the range of inward repair.
+* detail_dilate: The range of mask expansion outward from the edge. The larger the value, the wider the range of outward repair.
+* black_point: Edge black sampling threshold.
+* white_point: Edge white sampling threshold.
+* process_detail: Setting this to false skips edge processing to save runtime.
+* device: Only cuda can be used.
+* max_megapixels: Set the maximum size for VitMatte operations. A larger size results in finer mask edges but significantly slows down computation.
+
+### <a id="table1">ObjectDetectorFL2</a>
+Use the Florence2 model to identify objects in images and output recognition box data. 
+*Download models from [BaiduNetdisk](https://pan.baidu.com/s/1hzw9-QiU1vB8pMbBgofZIA?pwd=mfl3) and copy to ```ComfyUI/models/florence2``` folder.
+
+Node Options: 
+![image](image/object_detector_fl2_node.jpg) 
+* image: The image to segment.
+* florence2_model: The Florence2 model input, from the [LoadFlorence2Model](#LoadFlorence2Model) node.
+* prompt: Describe the object that needs to be identified. 
+* bbox_select: Select the input box data. There are three options: "all" to select all, "first" to select the box with the highest confidence, and "by_index" to specify the index of the box.
+* select_index: This option is valid when bbox_select is 'by_index'. 0 is the first one. Multiple values can be entered, separated by any non-numeric character, including but not limited to commas, periods, semicolons, spaces, letters, or even Chinese characters.
+
+### <a id="table1">ObjectDetectorYOLOWorld</a>
+Use the YOLO-World model to identify objects in images and output recognition box data. 
+*Download models from [BaiduNetdisk](https://pan.baidu.com/s/1QpjajeTA37vEAU2OQnbDcQ?pwd=nqsk) or [GoogleDrive](https://drive.google.com/drive/folders/1nrsfq4S-yk9ewJgwrhXAoNVqIFLZ1at7?usp=sharing) and copy to ```ComfyUI/models/yolo-world``` folder.
+
+Node Options: 
+![image](image/object_detector_yolo_world_node.jpg) 
+* image: The image to segment.
+* confidence_threshold: The confidence threshold. Detections scoring below it are discarded.
+* nms_iou_threshold: The IoU threshold for non-maximum suppression.
+* prompt: Describe the object that needs to be identified.
+* bbox_select: Select the input box data. There are three options: "all" to select all, "first" to select the box with the highest confidence, and "by_index" to specify the index of the box.
+* select_index: This option is valid when bbox_select is 'by_index'. 0 is the first one. Multiple values can be entered, separated by any non-numeric character, including but not limited to commas, periods, semicolons, spaces, letters, or even Chinese characters.
+
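To show how `confidence_threshold` and `nms_iou_threshold` typically interact, here is a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It is a generic textbook version, not the code this node or the YOLO-World library runs; the function names and the `(x1, y1, x2, y2)` box format are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, confidence_threshold, nms_iou_threshold):
    """Return indices of kept boxes, highest score first."""
    # Step 1: drop detections below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # Step 2: greedy NMS, keep a box only if it does not overlap
    # an already kept (higher scoring) box by more than the IoU threshold.
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= nms_iou_threshold for k in kept):
            kept.append(i)
    return kept
```

In this sketch a lower `nms_iou_threshold` removes more overlapping boxes, while a higher `confidence_threshold` removes more low-scoring ones.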
+### <a id="table1">ObjectDetectorYOLO8</a>
+Use the YOLO-8 model to identify objects in images and output recognition box data. 
+*Download models from [GoogleDrive](https://drive.google.com/drive/folders/1I5TISO2G1ArSkKJu1O9b4Uvj3DVgn5d2) or [BaiduNetdisk](https://pan.baidu.com/s/1pEY6sjABQaPs6QtpK0q6XA?pwd=grqe) and copy to ```ComfyUI/models/yolo``` folder.
+
+Node Options: 
+![image](image/object_detector_yolo8_node.jpg)
+* image: The image to segment.
+* yolo_model: Choose the yolo model.
+* bbox_select: Select the input box data. There are three options: "all" to select all, "first" to select the box with the highest confidence, and "by_index" to specify the index of the box.
+* select_index: This option is valid when bbox_select is 'by_index'. 0 is the first one. Multiple values can be entered, separated by any non-numeric character, including but not limited to commas, periods, semicolons, spaces, letters, or even Chinese characters.
+
+### <a id="table1">ObjectDetectorMask</a>
+Use a mask as recognition box data. Every area enclosed by white on the mask is recognized as an object. Multiple enclosed areas are identified separately. 
+
+Node Options: 
+![image](image/object_detector_mask_node.jpg)
+* object_mask: The mask input.
+* bbox_select: Select the input box data. There are three options: "all" to select all, "first" to select the box with the highest confidence, and "by_index" to specify the index of the box.
+* select_index: This option is valid when bbox_select is 'by_index'. 0 is the first one. Multiple values can be entered, separated by any non-numeric character, including but not limited to commas, periods, semicolons, spaces, letters, or even Chinese characters.
+
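One way to turn a mask into per-object boxes is to take the bounding box of each connected white region. The sketch below does this with a plain flood fill over a nested list. It is an assumption about the general idea, not this node's implementation: the function name `mask_to_bboxes`, the 0.5 threshold, 4-connectivity and the inclusive `(x1, y1, x2, y2)` pixel format are all chosen for illustration.

```python
from collections import deque


def mask_to_bboxes(mask, threshold=0.5):
    """Return one inclusive (x1, y1, x2, y2) box per 4-connected white region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or mask[y][x] <= threshold:
                continue
            # Flood-fill this region while tracking its extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            x1 = x2 = x
            y1 = y2 = y
            while queue:
                cx, cy = queue.popleft()
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and mask[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x1, y1, x2, y2))
    return boxes
```

Separate white regions produce separate boxes, which matches the statement that multiple enclosed areas are identified separately.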
+### <a id="table1">BBoxJoin</a>
+Merge recognition box data. 
+
+Node Options: 
+![image](image/bbox_join_node.jpg)
+* bboxes_1: Required input. The first set of identification boxes.
+* bboxes_2: Optional input. The second set of identification boxes.
+* bboxes_3: Optional input. The third set of identification boxes.
+* bboxes_4: Optional input. The fourth set of identification boxes.
+
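As a rough illustration of merging with one required and three optional inputs, the sketch below simply concatenates the sets in order. Whether the node deduplicates or reorders boxes is not stated here, so plain concatenation is an assumption, and the function name `bbox_join` is hypothetical.

```python
def bbox_join(bboxes_1, bboxes_2=None, bboxes_3=None, bboxes_4=None):
    """Concatenate up to four box lists, skipping optional inputs left empty."""
    joined = list(bboxes_1)
    for extra in (bboxes_2, bboxes_3, bboxes_4):
        if extra is not None:
            joined.extend(extra)
    return joined
```

For example, boxes from an ObjectDetectorYOLO8 node and an ObjectDetectorMask node could be combined this way before being passed on as a single set.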
 ### <a id="table1">EVF-SAMUltra</a>
 This node is an implementation of [EVF-SAM](https://github.com/hustvl/EVF-SAM) in ComfyUI. 
 *Please download the model files from [BaiduNetdisk](https://pan.baidu.com/s/1EvaxgKcCxUpMbYKzLnEx9w?pwd=69bn) or [huggingface/EVF-SAM2](https://huggingface.co/YxZhang/evf-sam2/tree/main), [huggingface/EVF-SAM](https://huggingface.co/YxZhang/evf-sam/tree/main) to the ```ComfyUI/models/EVF-SAM``` folder (save the models in their respective subdirectories).