commit DrawBBoxMask node

chflame163 · Sep 21, 2024 · c6083de
1 parent a7437cc
commit c6083de
Showing 7 changed files with 360 additions and 6 deletions.
diff --git a/README.MD b/README.MD
@@ -116,7 +116,7 @@ When this error has occurred, please check the network environment.
 ## Update
 <font size="4">**If a dependency package error occurs after updating, double-click ```repair_dependency.bat``` (for the official ComfyUI Portable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages. </font><br /> 
 
-
+* Add the [DrawBBoxMask](#DrawBBoxMask) node, which converts the BBoxes output by the Object Detector node into a mask.
 * Add the [UserPromptGeneratorTxtImg](#UserPromptGeneratorTxtImg) and [UserPromptGeneratorReplaceWord](#UserPromptGeneratorReplaceWord) nodes, used to generate text-to-image prompts and replace prompt content.
 * Add the [PhiPrompt](#PhiPrompt) node, which uses the Microsoft Phi 3.5 text and vision models for local inference. It can generate prompts, process prompts, or infer prompts from images. Running this model requires at least 16GB of video memory. 
 Download the model files from [BaiduNetdisk](https://pan.baidu.com/s/1BdTLdaeGC3trh1U3V-6XTA?pwd=29dh) or [huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/tree/main) and [huggingface.co/microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/tree/main) and copy them to the ```ComfyUI\models\LLM``` folder.
@@ -1836,6 +1836,19 @@ Node Options:
 * bboxes_3: Optional input. The third set of identification boxes.
 * bboxes_4: Optional input. The fourth set of identification boxes.
 
+### <a id="table1">DrawBBoxMask</a>
+Draws the bounding boxes (BBoxes) output by the Object Detector node as a mask. 
+![image](image/draw_bbox_mask_example.jpg)
+
+Node Options: 
+![image](image/draw_bbox_mask_node.jpg)
+* image: Image input. It must be the same image that the Object Detector node processed. 
+* bboxes: BBoxes data input.
+* grow_top: Moves the top edge of each BBox by a multiple of the box height (for example, 0.5 adds 50% of the height). Positive values expand the box upward; negative values move the top edge downward.
+* grow_bottom: Moves the bottom edge of each BBox by a multiple of the box height. Positive values expand the box downward; negative values move the bottom edge upward.
+* grow_left: Moves the left edge of each BBox by a multiple of the box width. Positive values expand the box to the left; negative values move the left edge to the right.
+* grow_right: Moves the right edge of each BBox by a multiple of the box width. Positive values expand the box to the right; negative values move the right edge to the left.
+
 ### <a id="table1">EVF-SAMUltra</a>
 This node is an implementation of [EVF-SAM](https://github.com/hustvl/EVF-SAM) in ComfyUI. 
 *Please download the model files from [BaiduNetdisk](https://pan.baidu.com/s/1EvaxgKcCxUpMbYKzLnEx9w?pwd=69bn) or [huggingface/EVF-SAM2](https://huggingface.co/YxZhang/evf-sam2/tree/main), [huggingface/EVF-SAM](https://huggingface.co/YxZhang/evf-sam/tree/main) to the ```ComfyUI/models/EVF-SAM``` folder (save the models in their respective subdirectories).

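The four `grow_*` options documented above amount to simple arithmetic on each box. Below is a minimal sketch, assuming the values act as multipliers of the box size (as the `h * grow_top` and `w * grow_left` lines in this commit's `draw_bbox_mask` suggest). The helper name `grow_bbox` and the sample coordinates are hypothetical and not part of the plugin.

```python
from typing import Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2): top-left and bottom-right corners


def grow_bbox(bbox: BBox, grow_top: float = 0.0, grow_bottom: float = 0.0,
              grow_left: float = 0.0, grow_right: float = 0.0) -> Optional[BBox]:
    """Hypothetical helper: move each edge by a multiple of the box's own size."""
    x1, y1, x2, y2 = bbox
    w = x2 - x1
    h = y2 - y1
    # Positive values push an edge outward; negative values pull it inward.
    y1 = int(y1 - h * grow_top)
    y2 = int(y2 + h * grow_bottom)
    x1 = int(x1 - w * grow_left)
    x2 = int(x2 + w * grow_right)
    # A strongly negative value can flip the box; report that case as None.
    if x1 > x2 or y1 > y2:
        return None
    return (x1, y1, x2, y2)


box = (100, 200, 300, 400)               # 200 px wide, 200 px high
print(grow_bbox(box, grow_top=0.5))      # top edge moves up by 100 px
print(grow_bbox(box, grow_left=-0.25))   # left edge moves right by 50 px
print(grow_bbox(box, grow_top=-1.5))     # top edge passes the bottom edge
```

Under this reading, a value of 1.0 doubles the box in that direction, and the node's allowed range of -10 to 10 corresponds to up to ten box heights or widths.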
diff --git a/README_CN.MD b/README_CN.MD
@@ -116,6 +116,7 @@ os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
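 ## Update Notes
 <font size="4">**If a dependency package error occurs after updating this plugin, double-click ```install_requirements.bat``` (for the official portable package) or ```install_requirements_aki.bat``` (for the Aki all-in-one package) in the plugin folder to reinstall the dependency packages.
 
+* Add the [DrawBBoxMask](#DrawBBoxMask) node, which converts the BBoxes output by the ObjectDetector node into a mask.
 * Add the [UserPromptGeneratorTxtImg](#UserPromptGeneratorTxtImg) and [UserPromptGeneratorReplaceWord](#UserPromptGeneratorReplaceWord) nodes, used to generate text-to-image prompts and replace prompt content.
 * Add the [PhiPrompt](#PhiPrompt) node, which uses the Microsoft Phi 3.5 text and vision models for local inference. It can generate prompts, process prompts, or infer prompts from images. Running this model requires at least 16GB of video memory. 
 Download all model files from [BaiduNetdisk](https://pan.baidu.com/s/1BdTLdaeGC3trh1U3V-6XTA?pwd=29dh) or [huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/tree/main) and [huggingface.co/microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/tree/main) and put them in the ```ComfyUI\models\LLM``` folder.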
@@ -1807,6 +1808,20 @@ https://github.com/user-attachments/assets/b2a45c96-4be1-4470-8ceb-addaf301b0cb
 * bboxes_3: Optional input. The third set of identification boxes.
 * bboxes_4: Optional input. The fourth set of identification boxes.
 
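+### <a id="table1">DrawBBoxMask</a>
+Draws the bounding boxes (BBoxes) output by the ObjectDetector node as a mask. 
+![image](image/draw_bbox_mask_example.jpg)
+
+Node Options: 
+![image](image/draw_bbox_mask_node.jpg)
+* image: Image input. It must be the same image that the ObjectDetector node processed.
+* bboxes: BBoxes data input.
+* grow_top: How far each BBox expands upward, as a multiple of the box height (1.0 equals 100% of the height). Positive values expand upward; negative values move the top edge downward.
+* grow_bottom: How far each BBox expands downward, as a multiple of the box height. Positive values expand downward; negative values move the bottom edge upward.
+* grow_left: How far each BBox expands to the left, as a multiple of the box width. Positive values expand to the left; negative values move the left edge to the right.
+* grow_right: How far each BBox expands to the right, as a multiple of the box width. Positive values expand to the right; negative values move the right edge to the left.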
+
+
 ### <a id="table1">EVF-SAMUltra</a>
 This node is an implementation of [EVF-SAM](https://github.com/hustvl/EVF-SAM) in ComfyUI. 
 *Download all model files from [BaiduNetdisk](https://pan.baidu.com/s/1EvaxgKcCxUpMbYKzLnEx9w?pwd=69bn) or [huggingface/EVF-SAM2](https://huggingface.co/YxZhang/evf-sam2/tree/main), [huggingface/EVF-SAM](https://huggingface.co/YxZhang/evf-sam/tree/main) and copy them to the ```ComfyUI/models/EVF-SAM``` folder (save the models in their respective subdirectories).

diff --git a/image/draw_bbox_mask_example.jpg b/image/draw_bbox_mask_example.jpg
diff --git a/image/draw_bbox_mask_node.jpg b/image/draw_bbox_mask_node.jpg
diff --git a/py/object_detector.py b/py/object_detector.py
@@ -4,6 +4,18 @@
 select_list = ["all", "first", "by_index"]
 sort_method_list = ["left_to_right", "top_to_bottom", "big_to_small"]
 
+
+# Normalize each bbox so that x1 <= x2 and y1 <= y2, and return int coordinates
+def standardize_bbox(bboxes:list) -> list:
+    ret_bboxes = []
+    for bbox in bboxes:
+        x1 = int(min(bbox[0], bbox[2]))
+        y1 = int(min(bbox[1], bbox[3]))
+        x2 = int(max(bbox[0], bbox[2]))
+        y2 = int(max(bbox[1], bbox[3]))
+        ret_bboxes.append([x1, y1, x2, y2])
+    return ret_bboxes
+
 def sort_bboxes(bboxes:list, method:str) -> list:
  sorted_bboxes = []
  if method == "left_to_right":
@@ -121,7 +133,7 @@ def object_detector_fl2(self, image, prompt, florence2_model, sort_method, bbox_
  log(f"{self.NODE_NAME} no object found", message_type='warning')
  else:
  log(f"{self.NODE_NAME} found {len(bboxes)} object(s)", message_type='info')
- return (bboxes, torch.cat(ret_previews, dim=0))
+ return (standardize_bbox(bboxes), torch.cat(ret_previews, dim=0))
 
  def fbboxes_to_list(self, F_BBOXES) -> list:
  if isinstance(F_BBOXES, str):
@@ -220,7 +232,7 @@ def object_detector_mask(self, object_mask, sort_method, bbox_select, select_ind
  else:
  log(f"{self.NODE_NAME} found {len(bboxes)} object(s)", message_type='info')
 
- return (bboxes, torch.cat(ret_previews, dim=0))
+ return (standardize_bbox(bboxes), torch.cat(ret_previews, dim=0))
 
 
 class LS_OBJECT_DETECTOR_YOLO8:
@@ -281,7 +293,7 @@ def object_detector_yolo8(self, image, yolo_model, sort_method, bbox_select, sel
  else:
  log(f"{self.NODE_NAME} found {len(bboxes)} object(s)", message_type='info')
 
- return (bboxes, torch.cat(ret_previews, dim=0),)
+ return (standardize_bbox(bboxes), torch.cat(ret_previews, dim=0),)
 
 class LS_OBJECT_DETECTOR_YOLOWORLD:
 
@@ -344,7 +356,7 @@ def object_detector_yoloworld(self, image, yolo_world_model,
  else:
  log(f"{self.NODE_NAME} found {len(bboxes)} object(s)", message_type='info')
 
- return (bboxes, torch.cat(ret_previews, dim=0))
+ return (standardize_bbox(bboxes), torch.cat(ret_previews, dim=0))
 
  def process_categories(self, categories: str) -> List[str]:
  return [category.strip().lower() for category in categories.split(',')]
@@ -357,8 +369,67 @@ def load_yolo_world_model(self,model_id: str, categories: str) -> List[torch.nn.
  return model
 
 
+
+class LS_DrawBBoxMask:
+
+    def __init__(self):
+        self.NODE_NAME = 'Draw BBOX Mask'
+        pass
+
+    @classmethod
+    def INPUT_TYPES(cls):
+        return {
+            "required": {
+                "image": ("IMAGE",),
+                "bboxes": ("BBOXES",),
+                "grow_top": ("FLOAT", {"default": 0, "min": -10, "max": 10, "step": 0.01}), # expand the bbox upward, as a multiple of its height
+                "grow_bottom": ("FLOAT", {"default": 0, "min": -10, "max": 10, "step": 0.01}),
+                "grow_left": ("FLOAT", {"default": 0, "min": -10, "max": 10, "step": 0.01}),
+                "grow_right": ("FLOAT", {"default": 0, "min": -10, "max": 10, "step": 0.01}),
+            },
+            "optional": {
+            }
+        }
+
+    RETURN_TYPES = ("MASK",)
+    RETURN_NAMES = ("mask",)
+    FUNCTION = 'draw_bbox_mask'
+    CATEGORY = '😺dzNodes/LayerMask'
+
+    def draw_bbox_mask(self, image, bboxes, grow_top, grow_bottom, grow_left, grow_right
+                       ):
+
+        ret_masks = []
+        for img in image:
+            img = tensor2pil(img)
+            mask = Image.new("L", img.size, color='black')  # one black mask per input image
+            for bbox in bboxes:
+                x1, y1, x2, y2 = bbox
+                w = x2 - x1
+                h = y2 - y1
+                if grow_top:
+                    y1 = int(y1 - h * grow_top)
+                if grow_bottom:
+                    y2 = int(y2 + h * grow_bottom)
+                if grow_left:
+                    x1 = int(x1 - w * grow_left)
+                if grow_right:
+                    x2 = int(x2 + w * grow_right)
+                if y1 > y2 or x1 > x2:  # negative grow values can invert the box; skip it
+                    log(f"{self.NODE_NAME} Invalid bbox after extend: ({x1},{y1},{x2},{y2})", message_type='warning')
+                    continue
+                draw = ImageDraw.Draw(mask)
+                draw.rectangle([x1, y1, x2, y2], fill='white', outline='white', width=0)
+                del draw
+            ret_masks.append(pil2tensor(mask))
+
+        log(f"{self.NODE_NAME} Processed {len(ret_masks)} mask(s).", message_type='finish')
+        return (torch.cat(ret_masks, dim=0),)
+
+
 NODE_CLASS_MAPPINGS = {
     "LayerMask: BBoxJoin": LS_BBOXES_JOIN,
+    "LayerMask: DrawBBoxMask": LS_DrawBBoxMask,
     "LayerMask: ObjectDetectorFL2": LS_OBJECT_DETECTOR_FL2,
     "LayerMask: ObjectDetectorMask": LS_OBJECT_DETECTOR_MASK,
     "LayerMask: ObjectDetectorYOLO8": LS_OBJECT_DETECTOR_YOLO8,
@@ -367,6 +438,7 @@ def load_yolo_world_model(self,model_id: str, categories: str) -> List[torch.nn.
 
 NODE_DISPLAY_NAME_MAPPINGS = {
     "LayerMask: BBoxJoin": "LayerMask: BBox Join",
+    "LayerMask: DrawBBoxMask": "LayerMask: Draw BBox Mask",
     "LayerMask: ObjectDetectorFL2": "LayerMask: Object Detector Florence2",
     "LayerMask: ObjectDetectorMask": "LayerMask: Object Detector Mask",
     "LayerMask: ObjectDetectorYOLO8": "LayerMask: Object Detector YOLO8",

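One plausible reason for routing every detector's output through `standardize_bbox` is that the width and height arithmetic in `draw_bbox_mask` assumes `x1 <= x2` and `y1 <= y2`. The stand-alone sketch below illustrates this under stated assumptions: a detector may return corners in either order and as floats, and Pillow's `ImageDraw.rectangle` is replaced by a plain nested-list rasterizer so the example runs without dependencies. The `rasterize` helper and the sample coordinates are illustrative, not part of the plugin.

```python
from typing import List, Sequence


def standardize_bbox(bboxes: Sequence[Sequence[float]]) -> List[List[int]]:
    """Same min/max/int logic as the function added in this commit."""
    ret = []
    for b in bboxes:
        x1, x2 = int(min(b[0], b[2])), int(max(b[0], b[2]))
        y1, y2 = int(min(b[1], b[3])), int(max(b[1], b[3]))
        ret.append([x1, y1, x2, y2])
    return ret


def rasterize(width: int, height: int, bboxes: List[List[int]]) -> List[List[int]]:
    """Illustrative stand-in for drawing white rectangles on a black mask.

    Assumption of this sketch: both corners are inclusive, and anything
    outside the canvas is clipped.
    """
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in bboxes:
        for y in range(max(y1, 0), min(y2, height - 1) + 1):
            for x in range(max(x1, 0), min(x2, width - 1) + 1):
                mask[y][x] = 255
    return mask


raw = [[5.9, 1.2, 2.1, 3.8]]   # corners given right-to-left, as floats
boxes = standardize_bbox(raw)
print(boxes)                   # [[2, 1, 5, 3]]
mask = rasterize(8, 5, boxes)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Without the normalization step, the raw box above would have a negative width, so a positive `grow_left` would move the edge in the wrong direction.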
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,7 +1,7 @@
 [project]
 name = "comfyui_layerstyle"
 description = "A set of nodes for ComfyUI that generate images like Adobe Photoshop's Layer Style. Drop Shadow is the first completed node, and follow-up work is in progress."
-version = "1.0.57"
+version = "1.0.58"
 license = "MIT"
 dependencies = ["numpy", "pillow", "torch", "matplotlib", "Scipy", "scikit_image", "scikit_learn", "opencv-contrib-python", "pymatting", "segment_anything", "timm", "addict", "yapf", "colour-science", "wget", "mediapipe", "loguru", "typer_config", "fastapi", "rich", "google-generativeai", "diffusers", "omegaconf", "tqdm", "transformers", "kornia", "image-reward", "ultralytics", "blend_modes", "blind-watermark", "qrcode", "pyzbar", "transparent-background", "huggingface_hub", "accelerate", "bitsandbytes", "torchscale", "wandb", "hydra-core", "psd-tools", "inference-cli[yolo-world]", "inference-gpu[yolo-world]", "onnxruntime"]