[docs] update rt-detr, yolofastestv2 and postprocessor config #591

Merged
merged 12 commits into from
Dec 17, 2024
33 changes: 31 additions & 2 deletions docs/components/model/postprocessors.md
@@ -10,15 +10,15 @@ The current postprocessor is automatically determined based on the task name and

### Classification

For classification, we don't any postprocessor settings yet.
For classification, we don't have any postprocessor settings yet.

```yaml
postprocessor: ~
```

### Segmentation

For segmentation, we don't any postprocessor settings yet.
For segmentation, we don't have any postprocessor settings yet.

```yaml
postprocessor: ~
@@ -39,3 +39,32 @@ postprocessor:
    nms_thresh: 0.65
    class_agnostic: False
```

#### YOLOFastestV2

YOLOFastestV2 performs box decoding and non-maximum suppression (NMS) on its output predictions. The necessary hyperparameters for these steps are set as follows:

```yaml
postprocessor:
  params:
    # postprocessor - decode
    score_thresh: 0.01
    # postprocessor - nms
    nms_thresh: 0.65
    anchors:
      &anchors
      - [12.,18., 37.,49., 52.,132.] # P2
      - [115.,73., 119.,199., 242.,238.] # P3
    class_agnostic: False
```
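For reference, here is a minimal sketch of how these parameters are typically applied once the raw predictions have been decoded against the per-level `anchors`: boxes below `score_thresh` are dropped, and the remainder go through (optionally class-agnostic) NMS at `nms_thresh`. The function name and tensor shapes are assumptions for illustration, not the trainer's actual API; it relies only on standard `torchvision` ops.

```python
import torch
from torchvision.ops import batched_nms, nms

def detection_postprocess(boxes: torch.Tensor, scores: torch.Tensor, labels: torch.Tensor,
                          score_thresh: float = 0.01, nms_thresh: float = 0.65,
                          class_agnostic: bool = False):
    """boxes: (N, 4) in xyxy format, already decoded from the anchors; scores/labels: (N,)."""
    keep = scores > score_thresh                               # postprocessor - decode (score filtering)
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    if class_agnostic:
        keep = nms(boxes, scores, nms_thresh)                  # one NMS pass over all classes
    else:
        keep = batched_nms(boxes, scores, labels, nms_thresh)  # per-class NMS
    return boxes[keep], scores[keep], labels[keep]
```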

#### RT-DETR

RT-DETR only performs box decoding on its output predictions, distinguishing itself through its NMS-free design: bipartite matching during training ensures one-to-one predictions, which eliminates the need for non-maximum suppression (NMS) in the postprocessing stage. The necessary hyperparameters for this step are set as follows:

```yaml
postprocessor:
  params:
    num_top_queries: 300
    score_thresh: 0.01
```
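The sketch below captures the gist of this NMS-free step, mirroring the reference RT-DETR postprocessing: per-(query, class) scores are obtained with a sigmoid, the top `num_top_queries` pairs are kept, and `score_thresh` removes low-confidence detections. It is a simplified single-image illustration; the function name and tensor shapes are assumptions rather than the trainer's API.

```python
import torch

def rtdetr_postprocess(logits: torch.Tensor, boxes: torch.Tensor,
                       num_top_queries: int = 300, score_thresh: float = 0.01):
    """logits: (num_queries, num_classes); boxes: (num_queries, 4) decoded boxes."""
    scores = logits.sigmoid()                            # per-(query, class) confidence
    topk_scores, topk_idx = scores.flatten().topk(num_top_queries)
    query_idx = topk_idx // scores.shape[-1]             # which query each pair came from
    labels = topk_idx % scores.shape[-1]                 # which class each pair predicts
    keep = topk_scores > score_thresh                    # no NMS, only a confidence cut
    return boxes[query_idx][keep], topk_scores[keep], labels[keep]
```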
12 changes: 1 addition & 11 deletions docs/models/heads/anchordecoupledhead.md
@@ -13,10 +13,6 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
| `params.aspect_ratios` | (list[float]) List of aspect ratios for each anchor. |
| `params.num_layers` | (int) The number of convolution layers in the regression and classification heads. |
| `params.norm_layer` | (str) Normalization type for the head. |
| `params.topk_candidates` | (int) The number of boxes to retain based on score during the decoding step. |
| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
| `params.nms_thresh` | (float) IoU threshold for non-maximum suppression. |
| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |

## Model configuration example

@@ -32,13 +28,7 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
        anchor_sizes: [[32,], [64,], [128,], [256,]]
        aspect_ratios: [0.5, 1.0, 2.0]
        num_layers: 1
        norm_type: batch_norm
        # postprocessor - decode
        topk_candidates: 1000
        score_thresh: 0.05
        # postprocessor - nms
        nms_thresh: 0.45
        class_agnostic: False
        norm_type: batch_norm
```
</details>

9 changes: 1 addition & 8 deletions docs/models/heads/anchorfreedecoupledhead.md
@@ -11,8 +11,6 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences
| `name` | (str) Name must be "yolox_head" to use `YOLOX` head. |
| `params.act_type` | (str) Activation function for the head. |
| `params.depthwise`| (bool) Whether to enable depthwise convolution for the head. |
| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |

## Model configuration example

@@ -26,12 +24,7 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences
      name: anchor_free_decoupled_head
      params:
        depthwise: False
        act_type: "silu"
        # postprocessor - decode
        score_thresh: 0.7
        # postprocessor - nms
        nms_thresh: 0.45
        class_agnostic: False
        act_type: "silu"
```
</details>

58 changes: 58 additions & 0 deletions docs/models/heads/rtdetrhead.md
@@ -0,0 +1,58 @@
# RT-DETR Head
RT-DETR detection head based on [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069).

We provide the head of RT-DETR as `rtdetr_head`.

## Field list

| Field <img width=200/> | Description |
|---|---|
| `name` | (str) Name must be "rtdetr_head" to use the `RT-DETR` head. |
| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A. |
| `params.num_attention_heads` | (int) Number of attention heads, default is 8 according to paper's Appendix Table A. |
| `params.num_levels` | (int) Number of feature levels used, default is 3 according to paper's Section 4.1. |
| `params.num_queries` | (int) Number of object queries, default is 300 according to paper's Section 4.1 and Appendix Table A. |
| `params.eps` | (float) Small constant for numerical stability, default is 1e-2. |
| `params.num_decoder_layers` | (int) Number of decoder layers. |
| `params.position_embed_type` | (str) Type of position embedding used ['sine', 'learned']. |
| `params.num_decoder_points` | (int) Number of decoder reference points, default is 4 according to paper's Appendix Table A. |
| `params.dim_feedforward` | (int) Feedforward network dimension, default is 1024 according to paper's Appendix Table A. |
| `params.dropout` | (float) Dropout rate in layers. |
| `params.act_type` | (str) Activation function type. |
| `params.num_denoising` | (int) Number of denoising queries. |
| `params.label_noise_ratio` | (float) Label noise ratio for denoising training, default is 0.5 according to paper's Appendix Table A. |
| `params.use_aux_loss` | (bool) Whether to use auxiliary loss when training. The paper mentions using auxiliary prediction heads in Section 4.1. |

## Model configuration example

<details>
<summary>RT-DETR head</summary>

```yaml
model:
  architecture:
    head:
      name: rtdetr_head
      params:
        hidden_dim: 256
        num_attention_heads: 8
        num_levels: 3
        num_queries: 300
        eps: 1e-2
        num_decoder_layers: 3
        eval_spatial_size: ~
        position_embed_type: sine
        num_decoder_points: 4
        dim_feedforward: 1024
        dropout: 0.0
        act_type: relu
        num_denoising: 100
        label_noise_ratio: 0.5
        use_aux_loss: true
```
</details>

## Related links

- [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069)
- [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)
29 changes: 0 additions & 29 deletions docs/models/necks/fpn.md
@@ -4,35 +4,6 @@ FPN based on [Feature Pyramid Networks for Object Detection](https://openaccess.

The Feature Pyramid Network (FPN) is designed to enhance the feature maps produced by the backbone and is typically used for detection models, so we recommend using it for detection tasks. FPN can also build a deeper pyramid than the input feature pyramid from the backbone; in such cases, additional convolution or pooling layers are added, as in the sketch below.
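The following is a minimal sketch (assumed tensor shapes and helper name, not the library's FPN implementation) of how extra pyramid levels beyond the backbone outputs can be produced by repeatedly downsampling the top feature map with a stride-2 pooling or convolution.

```python
import torch.nn.functional as F

def extend_pyramid(features, num_outs, extra_conv=None):
    """Append extra levels until `num_outs` feature maps exist."""
    outs = list(features)
    while len(outs) < num_outs:
        top = outs[-1]
        if extra_conv is not None:
            outs.append(extra_conv(top))                             # stride-2 convolution on the top level
        else:
            outs.append(F.max_pool2d(top, kernel_size=1, stride=2))  # stride-2 pooling, as in the original FPN
    return outs
```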

## Compatibility matrix

<table>
<tr>
<th>Supporting backbones</th>
<th>Supporting heads</th>
<th>torch.fx</th>
<th>NetsPresso</th>
</tr>
<tr>
<td>
ResNet<br />
MobileNetV3<br />
MixNet<br />
CSPDarkNet<br />
MobileViT<br />
MixTransformer<br />
EfficientFormer
</td>
<td>
ALLMLPDecoder<br />
AnchorDecoupledHead<br />
AnchorFreeDecoupledHead
</td>
<td>Supported</td>
<td>Supported</td>
</tr>
</table>

## Field list

| Field <img width=200/> | Description |
44 changes: 44 additions & 0 deletions docs/models/necks/rtdetrhybridencoder.md
@@ -0,0 +1,44 @@
# RT-DETR Hybrid Encoder

RT-DETR Hybrid Encoder based on [RT-DETR: DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069).

## Field list
| Field <img width=200/> | Description |
|---|---|
| `name` | (str) Name must be "rtdetr_hybrid_encoder" to use RT-DETR Hybrid Encoder. |
| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A |
| `params.use_encoder_idx` | (list) Indices of the feature levels the encoder (AIFI) is applied to. Default is [2], since paper's Section 4.2 mentions AIFI is only performed on S5 (the highest level). |
| `params.num_encoder_layers` | (int) Number of encoder layers. |
| `params.pe_temperature` | (float) Temperature for positional encoding |
| `params.num_attention_heads` | (int) Number of attention heads. |
| `params.dim_feedforward` | (int) Dimension of feedforward network. |
| `params.dropout` | (float) Dropout rate, default is 0.0. |
| `params.attn_act_type` | (str) Activation function type for the attention layers; GELU is used. |
| `params.expansion` | (float) Expansion ratio for RepBlock in the CCFF module, default is 0.5. |
| `params.depth_mult` | (float) Depth multiplier for scaling. |
| `params.conv_act_type` | (str) Activation function type for convolution layers, using SiLU according to paper's Figure 4. |


## Model configuration example

<details>
<summary>RT-DETR Hybrid Encoder</summary>

```yaml
# hidden_dim, use_encoder_idx, dropout, and expansion follow the defaults in the field
# list above; the remaining values follow the RT-DETR reference configuration.
model:
  architecture:
    neck:
      name: rtdetr_hybrid_encoder
      params:
        hidden_dim: 256
        use_encoder_idx: [2]   # AIFI applied only to the highest level (S5)
        num_encoder_layers: 1
        pe_temperature: 10000
        num_attention_heads: 8
        dim_feedforward: 1024
        dropout: 0.0
        attn_act_type: gelu
        expansion: 0.5
        depth_mult: 1.0
        conv_act_type: silu
```
</details>

## Related links

29 changes: 0 additions & 29 deletions docs/models/necks/yolopafpn.md
@@ -4,35 +4,6 @@ YOLOPAFPN based on [YOLOX: Exceeding YOLO Series in 2021](https://arxiv.org/abs/

YOLOPAFPN is a modified PAFPN for the YOLOX model. Although YOLOPAFPN is compatible with various backbones, we recommend using it when constructing YOLOX models. Its size is determined by the `dep_mul` value, which defines how many times the CSPLayers are repeated, as sketched below.
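As a rough illustration (the formula below mirrors the common YOLOX-style depth scaling and is an assumption, not the library's exact code), `dep_mul` scales the number of CSPLayer repetitions:

```python
def csp_repeats(base_repeats: int, dep_mul: float) -> int:
    """Scale the base number of CSPLayer repetitions by the depth multiplier."""
    return max(round(base_repeats * dep_mul), 1)

# e.g. with base_repeats=3: dep_mul=0.33 -> 1 repetition, dep_mul=1.0 -> 3 repetitions
print(csp_repeats(3, 0.33), csp_repeats(3, 1.0))
```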

## Compatibility matrix

<table>
<tr>
<th>Supporting backbones</th>
<th>Supporting heads</th>
<th>torch.fx</th>
<th>NetsPresso</th>
</tr>
<tr>
<td>
ResNet<br />
MobileNetV3<br />
MixNet<br />
CSPDarkNet<br />
MobileViT<br />
MixTransformer<br />
EfficientFormer
</td>
<td>
ALLMLPDecoder<br />
AnchorDecoupledHead<br />
AnchorFreeDecoupledHead
</td>
<td>Supported</td>
<td>Supported</td>
</tr>
</table>

## Field list

| Field <img width=200/> | Description |