diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0dbdc020d..212e6527b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,7 @@ No changes to highlight.
- Refactor RT-DETR and generalize CSPRepLayer and RepVGG block by `@hglee98` in [PR 581](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/581), [PR 594](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/594)
- Generalize 2d pooling layers and define as custom layer by `@hglee98` in [PR 583](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/583)
- Unify bbox transformation and IoU computing methods by `@hglee98` in [PR 587](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/587)
+- Update documents by `@hglee98` in [PR 591](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/591)
# v1.0.3
diff --git a/docs/components/model/postprocessors.md b/docs/components/model/postprocessors.md
index d85f60795..1f16cb546 100644
--- a/docs/components/model/postprocessors.md
+++ b/docs/components/model/postprocessors.md
@@ -10,7 +10,7 @@ The current postprocessor is automatically determined based on the task name and
### Classification
-For classification, we don't any postprocessor settings yet.
+For classification, we don't have any postprocessor settings yet.
```yaml
postprocessor: ~
@@ -18,7 +18,7 @@ postprocessor: ~
### Segmentation
-For segmentation, we don't any postprocessor settings yet.
+For segmentation, we don't have any postprocessor settings yet.
```yaml
postprocessor: ~
@@ -39,3 +39,32 @@ postprocessor:
nms_thresh: 0.65
class_agnostic: False
```
+
+#### YOLOFastestV2
+
+YOLOFastestV2 performs box decoding and non-maximum suppression (NMS) on its output predictions. The hyperparameters for these steps are set as follows:
+
+```yaml
+postprocessor:
+  params:
+    # postprocessor - decode
+    score_thresh: 0.01
+    # postprocessor - nms
+    nms_thresh: 0.65
+    anchors:
+      &anchors
+      - [12.,18., 37.,49., 52.,132.] # P2
+      - [115.,73., 119.,199., 242.,238.] # P3
+    class_agnostic: False
+```
+
+#### RT-DETR
+
+RT-DETR performs only box decoding on its output predictions because of its NMS-free design: bipartite matching during training enforces one-to-one predictions, so non-maximum suppression (NMS) is not needed in the postprocessing stage. The hyperparameters for this step are set as follows:
+
+```yaml
+postprocessor:
+  params:
+    num_top_queries: 300
+    score_thresh: 0.01
+```
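The score filtering and NMS controlled by `score_thresh`, `nms_thresh`, and `class_agnostic` can be illustrated with a minimal, framework-free Python sketch. It is not the trainer's implementation: anchor-based box decoding (the `anchors` list) is assumed to have already produced absolute `(x1, y1, x2, y2)` boxes with one score and label each, and `threshold_and_nms` and `iou` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Assumption: boxes are already decoded to absolute (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def threshold_and_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    score_thresh: float,
    nms_thresh: float,
    class_agnostic: bool,
) -> List[int]:
    """Hypothetical helper: return indices of kept detections, best score first."""
    # Score thresholding: discard low-confidence predictions first.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Greedy NMS: visit the remaining candidates from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        overlaps_kept = any(
            # Class-aware NMS only compares boxes that share a label.
            (class_agnostic or labels[i] == labels[k])
            and iou(boxes[i], boxes[k]) > nms_thresh
            for k in keep
        )
        if not overlaps_kept:
            keep.append(i)
    return keep
```

In this sketch, with `class_agnostic: False` two heavily overlapping boxes with different labels both survive; setting it to `True` keeps only the higher-scoring one.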
diff --git a/docs/models/heads/anchordecoupledhead.md b/docs/models/heads/anchordecoupledhead.md
index 635a281b3..4dbf90e4c 100644
--- a/docs/models/heads/anchordecoupledhead.md
+++ b/docs/models/heads/anchordecoupledhead.md
@@ -13,10 +13,6 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
| `params.aspect_ratios` | (list[float]) List of aspect ratio for each anchor. |
| `params.num_layers` | (int) The number of convolution layers of regression and classification head. |
| `params.norm_layer` | (str) Normalization type for the head. |
-| `params.topk_candidates` | (int) The number of boxes to retain based on score during the decoding step. |
-| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
-| `params.nms_thresh` | (float) IoU threshold for non-maximum suppression. |
-| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |
## Model configuration example
@@ -32,13 +28,7 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
anchor_sizes: [[32,], [64,], [128,], [256,]]
aspect_ratios: [0.5, 1.0, 2.0]
num_layers: 1
- norm_type: batch_norm
- # postprocessor - decode
- topk_candidates: 1000
- score_thresh: 0.05
- # postprocessor - nms
- nms_thresh: 0.45
- class_agnostic: False
+ norm_type: batch_norm
```
diff --git a/docs/models/heads/anchorfreedecoupledhead.md b/docs/models/heads/anchorfreedecoupledhead.md
index 297c410a6..c5bb04814 100644
--- a/docs/models/heads/anchorfreedecoupledhead.md
+++ b/docs/models/heads/anchorfreedecoupledhead.md
@@ -11,8 +11,6 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differnece
| `name` | (str) Name must be "yolox_head" to use `YOLOX` head. |
| `params.act_type` | (float) Activation function for the head. |
| `params.depthwise`| (bool) Whether to enable depthwise convolution for the head. |
-| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
-| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |
## Model configuration example
@@ -26,12 +24,7 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differnece
name: anchor_free_decoupled_head
params:
depthwise: False
- act_type: "silu"
- # postprocessor - decode
- score_thresh: 0.7
- # postprocessor - nms
- nms_thresh: 0.45
- class_agnostic: False
+ act_type: "silu"
```
diff --git a/docs/models/heads/rtdetrhead.md b/docs/models/heads/rtdetrhead.md
new file mode 100644
index 000000000..a2c9f55bd
--- /dev/null
+++ b/docs/models/heads/rtdetrhead.md
@@ -0,0 +1,58 @@
+# RT-DETR Head
+RT-DETR detection head based on [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069).
+
+We provide the head of RT-DETR as `rtdetr_head`.
+
+## Field list
+
+| Field | Description |
+|---|---|
+| `name` | (str) Name must be "rtdetr_head" to use the RT-DETR head. |
+| `params.hidden_dim` | (int) Hidden dimension size. Default is 256, following the paper's Appendix Table A. |
+| `params.num_attention_heads` | (int) Number of attention heads. Default is 8, following the paper's Appendix Table A. |
+| `params.num_levels` | (int) Number of feature levels used. Default is 3, following the paper's Section 4.1. |
+| `params.num_queries` | (int) Number of object queries. Default is 300, following the paper's Section 4.1 and Appendix Table A. |
+| `params.eps` | (float) Small constant for numerical stability. Default is 1e-2. |
+| `params.num_decoder_layers` | (int) Number of decoder layers. |
+| `params.position_embed_type` | (str) Type of position embedding, either `sine` or `learned`. |
+| `params.num_decoder_points` | (int) Number of decoder reference points. Default is 4, following the paper's Appendix Table A. |
+| `params.dim_feedforward` | (int) Feedforward network dimension. Default is 1024, following the paper's Appendix Table A. |
+| `params.dropout` | (float) Dropout rate in layers. |
+| `params.act_type` | (str) Activation function type. |
+| `params.num_denoising` | (int) Number of denoising queries. |
+| `params.label_noise_ratio` | (float) Label noise ratio for denoising training. Default is 0.5, following the paper's Appendix Table A. |
+| `params.use_aux_loss` | (bool) Whether to use auxiliary losses during training. The paper uses auxiliary prediction heads (Section 4.1). |
+
+## Model configuration example
+
+RT-DETR head
+
+```yaml
+model:
+  architecture:
+    head:
+      name: rtdetr_head
+      params:
+        hidden_dim: 256
+        num_attention_heads: 8
+        num_levels: 3
+        num_queries: 300
+        eps: 1e-2
+        num_decoder_layers: 3
+        eval_spatial_size: ~
+        position_embed_type: sine
+        num_decoder_points: 4
+        dim_feedforward: 1024
+        dropout: 0.0
+        act_type: relu
+        num_denoising: 100
+        label_noise_ratio: 0.5
+        use_aux_loss: true
+```
+
+| Supporting backbones | Supporting heads | torch.fx | NetsPresso |
+|---|---|---|---|
+| ResNet, MobileNetV3, MixNet, CSPDarkNet, MobileViT, MixTransformer, EfficientFormer | ALLMLPDecoder, AnchorDecoupledHead, AnchorFreeDecoupledHead | Supported | Supported |
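Because the RT-DETR head emits a fixed set of `num_queries` predictions and the design is NMS-free, decoding reduces to a top-k selection. The following minimal Python sketch, which is not the trainer's implementation, shows the decoding governed by the postprocessor's `num_top_queries` and `score_thresh`. It assumes per-query class logits scored with a sigmoid and boxes predicted as normalized `(cx, cy, w, h)`, as in the RT-DETR reference design; `select_top_queries` and its helpers are hypothetical names.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cxcywh_to_xyxy(box: Box, img_w: float, img_h: float) -> Box:
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def select_top_queries(
    logits: Sequence[Sequence[float]],  # assumed [num_queries][num_classes] raw logits
    boxes: Sequence[Box],               # assumed [num_queries] normalized (cx, cy, w, h)
    num_top_queries: int,
    score_thresh: float,
    img_w: float,
    img_h: float,
) -> List[Tuple[Box, float, int]]:
    """Hypothetical helper: return (box, score, label) triples, best first, with no NMS."""
    num_classes = len(logits[0])
    # Score every (query, class) pair independently with a sigmoid.
    flat = [sigmoid(v) for row in logits for v in row]
    # Keep the k highest-scoring (query, class) pairs across all queries.
    top = heapq.nlargest(num_top_queries, range(len(flat)), key=flat.__getitem__)
    results = []
    for idx in top:
        if flat[idx] < score_thresh:
            continue  # drop low-confidence pairs
        query, label = divmod(idx, num_classes)
        results.append((cxcywh_to_xyxy(boxes[query], img_w, img_h), flat[idx], label))
    return results
```

Since selection runs over (query, class) pairs in this sketch, one query can appear more than once with different labels; no overlap-based suppression is applied afterwards.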