diff --git a/CHANGELOG.md b/CHANGELOG.md index 0dbdc020d..212e6527b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -20,6 +20,7 @@ No changes to highlight. - Refactor RT-DETR and generalize CSPRepLayer and RepVGG block by `@hglee98` in [PR 581](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/581), [PR 594](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/594) - Generalize 2d pooling layers and define as custom layer by `@hglee98` in [PR 583](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/583) - Unify bbox transformation and IoU computing methods by `@hglee98` in [PR 587](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/587) +- Update documents by `@hglee98` in [PR 591](https://github.com/Nota-NetsPresso/netspresso-trainer/pull/591) # v1.0.3 diff --git a/docs/components/model/postprocessors.md b/docs/components/model/postprocessors.md index d85f60795..1f16cb546 100644 --- a/docs/components/model/postprocessors.md +++ b/docs/components/model/postprocessors.md @@ -10,7 +10,7 @@ The current postprocessor is automatically determined based on the task name and ### Classification -For classification, we don't any postprocessor settings yet. +For classification, we don't have any postprocessor settings yet. ```yaml postprocessor: ~ @@ -18,7 +18,7 @@ postprocessor: ~ ### Segmentation -For segmentation, we don't any postprocessor settings yet. +For segmentation, we don't have any postprocessor settings yet. ```yaml postprocessor: ~ @@ -39,3 +39,32 @@ postprocessor: nms_thresh: 0.65 class_agnostic: False ``` + +#### YOLOFastestV2 + +YOLOFastestV2 performs box decoding and NMS (Non-Maximum-Suppression) on its output predictions. The necessary hyperparameters for these processes are set as follows: + +```yaml +postprocessor: + params: + # postprocessor - decode + score_thresh: 0.01 + # postprocessor - nms + nms_thresh: 0.65 + anchors: + &anchors + - [12.,18., 37.,49., 52.,132.] # P2 + - [115.,73., 119.,199., 242.,238.] 
# P3 + class_agnostic: False +``` + +#### RT-DETR + +RT-DETR performs only box decoding on its output predictions thanks to its NMS-free design: bipartite matching during training ensures one-to-one predictions, eliminating the need for non-maximum suppression (NMS) in the postprocessing stage. The necessary hyperparameters for the process are set as follows: + +```yaml +postprocessor: + params: + num_top_queries: 300 + score_thresh: 0.01 +``` diff --git a/docs/models/heads/anchordecoupledhead.md b/docs/models/heads/anchordecoupledhead.md index 635a281b3..4dbf90e4c 100644 --- a/docs/models/heads/anchordecoupledhead.md +++ b/docs/models/heads/anchordecoupledhead.md @@ -13,10 +13,6 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen | `params.aspect_ratios` | (list[float]) List of aspect ratio for each anchor. | | `params.num_layers` | (int) The number of convolution layers of regression and classification head. | | `params.norm_layer` | (str) Normalization type for the head. | -| `params.topk_candidates` | (int) The number of boxes to retain based on score during the decoding step. | -| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. | -| `params.nms_thresh` | (float) IoU threshold for non-maximum suppression. | -| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. 
| ## Model configuration example @@ -32,13 +28,7 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen anchor_sizes: [[32,], [64,], [128,], [256,]] aspect_ratios: [0.5, 1.0, 2.0] num_layers: 1 - norm_type: batch_norm - # postprocessor - decode - topk_candidates: 1000 - score_thresh: 0.05 - # postprocessor - nms - nms_thresh: 0.45 - class_agnostic: False + norm_type: batch_norm ``` diff --git a/docs/models/heads/anchorfreedecoupledhead.md b/docs/models/heads/anchorfreedecoupledhead.md index 297c410a6..c5bb04814 100644 --- a/docs/models/heads/anchorfreedecoupledhead.md +++ b/docs/models/heads/anchorfreedecoupledhead.md @@ -11,8 +11,6 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences | `name` | (str) Name must be "yolox_head" to use `YOLOX` head. | | `params.act_type` | (float) Activation function for the head. | | `params.depthwise`| (bool) Whether to enable depthwise convolution for the head. | -| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. | -| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. | ## Model configuration example @@ -26,12 +24,7 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences name: anchor_free_decoupled_head params: depthwise: False - act_type: "silu" - # postprocessor - decode - score_thresh: 0.7 - # postprocessor - nms - nms_thresh: 0.45 - class_agnostic: False + act_type: "silu" ``` diff --git a/docs/models/heads/rtdetrhead.md b/docs/models/heads/rtdetrhead.md new file mode 100644 index 000000000..a2c9f55bd --- /dev/null +++ b/docs/models/heads/rtdetrhead.md @@ -0,0 +1,58 @@ +# RT-DETR Head +RT-DETR detection head based on [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069). + +We provide the head of RT-DETR as `rtdetr_head`. 
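Because bipartite matching gives one-to-one predictions, RT-DETR's postprocessing reduces to ranking query scores and keeping the best ones instead of running NMS. A minimal, illustrative sketch in plain Python (not the trainer's actual implementation; the `score_thresh` and `num_top_queries` names mirror the postprocessor fields):

```python
# Sketch of NMS-free postprocessing: keep the num_top_queries
# highest-scoring (query, class) pairs whose score clears score_thresh.
def select_queries(scores, score_thresh=0.01, num_top_queries=300):
    # scores: per-query lists of class probabilities (after sigmoid)
    flat = [(s, q, c)
            for q, per_class in enumerate(scores)
            for c, s in enumerate(per_class)
            if s >= score_thresh]
    flat.sort(reverse=True)           # highest score first
    return flat[:num_top_queries]     # (score, query_index, class_index)

picks = select_queries([[0.9, 0.2], [0.05, 0.6], [0.005, 0.3]],
                       num_top_queries=2)
# → [(0.9, 0, 0), (0.6, 1, 1)]
```

Each kept query's box is then taken as-is from the decoder output; no overlap suppression is needed.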
+ + +## Field list + +| Field | Description | +|---|---| +| `name` | (str) Name must be "rtdetr_head" to use the `RT-DETR` head. | +| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A. | +| `params.num_attention_heads` | (int) Number of attention heads, default is 8 according to paper's Appendix Table A. | +| `params.num_levels` | (int) Number of feature levels used, default is 3 according to paper's Section 4.1. | +| `params.num_queries` | (int) Number of object queries, default is 300 according to paper's Section 4.1 and Appendix Table A. | +| `params.eps` | (float) Small constant for numerical stability, default is 1e-2. | +| `params.num_decoder_layers` | (int) Number of decoder layers. | +| `params.eval_spatial_size` | (list) Spatial size assumed at evaluation time for building reference anchors. Set `~` to derive it from the input. | +| `params.position_embed_type` | (str) Type of position embedding used ['sine', 'learned']. | +| `params.num_decoder_points` | (int) Number of decoder reference points, default is 4 according to paper's Appendix Table A. | +| `params.dim_feedforward` | (int) Feedforward network dimension, default is 1024 according to paper's Appendix Table A. | +| `params.dropout` | (float) Dropout rate in layers. | +| `params.act_type` | (str) Activation function type. | +| `params.num_denoising` | (int) Number of denoising queries. | +| `params.label_noise_ratio` | (float) Label noise ratio for denoising training, default is 0.5 according to paper's Appendix Table A. | +| `params.use_aux_loss` | (bool) Whether to use auxiliary loss when training. The paper mentions using auxiliary prediction heads in Section 4.1. | + +## Model configuration example + +
+ RT-DETR head + + ```yaml + model: + architecture: + head: + name: rtdetr_head + params: + hidden_dim: 256 + num_attention_heads: 8 + num_levels: 3 + num_queries: 300 + eps: 1e-2 + num_decoder_layers: 3 + eval_spatial_size: ~ + position_embed_type: sine + num_decoder_points: 4 + dim_feedforward: 1024 + dropout: 0.0 + act_type: relu + num_denoising: 100 + label_noise_ratio: 0.5 + use_aux_loss: true + ``` +
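For intuition on the `position_embed_type: sine` option in the example above: a sinusoidal embedding assigns each position a vector of sine/cosine pairs at geometrically spaced frequencies, so nearby positions get similar vectors. A hedged sketch (the temperature of 10000 is the conventional default and an assumption here; this is not the trainer's exact code):

```python
import math

# 1-D sinusoidal position embedding: dimension pairs (sin, cos)
# rotate at frequencies temperature ** (i / dim).
def sine_position_embedding(num_positions, dim, temperature=10000.0):
    emb = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            freq = temperature ** (i / dim)
            row.append(math.sin(pos / freq))
            row.append(math.cos(pos / freq))
        emb.append(row)
    return emb

pe = sine_position_embedding(num_positions=4, dim=8)
# pe[0] alternates sin(0)=0.0 and cos(0)=1.0 across its 8 channels.
```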
+ +## Related links + +- [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) +- [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR) \ No newline at end of file diff --git a/docs/models/necks/fpn.md b/docs/models/necks/fpn.md index 95aa99d43..870d2d416 100644 --- a/docs/models/necks/fpn.md +++ b/docs/models/necks/fpn.md @@ -4,35 +4,6 @@ FPN based on [Feature Pyramid Networks for Object Detection](https://openaccess. The Feature Pyramid Network (FPN) is designed to enhance feature maps given from the backbone, typically used for detection models. Therefore, we also recommend to use it in detection task as well. FPN can create more pyramid deeply than the input feature pyramid from backbone, and in such cases, additional convolution or pooling layers are added. -## Compatibility matrix - - - - - - - - - - - - - - -
Supporting backbonesSupporting headstorch.fxNetsPresso
- ResNet
- MobileNetV3
- MixNet
- CSPDarkNet
- MobileViT
- MixTransformer
- EfficientFormer -
- ALLMLPDecoder
- AnchorDecoupledHead
- AnchorFreeDecoupledHead -
SupportedSupported
- ## Field list | Field | Description | diff --git a/docs/models/necks/rtdetrhybridencoder.md b/docs/models/necks/rtdetrhybridencoder.md new file mode 100644 index 000000000..510d686fb --- /dev/null +++ b/docs/models/necks/rtdetrhybridencoder.md @@ -0,0 +1,44 @@ +# RT-DETR Hybrid Encoder + +RT-DETR Hybrid Encoder based on [RT-DETR: DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069). + + + +## Field list +| Field | Description | +|---|---| +| `name` | (str) Name must be "rtdetr_hybrid_encoder" to use RT-DETR Hybrid Encoder. | +| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A. | +| `params.use_encoder_idx` | (list) Indices of the feature levels the encoder is applied to. Default is [2] since paper's Section 4.2 mentions AIFI is only performed on S5 (the highest level). | +| `params.num_encoder_layers` | (int) Number of encoder layers. | +| `params.pe_temperature` | (float) Temperature for positional encoding. | +| `params.num_attention_heads` | (int) Number of attention heads. | +| `params.dim_feedforward` | (int) Dimension of feedforward network. | +| `params.dropout` | (float) Dropout rate, default is 0.0. | +| `params.attn_act_type` | (str) Activation function type for attention, default is GELU. | +| `params.expansion` | (float) Expansion ratio for RepBlock in CCFF module, default is 0.5. | +| `params.depth_mult` | (float) Depth multiplier for scaling. | +| `params.conv_act_type` | (str) Activation function type for convolution layers, using SiLU according to paper's Figure 4. | + + +## Model configuration examples + +
+ RT-DETR Hybrid Encoder + + ```yaml + model: + architecture: + neck: + name: rtdetr_hybrid_encoder + params: + hidden_dim: 256 + use_encoder_idx: [2] + num_encoder_layers: 1 + pe_temperature: 10000 + num_attention_heads: 8 + dim_feedforward: 1024 + dropout: 0.0 + attn_act_type: gelu + expansion: 0.5 + depth_mult: 1.0 + conv_act_type: silu + ``` +
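A quick back-of-the-envelope for why `use_encoder_idx: [2]` (AIFI on S5 only) keeps the encoder cheap: self-attention cost grows quadratically with the number of tokens, and S5 has by far the fewest. An illustrative sketch (the 640x640 input and strides 8/16/32 are assumptions for the example, not trainer defaults):

```python
# Attention cost is quadratic in sequence length, so AIFI runs only on the
# smallest feature map S5 (level index 2), not on S3/S4.
def attn_cost(h, w):
    n = h * w      # tokens after flattening the HxW feature map
    return n * n   # pairwise attention interactions

# Feature pyramid for a 640x640 input: S3 (stride 8), S4 (16), S5 (32).
levels = {0: (80, 80), 1: (40, 40), 2: (20, 20)}
costs = {i: attn_cost(h, w) for i, (h, w) in levels.items()}
# Attending over S5 is 16x cheaper than S4 and 256x cheaper than S3.
```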
+ +## Related links + +- [RT-DETR: DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) +- [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR) diff --git a/docs/models/necks/yolopafpn.md b/docs/models/necks/yolopafpn.md index 048d60fd5..d1adf383f 100644 --- a/docs/models/necks/yolopafpn.md +++ b/docs/models/necks/yolopafpn.md @@ -4,35 +4,6 @@ YOLOPAFPN based on [YOLOX: Exceeding YOLO Series in 2021](https://arxiv.org/abs/ YOLOPAFPN is a modified PAFPN for YOLOX model. Therefore, although YOLOPAFP is compatible with various backbones, we recommend to use it when constructing YOLOX models. The size is determined by `dep_mul` value, which defines the repetition of CSPLayers. -## Compatibility matrix - - - - - - - - - - - - - - -
Supporting backbonesSupporting headstorch.fxNetsPresso
- ResNet
- MobileNetV3
- MixNet
- CSPDarkNet
- MobileViT
- MixTransformer
- EfficientFormer -
- ALLMLPDecoder
- AnchorDecoupledHead
- AnchorFreeDecoupledHead -
SupportedSupported
- ## Field list | Field | Description |