[docs] update rt-detr, yolofastestv2 and postprocessor config #591

Merged
merged 12 commits into from
Dec 17, 2024
33 changes: 31 additions & 2 deletions docs/components/model/postprocessors.md
@@ -10,15 +10,15 @@ The current postprocessor is automatically determined based on the task name and

### Classification

For classification, we don't any postprocessor settings yet.
For classification, we don't have any postprocessor settings yet.

```yaml
postprocessor: ~
```

### Segmentation

For segmentation, we don't any postprocessor settings yet.
For segmentation, we don't have any postprocessor settings yet.

```yaml
postprocessor: ~
@@ -39,3 +39,32 @@ postprocessor:
    nms_thresh: 0.65
    class_agnostic: False
```

#### YOLOFastestV2

YOLOFastestV2 performs box decoding and non-maximum suppression (NMS) on its output predictions. The necessary hyperparameters for these steps are set as follows:

```yaml
postprocessor:
  params:
    # postprocessor - decode
    score_thresh: 0.01
    # postprocessor - nms
    nms_thresh: 0.65
    anchors:
      &anchors
      - [12.,18., 37.,49., 52.,132.] # P2
      - [115.,73., 119.,199., 242.,238.] # P3
    class_agnostic: False
```
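For reference, here is a minimal sketch of how these parameters are typically applied once the raw predictions have been decoded against the per-level `anchors`: boxes below `score_thresh` are dropped, and the remainder go through (optionally class-agnostic) NMS at `nms_thresh`. The function name and tensor shapes are assumptions for illustration, not the trainer's actual API; it relies only on standard `torchvision` ops.

```python
import torch
from torchvision.ops import batched_nms, nms

def detection_postprocess(boxes: torch.Tensor, scores: torch.Tensor, labels: torch.Tensor,
                          score_thresh: float = 0.01, nms_thresh: float = 0.65,
                          class_agnostic: bool = False):
    """boxes: (N, 4) in xyxy format, already decoded from the anchors; scores/labels: (N,)."""
    keep = scores > score_thresh                               # postprocessor - decode (score filtering)
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    if class_agnostic:
        keep = nms(boxes, scores, nms_thresh)                  # one NMS pass over all classes
    else:
        keep = batched_nms(boxes, scores, labels, nms_thresh)  # per-class NMS
    return boxes[keep], scores[keep], labels[keep]
```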

#### RT-DETR

RT-DETR only performs box decoding on its output predictions, distinguishing itself through its NMS-free design: bipartite matching during training ensures one-to-one predictions, which eliminates the need for non-maximum suppression (NMS) in the postprocessing stage. The necessary hyperparameters for this step are set as follows:

```yaml
postprocessor:
  params:
    num_top_queries: 300
    score_thresh: 0.01
```
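The sketch below captures the gist of this NMS-free step, mirroring the reference RT-DETR postprocessing: per-(query, class) scores are obtained with a sigmoid, the top `num_top_queries` pairs are kept, and `score_thresh` removes low-confidence detections. It is a simplified single-image illustration; the function name and tensor shapes are assumptions rather than the trainer's API.

```python
import torch

def rtdetr_postprocess(logits: torch.Tensor, boxes: torch.Tensor,
                       num_top_queries: int = 300, score_thresh: float = 0.01):
    """logits: (num_queries, num_classes); boxes: (num_queries, 4) decoded boxes."""
    scores = logits.sigmoid()                            # per-(query, class) confidence
    topk_scores, topk_idx = scores.flatten().topk(num_top_queries)
    query_idx = topk_idx // scores.shape[-1]             # which query each pair came from
    labels = topk_idx % scores.shape[-1]                 # which class each pair predicts
    keep = topk_scores > score_thresh                    # no NMS, only a confidence cut
    return boxes[query_idx][keep], topk_scores[keep], labels[keep]
```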
12 changes: 1 addition & 11 deletions docs/models/heads/anchordecoupledhead.md
@@ -13,10 +13,6 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
| `params.aspect_ratios` | (list[float]) List of aspect ratios for each anchor. |
| `params.num_layers` | (int) The number of convolution layers in the regression and classification heads. |
| `params.norm_layer` | (str) Normalization type for the head. |
| `params.topk_candidates` | (int) The number of boxes to retain based on score during the decoding step. |
| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
| `params.nms_thresh` | (float) IoU threshold for non-maximum suppression. |
| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |

## Model configuration example

@@ -32,13 +28,7 @@ We have named the detection head of RetinaNet as AnchorDecoupledHead to represen
        anchor_sizes: [[32,], [64,], [128,], [256,]]
        aspect_ratios: [0.5, 1.0, 2.0]
        num_layers: 1
        norm_type: batch_norm
        # postprocessor - decode
        topk_candidates: 1000
        score_thresh: 0.05
        # postprocessor - nms
        nms_thresh: 0.45
        class_agnostic: False
        norm_type: batch_norm
```
</details>

9 changes: 1 addition & 8 deletions docs/models/heads/anchorfreedecoupledhead.md
@@ -11,8 +11,6 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences
| `name` | (str) Name must be "yolox_head" to use `YOLOX` head. |
| `params.act_type` | (str) Activation function for the head. |
| `params.depthwise`| (bool) Whether to enable depthwise convolution for the head. |
| `params.score_thresh` | (float) Score thresholding value applied during the decoding step. |
| `params.class_agnostic` | (bool) Whether to process class-agnostic non-maximum suppression. |

## Model configuration example

@@ -26,12 +24,7 @@ We provide the head of YOLOX as AnchorFreeDecoupledHead. There are no differences
      name: anchor_free_decoupled_head
      params:
        depthwise: False
        act_type: "silu"
        # postprocessor - decode
        score_thresh: 0.7
        # postprocessor - nms
        nms_thresh: 0.45
        class_agnostic: False
        act_type: "silu"
```
</details>

58 changes: 58 additions & 0 deletions docs/models/heads/rtdetrhead.md
@@ -0,0 +1,58 @@
# RT-DETR Head
RT-DETR detection head based on [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069).

We provide the head of RT-DETR as `rtdetr_head`.

## Field list

| Field <img width=200/> | Description |
|---|---|
| `name` | (str) Name must be "rtdetr_head" to use the `RT-DETR` head. |
| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A. |
| `params.num_attention_heads` | (int) Number of attention heads, default is 8 according to paper's Appendix Table A. |
| `params.num_levels` | (int) Number of feature levels used, default is 3 according to paper's Section 4.1. |
| `params.num_queries` | (int) Number of object queries, default is 300 according to paper's Section 4.1 and Appendix Table A. |
| `params.eps` | (float) Small constant for numerical stability, default is 1e-2. |
| `params.num_decoder_layers` | (int) Number of decoder layers. |
| `params.position_embed_type` | (str) Type of position embedding used ['sine', 'learned']. |
| `params.num_decoder_points` | (int) Number of decoder reference points, default is 4 according to paper's Appendix Table A. |
| `params.dim_feedforward` | (int) Feedforward network dimension, default is 1024 according to paper's Appendix Table A. |
| `params.dropout` | (float) Dropout rate in layers. |
| `params.act_type` | (str) Activation function type. |
| `params.num_denoising` | (int) Number of denoising queries. |
| `params.label_noise_ratio` | (float) Label noise ratio for denoising training, default is 0.5 according to paper's Appendix Table A. |
| `params.use_aux_loss` | (bool) Whether to use auxiliary loss when training. The paper mentions using auxiliary prediction heads in Section 4.1. |

## Model configuration example

<details>
<summary>RT-DETR head</summary>

```yaml
model:
  architecture:
    head:
      name: rtdetr_head
      params:
        hidden_dim: 256
        num_attention_heads: 8
        num_levels: 3
        num_queries: 300
        eps: 1e-2
        num_decoder_layers: 3
        eval_spatial_size: ~
        position_embed_type: sine
        num_decoder_points: 4
        dim_feedforward: 1024
        dropout: 0.0
        act_type: relu
        num_denoising: 100
        label_noise_ratio: 0.5
        use_aux_loss: true
```
</details>

## Related links

- [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069)
- [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)
29 changes: 0 additions & 29 deletions docs/models/necks/fpn.md
@@ -4,35 +4,6 @@ FPN based on [Feature Pyramid Networks for Object Detection](https://openaccess.

The Feature Pyramid Network (FPN) is designed to enhance the feature maps produced by the backbone and is typically used for detection models, so we recommend using it for detection tasks. FPN can also build a deeper pyramid than the input feature pyramid from the backbone; in such cases, additional convolution or pooling layers are added, as in the sketch below.
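The following is a minimal sketch (assumed tensor shapes and helper name, not the library's FPN implementation) of how extra pyramid levels beyond the backbone outputs can be produced by repeatedly downsampling the top feature map with a stride-2 pooling or convolution.

```python
import torch.nn.functional as F

def extend_pyramid(features, num_outs, extra_conv=None):
    """Append extra levels until `num_outs` feature maps exist."""
    outs = list(features)
    while len(outs) < num_outs:
        top = outs[-1]
        if extra_conv is not None:
            outs.append(extra_conv(top))                             # stride-2 convolution on the top level
        else:
            outs.append(F.max_pool2d(top, kernel_size=1, stride=2))  # stride-2 pooling, as in the original FPN
    return outs
```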

## Compatibility matrix

<table>
<tr>
<th>Supporting backbones</th>
<th>Supporting heads</th>
<th>torch.fx</th>
<th>NetsPresso</th>
</tr>
<tr>
<td>
ResNet<br />
MobileNetV3<br />
MixNet<br />
CSPDarkNet<br />
MobileViT<br />
MixTransformer<br />
EfficientFormer
</td>
<td>
ALLMLPDecoder<br />
AnchorDecoupledHead<br />
AnchorFreeDecoupledHead
</td>
<td>Supported</td>
<td>Supported</td>
</tr>
</table>

## Field list

| Field <img width=200/> | Description |
44 changes: 44 additions & 0 deletions docs/models/necks/rtdetrhybridencoder.md
@@ -0,0 +1,44 @@
# RT-DETR Hybrid Encoder

RT-DETR Hybrid Encoder based on [RT-DETR: DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069).

## Field list
| Field <img width=200/> | Description |
|---|---|
| `name` | (str) Name must be "rtdetr_hybrid_encoder" to use RT-DETR Hybrid Encoder. |
| `params.hidden_dim` | (int) Hidden dimension size, default is 256 according to paper's Appendix Table A |
| `params.use_encoder_idx` | (list) Indices of the feature levels the encoder (AIFI) is applied to. Default is [2], since paper's Section 4.2 mentions AIFI is only performed on S5 (the highest level). |
| `params.num_encoder_layers` | (int) Number of encoder layers. |
| `params.pe_temperature` | (float) Temperature for positional encoding |
| `params.num_attention_heads` | (int) Number of attention heads. |
| `params.dim_feedforward` | (int) Dimension of feedforward network. |
| `params.dropout` | (float) Dropout rate, default is 0.0. |
| `params.attn_act_type` | (str) Activation function type for the attention layers; GELU is used. |
| `params.expansion` | (float) Expansion ratio for RepBlock in the CCFF module, default is 0.5. |
| `params.depth_mult` | (float) Depth multiplier for scaling. |
| `params.conv_act_type` | (str) Activation function type for convolution layers, using SiLU according to paper's Figure 4. |


## Model configuration example

<details>
<summary>RT-DETR Hybrid Encoder</summary>

```yaml
# hidden_dim, use_encoder_idx, dropout, and expansion follow the defaults in the field
# list above; the remaining values follow the RT-DETR reference configuration.
model:
  architecture:
    neck:
      name: rtdetr_hybrid_encoder
      params:
        hidden_dim: 256
        use_encoder_idx: [2]   # AIFI applied only to the highest level (S5)
        num_encoder_layers: 1
        pe_temperature: 10000
        num_attention_heads: 8
        dim_feedforward: 1024
        dropout: 0.0
        attn_act_type: gelu
        expansion: 0.5
        depth_mult: 1.0
        conv_act_type: silu
```
</details>

## Related links

29 changes: 0 additions & 29 deletions docs/models/necks/yolopafpn.md
@@ -4,35 +4,6 @@ YOLOPAFPN based on [YOLOX: Exceeding YOLO Series in 2021](https://arxiv.org/abs/

YOLOPAFPN is a modified PAFPN for the YOLOX model. Although YOLOPAFPN is compatible with various backbones, we recommend using it when constructing YOLOX models. Its size is determined by the `dep_mul` value, which defines how many times the CSPLayers are repeated, as sketched below.
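As a rough illustration (the formula below mirrors the common YOLOX-style depth scaling and is an assumption, not the library's exact code), `dep_mul` scales the number of CSPLayer repetitions:

```python
def csp_repeats(base_repeats: int, dep_mul: float) -> int:
    """Scale the base number of CSPLayer repetitions by the depth multiplier."""
    return max(round(base_repeats * dep_mul), 1)

# e.g. with base_repeats=3: dep_mul=0.33 -> 1 repetition, dep_mul=1.0 -> 3 repetitions
print(csp_repeats(3, 0.33), csp_repeats(3, 1.0))
```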

## Compatibility matrix

<table>
<tr>
<th>Supporting backbones</th>
<th>Supporting heads</th>
<th>torch.fx</th>
<th>NetsPresso</th>
</tr>
<tr>
<td>
ResNet<br />
MobileNetV3<br />
MixNet<br />
CSPDarkNet<br />
MobileViT<br />
MixTransformer<br />
EfficientFormer
</td>
<td>
ALLMLPDecoder<br />
AnchorDecoupledHead<br />
AnchorFreeDecoupledHead
</td>
<td>Supported</td>
<td>Supported</td>
</tr>
</table>

## Field list

| Field <img width=200/> | Description |