Refine annotations for DAB-DETR Transformer (facebookresearch#61)
* refine annos

* fix

* refine dn annos

* refine README

* refine CondDETR README

* refine README

* add detr image

* refine

* refine

* refine links

* refine links

Co-authored-by: ntianhe ren <rentianhe@dgx061.scc.idea>
rentainhe and ntianhe ren authored Sep 15, 2022
1 parent fd8825f commit f986a0f
Showing 6 changed files with 94 additions and 42 deletions.
31 changes: 19 additions & 12 deletions README.md
@@ -20,7 +20,7 @@
[📘Documentation]() |
[🛠️Installation]() |
[👀Model Zoo]() |
[🚀Awesome DETR](https://github.com/IDEACVR/awesome-detection-transformer) |
[🚀Awesome DETR](https://github.com/IDEA-Research/awesome-detection-transformer) |
[🆕News]() |
[🤔Reporting Issues](https://github.com/rentainhe/detrex/issues/new/choose)

@@ -29,6 +29,9 @@

detrex is an open-source toolbox that provides state-of-the-art Transformer-based detection algorithms on top of [Detectron2](https://github.com/facebookresearch/detectron2). Its module designs are partially borrowed from [MMDetection](https://github.com/open-mmlab/mmdetection) and [DETR](https://github.com/facebookresearch/detr). Many thanks for their nicely organized code. The main branch works with **PyTorch 1.9+** (we recommend **PyTorch 1.12**).

<div align="center">
<img src="./assets/detr_arch.png" width="100%"/>
</div>

<details open>
<summary> Major Features </summary>
@@ -41,7 +44,7 @@ detrex is an open-source toolbox that provides state-of-the-art transformer base
- [LazyConfig System](https://detectron2.readthedocs.io/en/latest/tutorials/lazyconfigs.html) for more flexible syntax and cleaner config files.
- Light-weight [training engine](./tools/train_net.py) modified from detectron2 [lazyconfig_train_net.py](https://github.com/facebookresearch/detectron2/blob/main/tools/lazyconfig_train_net.py)

Apart from detrex, we also released a repo [Awesome Detection Transformer](https://github.com/IDEACVR/awesome-detection-transformer) to present papers about transformer for detection and segmentation.
Apart from detrex, we also released a repo [Awesome Detection Transformer](https://github.com/IDEA-Research/awesome-detection-transformer) to present papers about transformer for detection and segmentation.

</details>

@@ -59,25 +62,28 @@ Please refer to [Getting Started with detrex]() for the basic usage of detrex.
Please see [documentation]() for full API documentation and tutorials.

## Model Zoo
Results and models are available in [model zoo]()
Results and models are available in [model zoo]().

<details open>
<summary> Supported methods </summary>

- [x] [DETR](./projects/detr/)
- [x] [Deformable-DETR](./projects/dab_deformable_detr/)
- [x] [Conditional DETR]()
- [x] [DAB-DETR](./projects/dab_detr/)
- [x] [DAB-Deformable-DETR](./projects/dab_deformable_detr/)
- [x] [DN-DETR](./projects/dn_detr/)
- [x] [DN-Deformable-DETR](./projects/dn_deformable_detr/)
- [x] [DINO](./projects/dino/)
- [x] [DETR (ECCV'2020)](./projects/detr/)
- [x] [Deformable-DETR (ICLR'2021)](./projects/dab_deformable_detr/)
- [x] [Conditional DETR (ICCV'2021)](./projects/conditional_detr/)
- [x] [DAB-DETR (ICLR'2022)](./projects/dab_detr/)
- [x] [DAB-Deformable-DETR (ICLR'2022)](./projects/dab_deformable_detr/)
- [x] [DN-DETR (CVPR'2022)](./projects/dn_detr/)
- [x] [DN-Deformable-DETR (CVPR'2022)](./projects/dn_deformable_detr/)
- [x] [DINO (ArXiv'2022)](./projects/dino/)

Please see [projects](./projects/) for the details about projects that are built based on detrex.

</details>


## Change Log

The beta v0.1.0 version was released in 30/09/2022. Highlights of the released version:
The **beta v0.1.0** version was released on 30/09/2022. Highlights of the released version:
- Support various backbones including: [FocalNet](https://arxiv.org/abs/2203.11926), [Swin-T](https://arxiv.org/pdf/2103.14030.pdf), [ResNet](https://arxiv.org/abs/1512.03385) and other [detectron2 builtin backbones](https://github.com/facebookresearch/detectron2/tree/main/detectron2/modeling/backbone).
- Add [timm](https://github.com/rwightman/pytorch-image-models) backbones wrapper and [torchvision](https://github.com/pytorch/vision) backbones wrapper.
- Support various transformer based detection algorithms including: [DETR](https://arxiv.org/abs/2005.12872), [Deformable-DETR](https://arxiv.org/abs/2010.04159), [Conditional-DETR](https://arxiv.org/abs/2108.06152), [DAB-DETR](https://arxiv.org/abs/2201.12329), [DN-DETR](https://arxiv.org/abs/2203.01305), [DINO](https://arxiv.org/abs/2203.03605).
@@ -96,3 +102,4 @@ This project is released under the [Apache 2.0 license](LICENSE).


## Citation
If you find this project useful in your research, please consider citing:
Binary file added assets/detr_arch.png
8 changes: 4 additions & 4 deletions projects/conditional_detr/README.md
@@ -19,15 +19,15 @@ Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: dab_detr_r50_50ep -->
<tr><td align="left"><a href="configs/dab_detr_r50_50ep.py">Conditional DETR-R50</a></td>
<tr><td align="left"><a href="configs/conditional_detr_r50_50ep.py">Conditional DETR-R50</a></td>
<td align="center">R-50</td>
<td align="center">IN1k</td>
<td align="center">43.2</td>
<td align="center"> <a href="">Google Drive</a></td>
<td align="center">41.0</td>
<td align="center"> <a href="">model</a></td>
</tr>
</tbody></table>

**Note:** DC5 means using dilated convolution in `res5`.
**Note:** The pretrained weights here are borrowed from [ConditionalDETR](https://github.com/Atten4Vis/ConditionalDETR). Our own detrex training results will be released in a future version.


## Training
2 changes: 1 addition & 1 deletion projects/dab_detr/modeling/dab_detr.py
@@ -1,7 +1,7 @@
# coding=utf-8
# Copyright 2022 The IDEA Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
37 changes: 19 additions & 18 deletions projects/dab_detr/modeling/dab_transformer.py
@@ -177,16 +177,16 @@ def forward(
attn_masks=None,
query_key_padding_mask=None,
key_padding_mask=None,
refpoints_embed=None,
anchor_box_embed=None,
**kwargs,
):
intermediate = []

reference_points = refpoints_embed.sigmoid()
refpoints = [reference_points]
reference_boxes = anchor_box_embed.sigmoid()
intermediate_ref_boxes = [reference_boxes]

for idx, layer in enumerate(self.layers):
obj_center = reference_points[..., : self.embed_dim]
obj_center = reference_boxes[..., : self.embed_dim]
query_sine_embed = get_sine_pos_embed(obj_center)
query_pos = self.ref_point_head(query_sine_embed)

@@ -222,15 +222,16 @@ def forward(
**kwargs,
)

# iter update
# update anchor boxes after each decoder layer using shared box head.
if self.bbox_embed is not None:
temp = self.bbox_embed(query)
temp[..., : self.embed_dim] += inverse_sigmoid(reference_points)
new_reference_points = temp[..., : self.embed_dim].sigmoid()
# predict offsets and add them to the input normalized anchor boxes.
offsets = self.bbox_embed(query)
offsets[..., : self.embed_dim] += inverse_sigmoid(reference_boxes)
new_reference_boxes = offsets[..., : self.embed_dim].sigmoid()

if idx != self.num_layers - 1:
refpoints.append(new_reference_points)
reference_points = new_reference_points.detach()
intermediate_ref_boxes.append(new_reference_boxes)
reference_boxes = new_reference_boxes.detach()

if self.return_intermediate:
if self.post_norm_layer is not None:
@@ -248,12 +249,12 @@ def forward(
if self.bbox_embed is not None:
return [
torch.stack(intermediate).transpose(1, 2),
torch.stack(refpoints).transpose(1, 2),
torch.stack(intermediate_ref_boxes).transpose(1, 2),
]
else:
return [
torch.stack(intermediate).transpose(1, 2),
reference_points.unsqueeze(0).transpose(1, 2),
reference_boxes.unsqueeze(0).transpose(1, 2),
]

return query.unsqueeze(0)
@@ -273,11 +274,11 @@ def init_weights(self):
if p.dim() > 1:
nn.init.xavier_uniform_(p)

def forward(self, x, mask, refpoints_embed, pos_embed):
def forward(self, x, mask, anchor_box_embed, pos_embed):
bs, c, h, w = x.shape
x = x.view(bs, c, -1).permute(2, 0, 1)
x = x.view(bs, c, -1).permute(2, 0, 1)  # (h * w, bs, c)
pos_embed = pos_embed.view(bs, c, -1).permute(2, 0, 1)
refpoints_embed = refpoints_embed.unsqueeze(1).repeat(1, bs, 1)
anchor_box_embed = anchor_box_embed.unsqueeze(1).repeat(1, bs, 1)
mask = mask.view(bs, -1)
memory = self.encoder(
query=x,
@@ -286,15 +287,15 @@ def forward(self, x, mask, refpoints_embed, pos_embed):
query_pos=pos_embed,
query_key_padding_mask=mask,
)
num_queries = refpoints_embed.shape[0]
target = torch.zeros(num_queries, bs, self.embed_dim, device=refpoints_embed.device)
num_queries = anchor_box_embed.shape[0]
target = torch.zeros(num_queries, bs, self.embed_dim, device=anchor_box_embed.device)

hidden_state, reference_boxes = self.decoder(
query=target,
key=memory,
value=memory,
key_pos=pos_embed,
refpoints_embed=refpoints_embed,
anchor_box_embed=anchor_box_embed,
)

return hidden_state, reference_boxes
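
The renamed decoder loop above refines each anchor box in logit space: the box head's output is added to `inverse_sigmoid(reference_boxes)` and the sum is squashed back into (0, 1) with a sigmoid. The sketch below reproduces only that arithmetic for a single box using the Python standard library. `refine_box` and `refine_through_layers` are hypothetical names, the clamping epsilon is an assumption, and the `.detach()` step is omitted because no autograd is involved here.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Map a value in (0, 1) back to logit space, clamping away from 0 and 1."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def refine_box(reference_box, offsets):
    """Add predicted offsets to a normalized (x, y, w, h) box in logit space."""
    return [sigmoid(inverse_sigmoid(r) + o) for r, o in zip(reference_box, offsets)]


def refine_through_layers(initial_box, per_layer_offsets):
    """Apply one refinement per decoder layer and keep every intermediate box.

    Simplification (assumption): all refined boxes are kept, whereas the
    decoder in the diff skips appending the box produced by the last layer.
    """
    boxes = [list(initial_box)]
    box = list(initial_box)
    for offsets in per_layer_offsets:
        box = refine_box(box, offsets)
        boxes.append(box)
    return boxes


if __name__ == "__main__":
    # Two hypothetical decoder layers: the first predicts no change,
    # the second nudges the center right/up and widens the box.
    history = refine_through_layers(
        [0.5, 0.5, 0.2, 0.2],
        [[0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.5, 0.0]],
    )
    for step, box in enumerate(history):
        print(step, [round(v, 4) for v in box])
```

Working in logit space means any real-valued offset still yields coordinates strictly between 0 and 1, so no clipping of the refined boxes is needed.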
58 changes: 51 additions & 7 deletions projects/dn_detr/modeling/dn_detr.py
@@ -12,12 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ------------------------------------------------------------------------------------------------
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# ------------------------------------------------------------------------------------------------
# Modified from:
# https://github.com/facebookresearch/detr/blob/main/d2/detr/detr.py
# ------------------------------------------------------------------------------------------------

import math
from typing import List
@@ -33,6 +27,35 @@


class DNDETR(nn.Module):
"""Implement DAB-DETR in `DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
<https://arxiv.org/abs/2201.12329>`_
Args:
backbone (nn.Module): Backbone module for feature extraction.
in_features (List[str]): Selected backbone output features for transformer module.
in_channels (int): Dimension of the last feature in `in_features`.
position_embedding (nn.Module): Position encoding layer for generating position embeddings.
transformer (nn.Module): Transformer module used for further processing features and input queries.
embed_dim (int): Hidden dimension for transformer module.
num_classes (int): Number of total categories.
num_queries (int): Number of proposal dynamic anchor boxes in Transformer
criterion (nn.Module): Criterion for calculating the total losses.
aux_loss (bool): Whether to calculate auxiliary loss in criterion. Default: True.
pixel_mean (List[float]): Pixel mean value for image normalization.
Default: [123.675, 116.280, 103.530].
pixel_std (List[float]): Pixel std value for image normalization.
Default: [58.395, 57.120, 57.375].
freeze_anchor_box_centers (bool): If True, freeze the center param ``(x, y)`` for the initialized dynamic anchor boxes
in format ``(x, y, w, h)`` and only train ``(w, h)``. Default: True.
select_box_nums_for_evaluation (int): Select the top-k confidence predicted boxes for inference.
Default: 300.
denoising_groups (int): Number of groups for noised ground truths. Default: 5.
label_noise_prob (float): The probability of the label being noised. Default: 0.2.
box_noise_scale (float): Scaling factor for box noising. Default: 0.4.
with_indicator (bool): If True, add indicator in denoising queries part and matching queries part.
Default: True.
device (str): Training device. Default: "cuda".
"""
def __init__(
self,
backbone: nn.Module,
@@ -134,7 +157,28 @@ def init_weights(self):
nn.init.constant_(self.bbox_embed.layers[-1].bias.data, 0)

def forward(self, batched_inputs):
"""Forward function of `DAB-DETR` which expects a list of dicts as inputs.
Args:
batched_inputs (List[dict]): A list of instance dicts, and each instance dict must consist of:
- dict["image"] (torch.Tensor): The unnormalized image tensor.
- dict["height"] (int): The original image height.
- dict["width"] (int): The original image width.
- dict["instance"] (detectron2.structures.Instances): Image meta information and ground-truth boxes and labels during training.
Please refer to https://detectron2.readthedocs.io/en/latest/modules/structures.html#detectron2.structures.Instances
for the basic usage of Instances.
Returns:
dict: Returns a dict with the following elements:
- dict["pred_logits"]: the classification logits for all queries (anchor boxes in DAB-DETR).
with shape ``[batch_size, num_queries, num_classes]``
- dict["pred_boxes"]: The normalized boxes coordinates for all queries in format
``(x, y, w, h)``. These values are normalized in [0, 1] relative to the size of
each individual image (disregarding possible padding). See PostProcess for information
on how to retrieve the unnormalized bounding box.
- dict["aux_outputs"]: Optional, only returned when auxiliary losses are activated. It is a list of
dictionaries containing the two above keys for each decoder layer.
"""
images = self.preprocess_image(batched_inputs)

if self.training:
@@ -147,7 +191,7 @@ def forward(self, batched_inputs):
batch_size, _, H, W = images.tensor.shape
img_masks = images.tensor.new_zeros(batch_size, H, W)

# only use last level feature in DAB-DETR
# only use the last-level feature, as in DAB-DETR
features = self.backbone(images.tensor)[self.in_features[-1]]
features = self.input_proj(features)
img_masks = F.interpolate(img_masks[None], size=features.shape[-2:]).to(torch.bool)[0]
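
The new `DNDETR` docstring names `label_noise_prob` and `box_noise_scale` but this diff does not show how they are applied. The sketch below illustrates one plausible reading, under stated assumptions: each ground-truth label is replaced by a random class with probability `label_noise_prob`, and each normalized `(x, y, w, h)` box is jittered by up to half its size (center) or its full size (width and height), scaled by `box_noise_scale`. The function names `noise_labels` and `noise_box` are hypothetical, and the code is not taken from this commit.

```python
import random


def noise_labels(labels, num_classes, label_noise_prob, rng):
    """Replace each label by a random class with probability label_noise_prob."""
    noised = []
    for label in labels:
        if rng.random() < label_noise_prob:
            noised.append(rng.randrange(num_classes))
        else:
            noised.append(label)
    return noised


def noise_box(box, box_noise_scale, rng):
    """Jitter a normalized (x, y, w, h) box and clamp the result to [0, 1].

    Assumption: the center may move by at most half the box size and the
    size may change by at most the full box size, both scaled by
    box_noise_scale.
    """
    _, _, w, h = box
    limits = (w / 2, h / 2, w, h)
    noised = [
        value + (rng.random() * 2 - 1) * limit * box_noise_scale
        for value, limit in zip(box, limits)
    ]
    return [min(max(value, 0.0), 1.0) for value in noised]


if __name__ == "__main__":
    rng = random.Random(0)
    # Defaults quoted in the docstring: label_noise_prob=0.2, box_noise_scale=0.4.
    print(noise_labels([3, 17, 42], num_classes=80, label_noise_prob=0.2, rng=rng))
    print(noise_box([0.5, 0.5, 0.2, 0.2], box_noise_scale=0.4, rng=rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; with `denoising_groups` greater than one, the same ground truths would presumably be noised independently once per group.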