
[Doc] Add Doc of Detection Transformers #9534

Open
wants to merge 26 commits into base: dev-3.x

Conversation

Li-Qingyun (Contributor)

Motivation

We (@jshilong, @LYMDLUT, @KeiChiTse, and I) have refactored the DETR-like models to enhance the usability and readability of our codebase. This PR adds a tutorial doc about the new codebase.

Li-Qingyun and others added 7 commits October 20, 2022 15:20
* [Fix] Fix UT to be compatible with pytorch 1.6 (open-mmlab#8707)

* Update

* Update

* force reinstall pycocotools

* Fix build_cuda

* docker install git

* Update

* comment other job to speedup process

* update

* uncomment

* Update

* Update

* Add comments for --force-reinstall

* [Refactor] Refactor anchor head and base head with boxlist (open-mmlab#8625)

* Refactor anchor head

* Update

* Update

* Update

* Add a series of boxes tools

* Fix box type to support n x box_dim boxes

* revert box type changes

* Add docstring

* refactor retina_head

* Update

* Update

* Fix comments

* modify docstring of coder and ioucalculator

* Replace with_boxlist with use_box_type

* fix: fix config of detr-r18

* fix: modified import of MSDeformAttn in PixelDecoder of Mask2Former

* feat: add TransformerDetector as the base detector of DETR-like detectors

* refactor: refactor modules and configs of DETR

* refactor: refactor DETR-related modules in transformer.py

* refactor: refactor DETR-related modules in transformer.py

* fix: add type comments in detr.py

* correct trainloop in detr_r50 config

* fix: modify the parent class of DETRHead to BaseModule

* refactor: refactor modules and configs of Deformable DETR

* fix: modify the usage of num_query

* fix: modify the usage of num_query in configs

* refactor: replace input_proj of detr with ChannelMapper neck

* refactor: delete multi_apply in DETRHead.forward()

* Update detr_r18_8xb2-500e_coco.py

using channel mapper for r18

* change the name of detection_transfomer.py to base_detr.py

* refactor: modify construct binary masks section of forward_pretransformer

* refactor: utilize abstractmethod

* update ABCmeta to make sure reload class TransformerDetector

* some annotation

* some annotation

* some annotation

* refactor: delete _init_transformer in detectors

* refactor: modify args of deformable detr

* refactor: modify about super().__init__()

* Update detr_head.py

Remove the multi feat lvl in function 'predict_by_feat'

* Update detr.py

update init_weights

* some annotation for head

* to make sure the head args the same as detector

* to make sure the head args the same as detector

* some bug

* fix: fix bugs of num_pred in DeformableDETRHead

* add kwargs to transformer

* support MLP and sineembed position

* detele positional encodeing

* delete useless postnorm

* Revert "add kwargs to transformer"

This reverts commit a265c1a.

* Update detr_head.py

Update type and shape of args

* Update detr_head.py

fix args docstring in predict_by_feat

* Update base_detr.py

Update docstring for forward_pretransformer

* Update deformable_detr.py

Fix docstring

* to support conditional detr with reload forward_transformer

* fix: update config files of Two-stage and Box-refine

* replace all bs with batch_size in detr-related files

* update deformable.py and transformer.py

* update docstring in base_detr

* update docstring in base_detr, detr

* doc refine

* Revert "doc refine"

This reverts commit b69da4f.

* doc refine

* doc refine

* update doc of base_detr, detr, and layers/transformer

* fix doc in base_detr

* add origin repo link

* add origin repo link

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* doc: add doc of the first edition of Deformable DETR

* batch_size to bs

* refine doc

* refine doc

* feat: add config comments of specific module

* refactor: refactor base DETR class TransformerDetector

* fix: fix wrong return typehint of forward_encoder in TransformerDetector

* refactor: refactor DETR

* refactor: refactor Deformable DETR

* refactor: refactor forward_encoder and pre_decoder

* fix: fix bugs of new edition

* refactor: small modifications

* fix: move get_reference_points to deformable_encoder

* refactor: merge init_&inter_reference to references in Deformable DETR

* modify docstring of get_valid_ratio in Deformable DETR

* add some docstring

* doc: add docstring of deformable_detr.py

* doc: add docstring of deformable_detr_head.py

* doc: modify docstring of deformable detr

* doc: add docstring of deformable_detr_head.py

* doc: modify docstring of deformable detr

* doc: add docstring of base_detr.py

* doc: refine docstring of base_detr.py

* doc: refine docstring of base_detr.py

* a little change of MLP

* a little change of MLP

* a little change of MLP

* a little change of MLP

* refine config

* refine config

* refine config

* refine doc string for detr

* little refine doc string for detr.py

* tiny modification

* doc: refine docstring of detr.py

* tiny modifications to resolve the conversations

* DETRHead.predict() draft

* tiny modifications to resolve conversations

* refactor: modify arg names and forward strategies of bbox_head

* tiny modifications to resolve the conversations

* support MLP

* fix docsting of function pre_decoder

* fix docsting of function pre_decoder

* fix docstring

* modifications for resolving conversations

* refactor: eradicate key_padding_mask args

* refactor: eradicate key_padding_mask args

* fix: fix bug of deformable detr and resolve some conversations

* refactor: rename base class with DetectionTransformer and other modifications

* fix: fix config of detr

* fix the bug of init

* fix: fix init_weight of DETR and Deformable DETR

* resolve conflict

* fix auto-merge bug

* fix pre-commit bug

* refactor: move the position of encoder and decoder

* delete Transformer in ci test

* delete Transformer in ci test

Co-authored-by: jbwang1997 <jbwang1997@gmail.com>
Co-authored-by: KeiChiTse <xqz20@mails.tsinghua.edu.cn>
Co-authored-by: LYMDLUT <70597027+LYMDLUT@users.noreply.github.com>
Co-authored-by: lym <letusgo126@126.com>
Co-authored-by: Kei-Chi Tse <109070650+KeiChiTse@users.noreply.github.com>
* fix data format from nbc to bnc in detr and deformable-detr

* set 'batch_first' to True in deformable attention
Co-authored-by: QingyunLi <962537281@qq.com>
* Add conditional detr to 3.0
Co-authored-by: lym <letusgo126@126.com>
* feat: add config of DINO_4s_R50_50e

* feat: add detector file of DINO

* feat: add head file of DINO

* fix: align the inference results

* fix: align the loss

* refactor: move decoder to layers

* small modifications to resolve recent conversations

* refactor: rewrite device manner and other small modifications

* refactor: refactor multiple mode of pre_decoder

* doc: complementary the docstring of dino.py

* refactor & doc: refactor DINOHead and add some docstring of the dino_head.py

* fix & doc: fix bugs and add docs of dino_transformer.py

* refactor: rewrite CdnQueryGenerator

* refactor: refactor CdnQueryGenerator.

* refactor: modify cdn

* refactor: rewrite and add docstring to CdnQueryGenerator.generate_dn_bbox_query

* doc: fix return format

* doc: add docstring of CdnQueryGenerator

* feat: add 91-cls configs of detr and deformable detr

* refactor: rewrite generate_dn_mask

* refactor: rewrite collate_dn_queries

* fix & doc: fix bug of get_dn_target and add some docstring

* doc: write docstring of DINOHead

* feat: add configs of R50-24e and R50-36e

* fix: align loss

* fix: delete original __call__ of CdnQueryGenerator

* fix: add init of query_embedding

* fix: fix typo of dino config

* fix: move convert_coordinate_to_encoding and modify num_query->num_queries num_key->num_keys

* resolve the conversations

* delete some TODO

* add yapf diasble to layers/__init__.py

* fix unit test

* commit modification in pr of DINO

* fix data format from nbc to bnc in detr and deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* set 'batch_first' to True in deformable attention

* fix error

* fix ut

* add assert for batch_first

* remove all 91cls-related codes

* doc: add loss_by_feat comment

* modify pre_decoder of DeformableDETR

* delete useless comments

* refactor: change to BNC data flow for DINO

* ut: add unit test of DINO

* add README.md of dino

* fix wrong box_noise_scale doc

* fix __init__.py of layer

Co-authored-by: KeiChiTse <xqz20@mails.tsinghua.edu.cn>
* resolve refactor conflict w.o. pre-commit hooks

* fixed error and finished alignment

* supprot 91 cls and remove batch_first

* delete iter_update, keep return intermediate

* delete posHW

* substitute 'num_query' to 'num_queries'

* change 'gen_sineembed_for_position' to 'convert_coordinate_to_encoding'

* resolve extra comments

* fix error

* fix error

* fix data path

* support 91 cls temporarily

* resolve extra comments

* fix num_keys, num_feats

* delete reg_branches in decoder_inputs_dict

* fix docstring

* fix docstring

* commit modification in pr of DINO

* fix data format from nbc to bnc in detr and deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* set 'batch_first' to True in deformable attention

* fix error

* fix ut

* add assert for batch_first

* remove 91 cls

* modify pre_decoder of DeformableDETR

* delete useless comments

* bnc data flow w.o. merge detr and def-detr

* assert batch first flag in conditional attention, fix error

* add unit test for dab-detr

* fix doc

* disable yapf hook

* move conditional attention to trm/layers

* fix name and add doc

* fix doc

* add loss_and_predict for head

* fix doc and typehint

* fix doc and typehint

* modify batch first assert for attention

* change Dab to DAB

* rename file and function

* make dab-detr head inherit conditional detr head

* fix doc

* fix doc

Co-authored-by: QingyunLi <962537281@qq.com>
@ZwwWayne (Collaborator) left a comment


Please refer to the compatibility doc to update this doc. We should list more details, such as the batch-first change, the renaming issues, etc.

@Li-Qingyun (Contributor, Author)

Please refer to the compatibility doc to update this doc. We should list more details, such as the batch-first change, the renaming issues, etc.

Okay, we are making an outline.

@ZwwWayne ZwwWayne assigned jshilong and unassigned Czm369 Jan 3, 2023
@ZwwWayne ZwwWayne requested a review from jshilong January 3, 2023 03:29
@ZwwWayne ZwwWayne added this to the 3.0.0rc6 milestone Jan 3, 2023
@Li-Qingyun (Contributor, Author)

For convenience, I have temporarily put the Chinese document together with the English document and images, and will adjust the file locations after most of the writing is done. (I wrote the Chinese version first, then translated it into English; hence, a CN doc has been added.)

Li-Qingyun and others added 3 commits January 13, 2023 12:48
1. Introduce two kinds of data flow (see the sketch below);
2. Why we unify the data flow;
3. Examples (to be continued)...
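
A minimal sketch of the two data-flow conventions mentioned in item 1; the shapes and names below are illustrative assumptions, not taken from the PR itself:

```python
import torch

# "NBC" layout: (num_queries, batch_size, channels) -- the default layout of
# torch.nn.MultiheadAttention when batch_first=False.
# "BNC" layout: (batch_size, num_queries, channels) -- batch-first, the layout
# the refactor standardizes on (several commits above set batch_first=True).
nbc = torch.randn(100, 2, 256)   # (num_queries, bs, dim)
bnc = nbc.transpose(0, 1)        # (bs, num_queries, dim)
assert bnc.shape == (2, 100, 256)
```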
@ZwwWayne (Collaborator)

Invite @ZCMax to review


For the forward process, apart from the initial `extracting features` step and the final `calculating with head` step, the intermediate forward process of the Transformer is designed as four steps: `pre_transformer`, `forward_encoder`, `pre_decoder`, and `forward_decoder`. The parameter flow among these functions is summarized as follows:

![DETR_forward_process](.\DETR_forward_process.png)
Collaborator:

use a githubusercontent link instead of a png file?

Contributor (Author):

use a githubusercontent link instead of a png file?

I'm just using local images temporarily.
The image may require adjustment.
The link will be added in the final commit.
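
The four-step flow quoted above can be sketched roughly as follows; the method names follow the excerpt, but the dict-based plumbing and argument names are simplified assumptions rather than the exact mmdetection signatures:

```python
# A rough sketch of chaining the four steps, as a method of the base detector class.
def forward_transformer(self, img_feats, batch_data_samples):
    # 1. pre_transformer: flatten the features, build masks and positional
    #    encodings, and split the inputs for the encoder and decoder
    encoder_inputs_dict, decoder_inputs_dict = self.pre_transformer(
        img_feats, batch_data_samples)

    # 2. forward_encoder: run the Transformer encoder on the flattened features
    encoder_outputs_dict = self.forward_encoder(**encoder_inputs_dict)

    # 3. pre_decoder: prepare the queries (and whatever else the decoder needs)
    tmp_dec_in, head_inputs_dict = self.pre_decoder(**encoder_outputs_dict)
    decoder_inputs_dict.update(tmp_dec_in)

    # 4. forward_decoder: run the Transformer decoder to get per-query features
    decoder_outputs_dict = self.forward_decoder(**decoder_inputs_dict)
    head_inputs_dict.update(decoder_outputs_dict)
    return head_inputs_dict
```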


In most DETRs, the features extracted by the backbone and neck are fed into a Transformer, which is composed of an encoder and a decoder. The Transformer directly outputs a set of queries in parallel. Each query corresponds to one prediction, which may be an object or `no object`.

![DETR_overall](.\DETR_overall.png)
Collaborator:

use a githubusercontent link instead of a png file?

Contributor (Author):

use a githubusercontent link instead of a png file?

OK~ Thanks very much for your review~
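
A toy sketch of the set-prediction idea described in the excerpt above, where each query yields exactly one prediction; the class and attribute names are made up for illustration and do not correspond to mmdetection modules:

```python
import torch
import torch.nn as nn

# Each of the num_queries decoder outputs is mapped to one classification score
# (with an extra 'no object' class) and one box, in parallel.
class ToyQueryHead(nn.Module):
    def __init__(self, embed_dims=256, num_classes=80):
        super().__init__()
        self.fc_cls = nn.Linear(embed_dims, num_classes + 1)  # +1: 'no object'
        self.fc_reg = nn.Linear(embed_dims, 4)                 # (cx, cy, w, h)

    def forward(self, query_feats):
        """query_feats: (bs, num_queries, embed_dims) from the decoder."""
        return self.fc_cls(query_feats), self.fc_reg(query_feats).sigmoid()

scores, boxes = ToyQueryHead()(torch.randn(2, 100, 256))  # one prediction per query
```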


To support more operations on multi-scale features, some extra information needs to be introduced, for example, the feature `spatial shape` of each level and the `lvl_start_index` (the start sequence index of each feature level). The `spatial shape` and `lvl_start_index` can be used to restore the sequence feature of shape `(B, N, C)` to the tuple of multi-scale features of shape `(B, C, H_l, W_l)`. They also support special multi-scale feature interaction operations, such as Deformable Attention.

<img src="C:\Users\lqy\Desktop\doc_detr\DETR_mlvl_feats2seq.png" style="zoom:50%;" />
Collaborator:

wrong image link

Contributor (Author):

wrong image link

OK~ Thanks very much for your review~
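
A minimal sketch of how the per-level `spatial shape` information can restore a flattened `(B, N, C)` sequence to multi-scale maps, as described in the excerpt above; the function and variable names are illustrative assumptions:

```python
import torch

# N = sum(H_l * W_l); the flattened sequence is split level by level and each
# chunk is reshaped back to (B, C, H_l, W_l).
def seq_to_mlvl_feats(seq_feat, spatial_shapes):
    mlvl_feats, start = [], 0
    for h, w in spatial_shapes:
        end = start + h * w
        lvl = seq_feat[:, start:end, :]                                   # (B, H_l*W_l, C)
        mlvl_feats.append(lvl.transpose(1, 2).reshape(seq_feat.size(0), -1, h, w))
        start = end
    return tuple(mlvl_feats)

spatial_shapes = [(64, 64), (32, 32), (16, 16)]
# the level start indices can be derived from the spatial shapes
lvl_start_index = torch.tensor([0] + [h * w for h, w in spatial_shapes]).cumsum(0)[:-1]
seq = torch.randn(2, sum(h * w for h, w in spatial_shapes), 256)          # (B, N, C)
feats = seq_to_mlvl_feats(seq, spatial_shapes)                            # ((B, C, 64, 64), ...)
```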


Positional embeddings are added to the inputs of the attention modules in DETRs. Unlike in most cases, DETRs only add embeddings to the queries and keys, not to the values. Moreover, DETRs embed positions along both spatial directions, i.e. row and column, namely 2D position encoding.

![](C:\Users\lqy\Desktop\doc_detr\DETR_positional_encoding.png)
Collaborator:

wrong image link

Contributor (Author):

wrong image link

OK~ Thanks very much for your review~
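
A rough sketch of a 2D (row + column) sine position encoding and of adding it to the queries and keys only, as described above; the function names and defaults are illustrative assumptions, not the mmdetection implementation:

```python
import torch

def sine_pos_encoding_2d(h, w, num_feats=128, temperature=10000):
    """Toy (h*w, 2*num_feats) encoding: half the channels encode the row (y)
    position and half encode the column (x) position."""
    y, x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    dim_t = temperature ** (2 * (torch.arange(num_feats) // 2) / num_feats)
    pos_x = x.flatten()[:, None] / dim_t
    pos_y = y.flatten()[:, None] / dim_t
    pos_x = torch.stack((pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2).flatten(1)
    pos_y = torch.stack((pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2).flatten(1)
    return torch.cat((pos_y, pos_x), dim=1)

def with_pos_embed(tensor, pos_embed):
    # queries and keys get the positional encoding; values are passed through as-is
    return tensor if pos_embed is None else tensor + pos_embed

feat = torch.randn(2, 32 * 32, 256)                 # (bs, h*w, c)
pos = sine_pos_encoding_2d(32, 32).unsqueeze(0)     # (1, h*w, 256)
query, key, value = with_pos_embed(feat, pos), with_pos_embed(feat, pos), feat
```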

docs/en/advanced_guides/detection_transformer.md (outdated; resolved)
Co-authored-by: Range King <RangeKingHZ@gmail.com>
@jshilong (Collaborator)

@Li-Qingyun This PR can be merged after adding the compatibility part

@Li-Qingyun (Contributor, Author)

I'll try to rebase the branch and then resolve the conflicts.
