
[Doc] Add Doc of Detection Transformers #9534

Open
wants to merge 26 commits into base: dev-3.x

Conversation

Li-Qingyun (Contributor)

Motivation

We (@jshilong, @LYMDLUT, @KeiChiTse, and I) have refactored the DETR-like models to enhance the usability and readability of our codebase. This PR adds a tutorial doc about the new codebase.

Li-Qingyun and others added 7 commits October 20, 2022 15:20
* [Fix] Fix UT to be compatible with pytorch 1.6 (open-mmlab#8707)

* Update

* Update

* force reinstall pycocotools

* Fix build_cuda

* docker install git

* Update

* comment other job to speedup process

* update

* uncomment

* Update

* Update

* Add comments for --force-reinstall

* [Refactor] Refactor anchor head and base head with boxlist (open-mmlab#8625)

* Refactor anchor head

* Update

* Update

* Update

* Add a series of boxes tools

* Fix box type to support n x box_dim boxes

* revert box type changes

* Add docstring

* refactor retina_head

* Update

* Update

* Fix comments

* modify docstring of coder and ioucalculator

* Replace with_boxlist with use_box_type

* fix: fix config of detr-r18

* fix: modified import of MSDeformAttn in PixelDecoder of Mask2Former

* feat: add TransformerDetector as the base detector of DETR-like detectors

* refactor: refactor modules and configs of DETR

* refactor: refactor DETR-related modules in transformer.py

* refactor: refactor DETR-related modules in transformer.py

* fix: add type comments in detr.py

* correct trainloop in detr_r50 config

* fix: modify the parent class of DETRHead to BaseModule

* refactor: refactor modules and configs of Deformable DETR

* fix: modify the usage of num_query

* fix: modify the usage of num_query in configs

* refactor: replace input_proj of detr with ChannelMapper neck

* refactor: delete multi_apply in DETRHead.forward()

* Update detr_r18_8xb2-500e_coco.py

using channel mapper for r18

* change the name of detection_transfomer.py to base_detr.py

* refactor: modify construct binary masks section of forward_pretransformer

* refactor: utilize abstractmethod

* update ABCmeta to make sure reload class TransformerDetector

* some annotation

* some annotation

* some annotation

* refactor: delete _init_transformer in detectors

* refactor: modify args of deformable detr

* refactor: modify about super().__init__()

* Update detr_head.py

Remove the multi feat lvl in function 'predict_by_feat'

* Update detr.py

update init_weights

* some annotation for head

* to make sure the head args the same as detector

* to make sure the head args the same as detector

* some bug

* fix: fix bugs of num_pred in DeformableDETRHead

* add kwargs to transformer

* support MLP and sineembed position

* detele positional encodeing

* delete useless postnorm

* Revert "add kwargs to transformer"

This reverts commit a265c1a.

* Update detr_head.py

Update type and shape of args

* Update detr_head.py

fix args docstring in predict_by_feat

* Update base_detr.py

Update docstring for forward_pretransformer

* Update deformable_detr.py

Fix docstring

* to support conditional detr with reload forward_transformer

* fix: update config files of Two-stage and Box-refine

* replace all bs with batch_size in detr-related files

* update deformable.py and transformer.py

* update docstring in base_detr

* update docstring in base_detr, detr

* doc refine

* Revert "doc refine"

This reverts commit b69da4f.

* doc refine

* doc refine

* update doc of base_detr, detr, and layers/transformer

* fix doc in base_detr

* add origin repo link

* add origin repo link

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* refine doc

* doc: add doc of the first edition of Deformable DETR

* batch_size to bs

* refine doc

* refine doc

* feat: add config comments of specific module

* refactor: refactor base DETR class TransformerDetector

* fix: fix wrong return typehint of forward_encoder in TransformerDetector

* refactor: refactor DETR

* refactor: refactor Deformable DETR

* refactor: refactor forward_encoder and pre_decoder

* fix: fix bugs of new edition

* refactor: small modifications

* fix: move get_reference_points to deformable_encoder

* refactor: merge init_&inter_reference to references in Deformable DETR

* modify docstring of get_valid_ratio in Deformable DETR

* add some docstring

* doc: add docstring of deformable_detr.py

* doc: add docstring of deformable_detr_head.py

* doc: modify docstring of deformable detr

* doc: add docstring of deformable_detr_head.py

* doc: modify docstring of deformable detr

* doc: add docstring of base_detr.py

* doc: refine docstring of base_detr.py

* doc: refine docstring of base_detr.py

* a little change of MLP

* a little change of MLP

* a little change of MLP

* a little change of MLP

* refine config

* refine config

* refine config

* refine doc string for detr

* little refine doc string for detr.py

* tiny modification

* doc: refine docstring of detr.py

* tiny modifications to resolve the conversations

* DETRHead.predict() draft

* tiny modifications to resolve conversations

* refactor: modify arg names and forward strategies of bbox_head

* tiny modifications to resolve the conversations

* support MLP

* fix docsting of function pre_decoder

* fix docsting of function pre_decoder

* fix docstring

* modifications for resolving conversations

* refactor: eradicate key_padding_mask args

* refactor: eradicate key_padding_mask args

* fix: fix bug of deformable detr and resolve some conversations

* refactor: rename base class with DetectionTransformer and other modifications

* fix: fix config of detr

* fix the bug of init

* fix: fix init_weight of DETR and Deformable DETR

* resolve conflict

* fix auto-merge bug

* fix pre-commit bug

* refactor: move the position of encoder and decoder

* delete Transformer in ci test

* delete Transformer in ci test

Co-authored-by: jbwang1997 <jbwang1997@gmail.com>
Co-authored-by: KeiChiTse <xqz20@mails.tsinghua.edu.cn>
Co-authored-by: LYMDLUT <70597027+LYMDLUT@users.noreply.github.com>
Co-authored-by: lym <letusgo126@126.com>
Co-authored-by: Kei-Chi Tse <109070650+KeiChiTse@users.noreply.github.com>
* fix data format from nbc to bnc in detr and deformable-detr

* set 'batch_first' to True in deformable attention
Co-authored-by: QingyunLi <962537281@qq.com>
* Add conditional detr to 3.0
Co-authored-by: lym <letusgo126@126.com>
* feat: add config of DINO_4s_R50_50e

* feat: add detector file of DINO

* feat: add head file of DINO

* fix: align the inference results

* fix: align the loss

* refactor: move decoder to layers

* small modifications to resolve recent conversations

* refactor: rewrite device manner and other small modifications

* refactor: refactor multiple mode of pre_decoder

* doc: complementary the docstring of dino.py

* refactor & doc: refactor DINOHead and add some docstring of the dino_head.py

* fix & doc: fix bugs and add docs of dino_transformer.py

* refactor: rewrite CdnQueryGenerator

* refactor: refactor CdnQueryGenerator.

* refactor: modify cdn

* refactor: rewrite and add docstring to CdnQueryGenerator.generate_dn_bbox_query

* doc: fix return format

* doc: add docstring of CdnQueryGenerator

* feat: add 91-cls configs of detr and deformable detr

* refactor: rewrite generate_dn_mask

* refactor: rewrite collate_dn_queries

* fix & doc: fix bug of get_dn_target and add some docstring

* doc: write docstring of DINOHead

* feat: add configs of R50-24e and R50-36e

* fix: align loss

* fix: delete original __call__ of CdnQueryGenerator

* fix: add init of query_embedding

* fix: fix typo of dino config

* fix: move convert_coordinate_to_encoding and modify num_query->num_queries num_key->num_keys

* resolve the conversations

* delete some TODO

* add yapf diasble to layers/__init__.py

* fix unit test

* commit modification in pr of DINO

* fix data format from nbc to bnc in detr and deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* set 'batch_first' to True in deformable attention

* fix error

* fix ut

* add assert for batch_first

* remove all 91cls-related codes

* doc: add loss_by_feat comment

* modify pre_decoder of DeformableDETR

* delete useless comments

* refactor: change to BNC data flow for DINO

* ut: add unit test of DINO

* add README.md of dino

* fix wrong box_noise_scale doc

* fix __init__.py of layer

Co-authored-by: KeiChiTse <xqz20@mails.tsinghua.edu.cn>
* resolve refactor conflict w.o. pre-commit hooks

* fixed error and finished alignment

* supprot 91 cls and remove batch_first

* delete iter_update, keep return intermediate

* delete posHW

* substitute 'num_query' to 'num_queries'

* change 'gen_sineembed_for_position' to 'convert_coordinate_to_encoding'

* resolve extra comments

* fix error

* fix error

* fix data path

* support 91 cls temporarily

* resolve extra comments

* fix num_keys, num_feats

* delete reg_branches in decoder_inputs_dict

* fix docstring

* fix docstring

* commit modification in pr of DINO

* fix data format from nbc to bnc in detr and deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* fix 'gen_encoder_output_proposals' for two-stage deformable-detr

* set 'batch_first' to True in deformable attention

* fix error

* fix ut

* add assert for batch_first

* remove 91 cls

* modify pre_decoder of DeformableDETR

* delete useless comments

* bnc data flow w.o. merge detr and def-detr

* assert batch first flag in conditional attention, fix error

* add unit test for dab-detr

* fix doc

* disable yapf hook

* move conditional attention to trm/layers

* fix name and add doc

* fix doc

* add loss_and_predict for head

* fix doc and typehint

* fix doc and typehint

* modify batch first assert for attention

* change Dab to DAB

* rename file and function

* make dab-detr head inherit conditional detr head

* fix doc

* fix doc

Co-authored-by: QingyunLi <962537281@qq.com>
@ZwwWayne (Collaborator) left a comment


Please refer to the compatibility doc to update this doc. We should list more details, such as the batch-first change, the renaming issues, etc.

@Li-Qingyun (Contributor, Author)

Please refer to the compatibility doc to update this doc. We should list more details, such as the batch-first change, the renaming issues, etc.

Okay, we are making an outline.

@ZwwWayne ZwwWayne assigned jshilong and unassigned Czm369 Jan 3, 2023
@ZwwWayne ZwwWayne requested a review from jshilong January 3, 2023 03:29
@ZwwWayne ZwwWayne added this to the 3.0.0rc6 milestone Jan 3, 2023
@Li-Qingyun (Contributor, Author)

For convenience, I have temporarily put the Chinese document together with the English document and images, and will adjust the file locations after most of the writing is done. (I wrote the Chinese version first, then translated it into English; hence, a CN doc has been added.)

Li-Qingyun and others added 3 commits January 13, 2023 12:48
1. Introduce two kinds of data flow (see the sketch below);
2. Why we unify the data flow;
3. Examples (to be continued)...
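
A minimal sketch of the two data-flow conventions mentioned in item 1; the shapes and names below are illustrative assumptions, not taken from the PR itself:

```python
import torch

# "NBC" layout: (num_queries, batch_size, channels) -- the default layout of
# torch.nn.MultiheadAttention when batch_first=False.
# "BNC" layout: (batch_size, num_queries, channels) -- batch-first, the layout
# the refactor standardizes on (several commits above set batch_first=True).
nbc = torch.randn(100, 2, 256)   # (num_queries, bs, dim)
bnc = nbc.transpose(0, 1)        # (bs, num_queries, dim)
assert bnc.shape == (2, 100, 256)
```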
@ZwwWayne (Collaborator)

Invite @ZCMax to review


For the forward process, apart from the initial `extracting features` step and the final `calculating with head` step, the intermediate forward process of the Transformer is designed as four steps: `pre_transformer`, `forward_encoder`, `pre_decoder`, and `forward_decoder`. The parameter flow among these functions is summarized as follows:

![DETR_forward_process](.\DETR_forward_process.png)
Collaborator:

use a githubusercontent link instead of a png file?

Contributor (Author):

use a githubusercontent link instead of a png file?

I'm just using local images temporarily.
The image may require adjustment.
The link will be added in the final commit.
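
The four-step flow quoted above can be sketched roughly as follows; the method names follow the excerpt, but the dict-based plumbing and argument names are simplified assumptions rather than the exact mmdetection signatures:

```python
# A rough sketch of chaining the four steps, as a method of the base detector class.
def forward_transformer(self, img_feats, batch_data_samples):
    # 1. pre_transformer: flatten the features, build masks and positional
    #    encodings, and split the inputs for the encoder and decoder
    encoder_inputs_dict, decoder_inputs_dict = self.pre_transformer(
        img_feats, batch_data_samples)

    # 2. forward_encoder: run the Transformer encoder on the flattened features
    encoder_outputs_dict = self.forward_encoder(**encoder_inputs_dict)

    # 3. pre_decoder: prepare the queries (and whatever else the decoder needs)
    tmp_dec_in, head_inputs_dict = self.pre_decoder(**encoder_outputs_dict)
    decoder_inputs_dict.update(tmp_dec_in)

    # 4. forward_decoder: run the Transformer decoder to get per-query features
    decoder_outputs_dict = self.forward_decoder(**decoder_inputs_dict)
    head_inputs_dict.update(decoder_outputs_dict)
    return head_inputs_dict
```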


In most DETRs, the features extracted by the backbone and neck are fed into a Transformer, which is composed of an encoder and a decoder. The Transformer directly outputs a set of queries in parallel. Each query corresponds to one prediction, which may be an object or `no object`.

![DETR_overall](.\DETR_overall.png)
Collaborator:

use a githubusercontent link instead of a png file?

Contributor (Author):

use a githubusercontent link instead of a png file?

OK~ Thanks very much for your review~
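
A toy sketch of the set-prediction idea described in the excerpt above, where each query yields exactly one prediction; the class and attribute names are made up for illustration and do not correspond to mmdetection modules:

```python
import torch
import torch.nn as nn

# Each of the num_queries decoder outputs is mapped to one classification score
# (with an extra 'no object' class) and one box, in parallel.
class ToyQueryHead(nn.Module):
    def __init__(self, embed_dims=256, num_classes=80):
        super().__init__()
        self.fc_cls = nn.Linear(embed_dims, num_classes + 1)  # +1: 'no object'
        self.fc_reg = nn.Linear(embed_dims, 4)                 # (cx, cy, w, h)

    def forward(self, query_feats):
        """query_feats: (bs, num_queries, embed_dims) from the decoder."""
        return self.fc_cls(query_feats), self.fc_reg(query_feats).sigmoid()

scores, boxes = ToyQueryHead()(torch.randn(2, 100, 256))  # one prediction per query
```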


To support more operations on multi-scale features, some extra information needs to be introduced, for example, the feature `spatial shape` of each level and the `lvl_start_index` (the start sequence index of each feature level). The `spatial shape` and `lvl_start_index` can be used to restore the sequence feature of shape `(B, N, C)` to the tuple of multi-scale features of shape `(B, C, H_l, W_l)`. They also support special multi-scale feature interaction operations, such as Deformable Attention.

<img src="C:\Users\lqy\Desktop\doc_detr\DETR_mlvl_feats2seq.png" style="zoom:50%;" />
Collaborator:

wrong image link

Contributor (Author):

wrong image link

OK~ Thanks very much for your review~
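
A minimal sketch of how the per-level `spatial shape` information can restore a flattened `(B, N, C)` sequence to multi-scale maps, as described in the excerpt above; the function and variable names are illustrative assumptions:

```python
import torch

# N = sum(H_l * W_l); the flattened sequence is split level by level and each
# chunk is reshaped back to (B, C, H_l, W_l).
def seq_to_mlvl_feats(seq_feat, spatial_shapes):
    mlvl_feats, start = [], 0
    for h, w in spatial_shapes:
        end = start + h * w
        lvl = seq_feat[:, start:end, :]                                   # (B, H_l*W_l, C)
        mlvl_feats.append(lvl.transpose(1, 2).reshape(seq_feat.size(0), -1, h, w))
        start = end
    return tuple(mlvl_feats)

spatial_shapes = [(64, 64), (32, 32), (16, 16)]
# the level start indices can be derived from the spatial shapes
lvl_start_index = torch.tensor([0] + [h * w for h, w in spatial_shapes]).cumsum(0)[:-1]
seq = torch.randn(2, sum(h * w for h, w in spatial_shapes), 256)          # (B, N, C)
feats = seq_to_mlvl_feats(seq, spatial_shapes)                            # ((B, C, 64, 64), ...)
```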


Positional embeddings are added to the inputs of the attention modules in DETRs. Unlike in most cases, DETRs only add embeddings to the queries and keys, not to the values. Moreover, DETRs embed positions along both spatial directions, i.e. row and column, namely 2D position encoding.

![](C:\Users\lqy\Desktop\doc_detr\DETR_positional_encoding.png)
Collaborator:

wrong image link

Contributor (Author):

wrong image link

OK~ Thanks very much for your review~
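
A rough sketch of a 2D (row + column) sine position encoding and of adding it to the queries and keys only, as described above; the function names and defaults are illustrative assumptions, not the mmdetection implementation:

```python
import torch

def sine_pos_encoding_2d(h, w, num_feats=128, temperature=10000):
    """Toy (h*w, 2*num_feats) encoding: half the channels encode the row (y)
    position and half encode the column (x) position."""
    y, x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    dim_t = temperature ** (2 * (torch.arange(num_feats) // 2) / num_feats)
    pos_x = x.flatten()[:, None] / dim_t
    pos_y = y.flatten()[:, None] / dim_t
    pos_x = torch.stack((pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2).flatten(1)
    pos_y = torch.stack((pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2).flatten(1)
    return torch.cat((pos_y, pos_x), dim=1)

def with_pos_embed(tensor, pos_embed):
    # queries and keys get the positional encoding; values are passed through as-is
    return tensor if pos_embed is None else tensor + pos_embed

feat = torch.randn(2, 32 * 32, 256)                 # (bs, h*w, c)
pos = sine_pos_encoding_2d(32, 32).unsqueeze(0)     # (1, h*w, 256)
query, key, value = with_pos_embed(feat, pos), with_pos_embed(feat, pos), feat
```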

docs/en/advanced_guides/detection_transformer.md (outdated; resolved)
Co-authored-by: Range King <RangeKingHZ@gmail.com>
@jshilong (Collaborator)

@Li-Qingyun This PR can be merged after adding the compatibility part

@Li-Qingyun (Contributor, Author)

I'll try to rebase the branch and then resolve the conflicts.
