Fix RT-DETR weights initialization #31724
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for improving this! Does the model converge as fast as the original implementation?
I didn't have a chance to run the fine-tuning with the original code, maybe @SangbumChoi has a fine-tuning script to compare. However, I would say that from my previous experiments with other detection models in |
@qubvel is there anything else that needs to be done?
@qubvel Isn't SDPA the default operation in MDHA?
Since there are many FLOPs in the encoder (which are not related to the attention module), I guess the speed-up from applying an attention-friendly library such as SDPA or xformers might be marginal. @qubvel @NielsRogge Thanks for this PR. (Good to hear that this is the best result by far.) Unfortunately, I don't have any results from fine-tuning the raw RT-DETR repo. (I have some test results with Transformers RT-DETR.)
@SangbumChoi I'm talking about |
Thanks for fixing!
What does this PR do?

Fix RT-DETR bbox and class head weight initialization.

- In the `_init_weight` method, the bbox and class heads are not reachable for initialization. This sometimes leads to unstable training and lower results (see experiments below).
- `prior_prob=0.01`, which is OK for training with 80 classes; however, for fine-tuning this value should be adjusted.

Results of fine-tuning on the `main` vs `fix` branches on the CPPE-5 dataset (averaged over 6 runs each):

Who can review?
@amyeroberts
cc @SangbumChoi @NielsRogge
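
For context on the `prior_prob` point in the description, here is a minimal sketch of how a prior probability is commonly turned into a classification-head bias, so that the sigmoid of the initial logit equals the prior. The helper names `bias_from_prior` and `sigmoid` are hypothetical and are not taken from this PR or from the Transformers codebase; the formula itself is the standard inverse sigmoid.

```python
import math


def bias_from_prior(prior_prob: float) -> float:
    """Return the logit b such that sigmoid(b) == prior_prob.

    Hypothetical helper for illustration; not the code changed in this PR.
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x: float) -> float:
    """Plain logistic function, used only to check the round trip."""
    return 1.0 / (1.0 + math.exp(-x))


# The value mentioned in the description: prior_prob=0.01 gives a bias of -log(99).
default_bias = bias_from_prior(0.01)

# A larger prior (an arbitrary example value, not a recommendation)
# yields a less negative initial bias.
larger_prior_bias = bias_from_prior(0.1)
```

With `prior_prob=0.01` the initial bias is about -4.595, so every class starts with a predicted probability of roughly 1%. Which prior suits a given fine-tuning dataset is a judgment call that this sketch does not settle.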