Hi,
First, thank you very much for your work. It adds a huge improvement to the DETR family.
Your paper is also very well written and clearly explained.
Thank you as well for publishing your code & models; they were very easy to run.
I have a few questions about the implementation:
What is the difference between `models/` and `impl_a/models/`? (I compared a few files and only identified some typo changes, but I don't want to miss something.)
Are the model and the training process compatible with fp16 precision?
In DeformableAttention, do you use reference points or bbox references? (Is `reference_points` of shape `(bs, len_q, n_levels, 2)` or `(bs, len_q, n_levels, 4)`?)
What is the role of `self.im2col_step = 64` in `MSDeformAttn`?
From what I understand of the attention mechanism, we schematically have `attention_weights = softmax(dot_product(proj_q(Q), proj_k(K)))`. But here we have `attention_weights = softmax(proj_q(Q))`, where `proj_q` is `self.attention_weights = nn.Linear(d_model, n_heads * n_levels * n_points)`. Could you explain this difference?
Thank you for your interest in our work. To answer your questions:
What is the difference between `models/` and `impl_a/models/`?
`impl_a` is implementation (a) of mixed supervision, as illustrated in Figure 4 (a) of our paper. The main difference lies in `deformable_detr.py` (L122-L125, L198-L219): for `impl_a` we did not change the architecture of the decoder layers, and instead add auxiliary predictors for the one-to-many predictions. More details are available in Section 3.3 of our paper.
Are the model and the training process compatible with fp16 precision?
We did not test it under fp16 precision. It depends on whether the MS-Deform operators, which we borrowed directly from Deformable-DETR, support fp16; you can check the original implementation repository, and as far as I know some third-party implementations have added fp16 support for the MS-Deform operators.
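If you want to probe it yourself, here is a minimal, hypothetical smoke test under `torch.autocast` (the `nn.Linear` stand-in and the shapes are placeholders, not our model; a forward through the real `MSDeformAttn` would raise here if the CUDA op lacks an fp16 kernel):

```python
import torch
import torch.nn as nn

# Hypothetical smoke test: run one forward pass under autocast and see
# whether every op in the model accepts half-precision inputs.
model = nn.Linear(256, 256).cuda()           # stand-in for the real model
x = torch.randn(2, 300, 256, device="cuda")  # stand-in input

with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)  # a custom op without an fp16 kernel would fail here
print(out.dtype)    # torch.float16 under autocast
```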
In DeformableAttention, do you use reference points or bbox references?
We use the reference points, i.e. the `(bs, len_q, n_levels, 2)` form.
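To make the shapes concrete, here is an illustrative sketch of how 2-D reference points are combined with the predicted sampling offsets in Deformable-DETR-style attention (the names follow the reference code, but the sizes are made up):

```python
import torch

bs, len_q, n_heads, n_levels, n_points = 2, 300, 8, 4, 4

# Reference points as (x, y) centers, normalized to [0, 1]:
reference_points = torch.rand(bs, len_q, n_levels, 2)

# Per-query sampling offsets predicted from the query features:
sampling_offsets = torch.rand(bs, len_q, n_heads, n_levels, n_points, 2)

# Each sampling location = reference point + normalized offset.
# With 2-D references the offset is divided by the feature-map size;
# with 4-D (x, y, w, h) references it would be scaled by w and h instead.
spatial_shapes = torch.tensor([[100, 100], [50, 50], [25, 25], [13, 13]])
offset_normalizer = spatial_shapes.flip(-1).float()  # (n_levels, 2) as (w, h)
sampling_locations = (
    reference_points[:, :, None, :, None, :]
    + sampling_offsets / offset_normalizer[None, None, None, :, None, :]
)
print(sampling_locations.shape)  # torch.Size([2, 300, 8, 4, 4, 2])
```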
What is the role of `self.im2col_step = 64` in `MSDeformAttn`?
The `im2col_step` may relate to memory-efficiency tricks in the tensor operations as implemented by Deformable-DETR; I did not investigate it in detail.
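My rough understanding, sketched below, is that the CUDA kernel processes the batch in chunks of `im2col_step` samples so the intermediate im2col-style buffers stay bounded; treat this as a conceptual illustration only, not the real kernel:

```python
import torch

# Conceptual sketch only: the actual chunking happens inside the CUDA
# extension. Looping over the batch in chunks of im2col_step bounds the
# size of the per-chunk intermediate buffers.
batch_size, im2col_step = 256, 64
value = torch.randn(batch_size, 1000, 8, 32)

outputs = []
for start in range(0, batch_size, im2col_step):
    chunk = value[start:start + im2col_step]
    outputs.append(chunk)  # placeholder for the per-chunk CUDA kernel call
output = torch.cat(outputs, dim=0)
assert output.shape[0] == batch_size
```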
Also, the attention weights computed in `MSDeformAttn` differ from the vanilla attention operation in PyTorch: the attention weights are computed from Q and the reference points, as described in the Deformable-DETR paper; you can check it for more details.
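Schematically, the weight computation looks like the following trimmed sketch, which follows the same logic as the Deformable-DETR reference implementation (the sizes are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

bs, len_q, d_model = 2, 300, 256
n_heads, n_levels, n_points = 8, 4, 4

query = torch.randn(bs, len_q, d_model)

# No dot product with keys: the weights are predicted directly from the
# query by a single linear head, then normalized per head across all
# n_levels * n_points sampled values.
attention_weights_proj = nn.Linear(d_model, n_heads * n_levels * n_points)
w = attention_weights_proj(query).view(bs, len_q, n_heads, n_levels * n_points)
w = F.softmax(w, dim=-1).view(bs, len_q, n_heads, n_levels, n_points)
# w[i, j, h] sums to 1 over its n_levels * n_points sampling locations.
```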
I hope this clears things up. Feel free to reach out if you have other questions.