Description
I found that the original R(2+1)D paper and its official code implementation are unclear about models that use Bottleneck layers. As a result, weights from the large pretrained models in Caffe2 (C2) cannot be transferred to the models implemented in torchvision.
For background, see the question in their repo.
The fix is very straightforward (change how the bottleneck midplanes are computed), but the question is whether we should apply it, which I suspect should depend on the authors' answer. I'm leaving this here so that people are aware of it.
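To illustrate why the midplanes choice matters, here is a minimal sketch. It uses the intermediate-channel formula from the R(2+1)D paper, `M = floor(t * d^2 * N_in * N_out / (d^2 * N_in + t * N_out))`. It assumes the ambiguity is about which channel counts are plugged in for the bottleneck's middle convolution: the block's input channels, or the input channels of the 3x3x3 conv itself. That reading is an assumption, not something confirmed by the authors. The helper name `midplanes` and the channel counts 256 and 64 are hypothetical, chosen only for illustration.

```python
def midplanes(in_planes: int, out_planes: int, t: int = 3, d: int = 3) -> int:
    """Intermediate channels of a (2+1)D conv, per the paper's formula:
    M = floor(t * d^2 * N_in * N_out / (d^2 * N_in + t * N_out)).
    """
    return (t * d * d * in_planes * out_planes) // (d * d * in_planes + t * out_planes)


# Hypothetical bottleneck block: 256 channels enter the block and the
# middle 3x3x3 conv maps 64 -> 64 channels (illustrative values only).
inplanes, planes = 256, 64

# Option A (assumed): use the block's input channels as N_in.
from_block_input = midplanes(inplanes, planes)

# Option B (assumed): use the middle conv's own input channels as N_in.
from_conv_input = midplanes(planes, planes)

print(from_block_input, from_conv_input)  # 177 144
```

If the two sides disagree on this value, the spatial and temporal convolutions inside the block end up with different weight shapes, so a checkpoint from one cannot be loaded into the other.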