Strange runtime results #2

cszer · 2020-08-28T14:12:50Z

Hello, I tested inference speed and compared it with a plain torchvision ResNet-50.
I used a 2080 Ti and PyTorch 1.4.
Results:
torchvision resnet50: 13-15 ms
axial-resnet-s: 79-81 ms
But in the paper, the authors show that the inference speed of the L model is comparable to that of ResNet-101.

cszer · 2020-08-28T14:14:51Z

I tested the models on a 224x224 torch.rand tensor.
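
For anyone reproducing numbers like these, here is a minimal timing sketch. It is not the script used above: the helper `time_inference`, the warm-up and iteration counts, and the tiny stand-in model are all assumptions made for illustration. The point it shows is that CUDA kernels launch asynchronously, so the GPU should be synchronized before reading the clock, and a few warm-up passes should be excluded from the measurement.

```python
import time

import torch
import torch.nn as nn


def time_inference(model, x, warmup=10, iters=50):
    """Return the mean forward-pass latency in milliseconds (hypothetical helper)."""
    model.eval()
    use_cuda = x.is_cuda
    with torch.no_grad():
        # Warm-up passes: exclude one-time costs such as cuDNN autotuning.
        for _ in range(warmup):
            model(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) * 1000.0 / iters


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model (assumption): replace with the network under test.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).to(device)

# Same input shape as in the comment above: one 224x224 RGB image of random values.
x = torch.rand(1, 3, 224, 224, device=device)

latency_ms = time_inference(model, x)
print(f"{latency_ms:.2f} ms per forward pass on {device}")
```

Swapping the stand-in for the two models being compared, with the same input and device, should give numbers that are at least comparable to each other.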

csrhddlam · 2020-08-29T01:46:37Z

Hello, thanks for testing it. Please note that this is a re-implementation, and we haven't tried to match the inference time of the original code.

I think the main mismatch comes from implementation differences, possibly the positional encoding or batch normalization. In addition, I think the runtime won't increase much if you test it on L models -- we only change the width of the model and keep the depth the same, while ResNet doubles the depth.

netw0rkf10w · 2020-08-30T11:53:43Z

@csrhddlam Why are you sharing a PyTorch re-implementation rather than the original TensorFlow implementation? I'm a bit confused...

csrhddlam · 2020-08-30T14:48:55Z

It would be almost impossible to release the original TensorFlow code (which runs on TPU), because it is Google property and it depends on other packages that are also Google property, e.g. stand-alone self-attention and panoptic-deeplab.

netw0rkf10w · 2020-08-30T15:13:57Z

@csrhddlam Hmm... Has Google recently changed their policy? They used to release TensorFlow code for their published papers...

csrhddlam · 2020-08-30T16:07:06Z

Not as far as I know. And sorry for the confusion. As I said, our original code depends heavily on stand-alone self-attention and panoptic-deeplab. However, they have not released their code and we are not authorized to release it, so we cannot release our original code. Instead of waiting for their releases, we re-implemented the work here in PyTorch to give the community access to most details of our work as soon as possible.

netw0rkf10w · 2020-08-30T16:10:28Z

@csrhddlam I see. Thanks for the reply!
Good work by the way. Congratulations!

csrhddlam · 2020-09-01T21:37:23Z

I just investigated the inference time a bit. Here is my trace on a GPU. I tested it with both PyTorch 1.1 and 1.6 and found similar results.

You can see that the relative positional embedding is taking way more time than a convolution.

In addition, reshaping, squeezing, and permuting are also taking way more time than bmm, where the actual computation happens.

There is much room to optimize the code in this repo. Even the original TF code was optimized for TPU and was only tested directly on GPU. So we would expect the inference time to improve a lot once the code is well optimized.
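
A per-operator trace of this kind can be collected with PyTorch's built-in autograd profiler. The sketch below is only an illustration of the approach, not the exact procedure used for the trace above: the small stand-in model is an assumption, and it profiles on CPU so that it runs anywhere.

```python
import torch
import torch.nn as nn
from torch.autograd import profiler

# Stand-in model (assumption): replace with the network being investigated.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).eval()
x = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    model(x)  # warm-up pass, kept outside the profiled region
    with profiler.profile() as prof:
        model(x)

# Aggregate the recorded events by operator name and list the most expensive first.
report = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(report)
```

In the resulting table, operators such as reshapes and permutes appear as separate rows next to convolution and bmm, which makes it possible to compare their cost directly.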

netw0rkf10w · 2020-09-10T14:33:58Z

einsum seems to have some performance issues, so maybe directly using bmm could be faster.

csrhddlam · 2020-09-10T14:53:50Z

Thanks for the pointer. I wasn't aware of the issue.

einsum in PyTorch looks less optimized than einsum_v2 in TensorFlow. And I agree that using bmm directly, together with some smart permute and view calls, could be faster in PyTorch.
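
As a rough sketch of what replacing einsum with bmm looks like, the snippet below computes the same batched query-key product both ways and checks that the results agree. The tensor layout (batch, groups, channels, positions), the sizes, and the einsum string are assumptions chosen for illustration and are not necessarily the ones used in this repo. Whether the bmm version is actually faster depends on the PyTorch version and hardware, so it should be timed rather than assumed.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions): batch, groups, channels per group, positions.
b, g, c, n = 2, 4, 8, 16
q = torch.rand(b, g, c, n)
k = torch.rand(b, g, c, n)

# einsum version: contract over the channel axis for every (batch, group) pair.
qk_einsum = torch.einsum("bgci,bgcj->bgij", q, k)

# bmm version: fold batch and group into one leading axis, then transpose q so
# that each matrix product is (n, c) @ (c, n) -> (n, n).
q_flat = q.reshape(b * g, c, n).transpose(1, 2)  # (b*g, n, c)
k_flat = k.reshape(b * g, c, n)                  # (b*g, c, n)
qk_bmm = torch.bmm(q_flat, k_flat).view(b, g, n, n)

max_abs_diff = (qk_einsum - qk_bmm).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```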

atiorh mentioned this issue Sep 4, 2020: Proposed improvement to make relative positional encoding faster #5 (Closed)

csrhddlam referenced this issue Sep 4, 2020: optimize relative positional encoding and batch norm reshapes (81e375b)
