
Question about training speed #14

Open
tanbuzheng opened this issue Oct 28, 2023 · 4 comments

Comments

tanbuzheng commented Oct 28, 2023

Hi, I tried to use the HiLo block for an image restoration task, but I found that its training speed is much slower than Swin's. Specifically, we mainly adopt the HiLo block at 64x64 resolution and set the local window size to 8x8. What could be the reason for this? Is there any way to increase the training speed?

HubHop (Contributor) commented Oct 28, 2023

Hi @tanbuzheng, thanks for using HiLo. Can you give more details or a brief script of your test? For example, the tensor shape of your feature map and the concrete HiLo settings, including dim, num_heads, and alpha.

tanbuzheng (Author) commented:

Thank you very much for your immediate reply! I used 8 stacked HiLo Transformer blocks at 64x64 resolution, with num_heads=8, head_dim=32, alpha=0.5, and a local window size of 8x8. The other module settings are the same. I guess it is the depthwise conv used in the FFN that affects the running speed? I wonder if there are any practical reasons why HiLo is slower than Swin.
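
For reference, a minimal sketch of the setup described above, assuming the HiLo implementation from the LITv2 repo (imported as in the reply below) and a forward pass that takes the flattened feature map together with its height and width; only the attention layers are stacked here, and the FFN with its depthwise conv is omitted:

import torch
from hilo import HiLo

# Illustrative stack: 8 HiLo attention layers, dim = num_heads * head_dim = 8 * 32 = 256,
# alpha = 0.5, local window size 8, applied to a 64x64 feature map.
H = W = 64
dim = 8 * 32
blocks = torch.nn.ModuleList(
    [HiLo(dim=dim, num_heads=8, window_size=8, alpha=0.5) for _ in range(8)]
)

x = torch.randn(1, H * W, dim)  # (batch, tokens, channels)
for blk in blocks:
    x = blk(x, H, W)            # assumed signature: forward(x, H, W)
print(x.shape)                  # expected: torch.Size([1, 4096, 256])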

HubHop (Contributor) commented Oct 28, 2023

Thanks for the additional information. If we only look at the attention layer itself, HiLo attention should theoretically be faster than local window attention:

from hilo import HiLo

W = H = 64

# all heads for Hi-Fi (pure local window attention)
model = HiLo(dim=256, num_heads=8, window_size=8, alpha=0.)
print(f'All for Hi-Fi, FLOPs = {round(model.flops(H, W)/1e9, 2)}G')

# all heads for Lo-Fi (global attention with window-pooled keys/values)
model = HiLo(dim=256, num_heads=8, window_size=8, alpha=1.)
print(f'All for Lo-Fi, FLOPs = {round(model.flops(H, W)/1e9, 2)}G')

# half of the heads for Hi-Fi, the other half for Lo-Fi
model = HiLo(dim=256, num_heads=8, window_size=8, alpha=0.5)
print(f'Half-Half, FLOPs = {round(model.flops(H, W)/1e9, 2)}G')

Output:

All for Hi-Fi, FLOPs = 1.21G
All for Lo-Fi, FLOPs = 0.68G
Half-Half, FLOPs = 0.81G

Therefore, the attention layer is not the issue. Compared to Swin Transformer blocks, the only difference in the FFN is that LITv2 has one additional depthwise convolutional layer. You can debug your code by removing this layer to see whether the speed improves.
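
If FLOPs and wall-clock training speed disagree, it can also help to time the layers directly. A rough sketch for the attention layer alone (not an official benchmark script, and assuming the same forward(x, H, W) signature as above):

import time
import torch
from hilo import HiLo

def avg_ms(module, x, H, W, iters=50, warmup=10):
    # Rough wall-clock timing on CPU; add torch.cuda.synchronize() around the loop on GPU.
    module.eval()
    with torch.no_grad():
        for _ in range(warmup):
            module(x, H, W)
        start = time.time()
        for _ in range(iters):
            module(x, H, W)
    return (time.time() - start) / iters * 1e3

H = W = 64
x = torch.randn(1, H * W, 256)
for alpha in (0.0, 0.5, 1.0):
    attn = HiLo(dim=256, num_heads=8, window_size=8, alpha=alpha)
    print(f'alpha={alpha}: {avg_ms(attn, x, H, W):.2f} ms/iter')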

tanbuzheng (Author) commented:

Thanks a lot! I will try it later.
