
TF-transformer #60

Open
a897456 opened this issue Dec 18, 2024 · 1 comment

Comments


a897456 commented Dec 18, 2024

Hello author, looking at the ablation study in your paper, there is no ablation on the T-transformer and F-transformer. Are both of them necessary? Especially the F-transformer: it only adds memory burden, because batch_size * t = 1248. I suspect that the frequent CUDA OUT OF MEMORY errors and the slow training speed are largely caused by the F-transformer.
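For reference, a minimal PyTorch sketch of the dual-path reshaping being discussed here. The sizes are illustrative only (not taken from the repo), chosen so that batch_size * t = 1248 as quoted above; it shows why the F-transformer's effective batch grows with the number of time frames.

```python
import torch

# Illustrative sizes only: batch, channels, time frames, frequency bins
b, c, t, f = 4, 64, 312, 257

x = torch.randn(b, c, t, f)  # [b, c, t, f]

# T-transformer input: attend over time, one sequence per (batch, freq) pair
x_t = x.permute(0, 3, 2, 1).reshape(b * f, t, c)   # [b*f, t, c]

# F-transformer input: attend over frequency, one sequence per (batch, time) pair
x_f = x.permute(0, 2, 3, 1).reshape(b * t, f, c)   # [b*t, f, c]

print(x_t.shape)  # torch.Size([1028, 312, 64])
print(x_f.shape)  # torch.Size([1248, 257, 64])  <- effective batch of b*t = 1248
```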


a897456 commented Dec 18, 2024

One more question: you reshape [b, c, t, f] to [b * f, t, c] before the T-transformer, and then to [b * t, f, c] before the F-transformer.
Could this be changed to [b * c, t, f] for the T-transformer and [b * c, f, t] for the F-transformer?
Since the clips are cut to the same length and use the same FFT size, t and f are fixed.
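For comparison, a sketch of the proposed alternative layout, reusing the illustrative tensor `x` and sizes from the previous snippet. Note that this is more than a memory change: with channels folded into the batch, the transformer's model dimension becomes f (or t) instead of c, and the different channels no longer share a sequence inside the attention layers.

```python
# Proposed alternative: fold channels into the batch instead of time/frequency
y_t = x.reshape(b * c, t, f)                  # [b*c, t, f]: model dimension is f
y_f = x.reshape(b * c, t, f).transpose(1, 2)  # [b*c, f, t]: model dimension is t

print(y_t.shape)  # torch.Size([256, 312, 257])
print(y_f.shape)  # torch.Size([256, 257, 312])
```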
