
[Question] GPU Memory Related #3

Open

chagmgang opened this issue Mar 7, 2023 · 1 comment

Comments

@chagmgang

In your research, training the model with long sequences (e.g., 768 tokens) is hard to fit in GPU memory.
However, I can't find any special technique for reducing GPU memory usage in your code.
I'd like to know how you train the vision transformer on long token sequences.

@ronghanghu
Contributor

Hi @chagmgang, in our experiments we used Google Cloud TPUs for pretraining. For GPUs, I think one can use gradient accumulation (with --accum_iter) and activation checkpointing to reduce memory usage.
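For reference, here is a minimal sketch of the two techniques mentioned above, not the repo's actual training code. The model, loader, and class names are placeholders; activation checkpointing is shown via `torch.utils.checkpoint.checkpoint_sequential` (assuming a recent PyTorch that supports `use_reentrant=False`), and the accumulation loop mirrors the idea behind the `--accum_iter` flag.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

class TinyViTEncoder(nn.Module):
    """Toy stand-in for a ViT encoder operating on long token sequences."""
    def __init__(self, dim=256, depth=12, heads=8, num_classes=1000):
        super().__init__()
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth)
        ])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        if self.training:
            # Activation checkpointing: recompute block activations during the
            # backward pass instead of caching them (trades compute for memory).
            x = checkpoint_sequential(self.blocks, 4, x, use_reentrant=False)
        else:
            x = self.blocks(x)
        return self.head(x.mean(dim=1))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyViTEncoder().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Dummy loader: micro-batches of 2 sequences, each 768 tokens long.
loader = [(torch.randn(2, 768, 256), torch.randint(0, 1000, (2,))) for _ in range(16)]

accum_iter = 8  # effective batch size = accum_iter * micro-batch size

model.train()
optimizer.zero_grad()
for step, (tokens, labels) in enumerate(loader):
    tokens, labels = tokens.to(device), labels.to(device)
    # Scale the loss so the accumulated gradients match one large batch.
    loss = criterion(model(tokens), labels) / accum_iter
    loss.backward()
    if (step + 1) % accum_iter == 0:
        # One optimizer step per accum_iter micro-batches.
        optimizer.step()
        optimizer.zero_grad()
```

Checkpointing the transformer blocks roughly costs one extra forward pass over them in exchange for not storing their activations, which is usually what makes 768-token sequences fit on a single GPU; the loss scaling keeps gradient magnitudes comparable to a single large-batch update.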
