This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
In your research, it is hard to train a model with long sequences (e.g., 768 tokens) on a GPU.
However, I can't find any special technique for reducing GPU memory in your code.
I would like to know how you train a Vision Transformer on such long token sequences.
Hi @chagmgang, in our experiments we used Google Cloud TPUs for pretraining. For GPUs, I think one can use gradient accumulation (with `--accum_iter`) and activation checkpointing to reduce memory usage.
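For reference, here is a minimal sketch (not the authors' code) of the two techniques mentioned above in plain PyTorch: the tiny encoder, the random data, and `accum_iter = 4` are placeholder assumptions used only for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a transformer encoder: a stack of blocks we can checkpoint.
blocks = nn.Sequential(*[nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
                         for _ in range(12)]).to(device)
head = nn.Linear(768, 10).to(device)
optimizer = torch.optim.AdamW(list(blocks.parameters()) + list(head.parameters()), lr=1e-4)
criterion = nn.CrossEntropyLoss()

accum_iter = 4            # mirrors the --accum_iter flag: gradients accumulate over 4 micro-batches
micro_batch, seq_len = 2, 768

optimizer.zero_grad()
for step in range(8):     # dummy training loop with random data
    x = torch.randn(micro_batch, seq_len, 768, device=device)
    target = torch.randint(0, 10, (micro_batch,), device=device)

    # Activation checkpointing: split the 12 blocks into 4 segments and
    # recompute their activations during backward instead of storing them all.
    feats = checkpoint_sequential(blocks, 4, x, use_reentrant=False)
    loss = criterion(head(feats.mean(dim=1)), target)

    # Gradient accumulation: scale the loss and step only every accum_iter iterations,
    # so the effective batch size is micro_batch * accum_iter.
    (loss / accum_iter).backward()
    if (step + 1) % accum_iter == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The trade-off is compute for memory: checkpointing roughly adds one extra forward pass per backward, and accumulation keeps the per-step memory of a small micro-batch while matching the optimizer behavior of a larger batch.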