While testing PR #671, we noticed that the GPT-2 model now exhausts all available memory on 8 GB GPUs (e.g., the GTX 1080) with both the eager-mode and X10 runtimes. It did not do this previously, so at some point the memory usage of this model increased to the point where it can no longer train on these GPUs.
We should investigate why this happened and see whether memory usage for this model can be brought back down.
Testing on a 16 GB GPU VM, I see that over roughly the last two-thirds of the epochs (10 epochs in total), a peak memory usage of 9187 MB occurs once per epoch, around the last training batch.
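For reference, here is a minimal sketch of how I track the peak while a training run is in progress, by polling `nvidia-smi` from a separate process (the polling interval is an arbitrary choice, not part of the repro, and this only captures peaks that last longer than one polling period):

```python
import subprocess
import time


def poll_peak_gpu_memory(interval_s: float = 1.0) -> None:
    """Poll nvidia-smi and report whenever GPU memory usage reaches a new peak."""
    peak_mib = 0
    while True:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        # One line per GPU; take the largest value across devices.
        used_mib = max(int(line) for line in out.splitlines() if line.strip())
        if used_mib > peak_mib:
            peak_mib = used_mib
            print(f"new peak: {peak_mib} MiB")
        time.sleep(interval_s)


if __name__ == "__main__":
    poll_peak_gpu_memory()
```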