Currently, when people want to run checkpoint writes for transformer-type workloads, they have to specify the layer_parameters and optimization groups by hand. All of these values can actually be derived from higher-level model parameters such as the hidden dimension, FFN size, vocab_size, etc. Asking users to enter layer_parameters and optimization groups directly in the configuration file is error-prone and unnecessarily verbose. We can hide all of the lower-level details here.
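As a rough illustration of how the lower-level values could be derived, here is a minimal sketch for a GPT-style decoder block. The function name and the exact per-layer breakdown (attention projections, feed-forward, layer norms, embedding) are assumptions for illustration, not the benchmark's actual formulas:

```python
def derive_layer_parameters(hidden_dim: int, ffn_hidden: int,
                            vocab_size: int, num_layers: int) -> dict:
    """Hypothetical helper: derive per-layer parameter counts from
    high-level transformer dimensions (GPT-style block assumed)."""
    # attention: Q, K, V and output projections, each hidden_dim x hidden_dim
    attention = 4 * hidden_dim * hidden_dim
    # feed-forward: up-projection and down-projection
    ffn = 2 * hidden_dim * ffn_hidden
    # two layer norms per block, each with scale and bias vectors
    norms = 2 * 2 * hidden_dim
    per_layer = attention + ffn + norms
    # token embedding table (weight tying with the output head assumed)
    embedding = vocab_size * hidden_dim
    return {
        "per_layer": per_layer,
        "embedding": embedding,
        "total": num_layers * per_layer + embedding,
    }
```

With a derivation like this, the configuration file would only need the high-level fields (hidden dimension, FFN size, vocab_size, number of layers), and the tool could fill in layer_parameters and the optimization groups internally.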