The paper describes the DeBERTa-base, DeBERTa-large, and DeBERTa-1.5B models trained on V100 GPUs. How was DeBERTa-v2-xlarge trained? Are its training settings the same as those used for the large model in the paper? And given that DeBERTa-v2-xlarge has 900M parameters, was any tensor parallelism used during training?