The paper describes the DeBERTa-base, DeBERTa-large, and DeBERTa-1.5B models trained on V100 GPUs. How was DeBERTa-v2-xlarge trained? Are its training settings the same as those used for the large model in the paper? And given that DeBERTa-v2-xlarge has 900M parameters, was any tensor parallelism used during training?