Discussion: how to apply this experiment to the llama2 70B model? #11

@ghost

Description

I am curious: what would be required to apply this method to the 70B-parameter version of the llama2 model?
On Reddit, I noticed you mentioned: "For training, these models barely fit in 128 80GB A100s using DeepSpeed and FA2."
Would the computer at OSC be enough? https://www.osc.edu/resources/technical_support/supercomputers/ascend
It has only 96 80GB A100 GPUs: is that enough to contribute to the state of the art (SoTA)?
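For what it's worth, here is a rough back-of-envelope sketch (my own assumptions, not taken from this repo) of how the per-GPU memory works out with 96 vs. 128 A100s, assuming ZeRO-3-style sharding of mixed-precision Adam state and ignoring activation memory (which is usually what makes a 70B run "barely fit" at long sequence lengths):

```python
# Rough estimate of sharded model-state memory for a 70B-parameter model.
# Assumptions (mine, not the repo's): mixed-precision Adam, ~16 bytes of
# model state per parameter (fp16 weights + fp16 grads + fp32 master
# weights + fp32 Adam first/second moments). Activations are not counted.

PARAMS = 70e9                         # llama2-70B parameter count
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4   # weights, grads, master weights, Adam m, Adam v
GPU_MEM_GB = 80                       # A100 80GB

model_state_gb = PARAMS * BYTES_PER_PARAM / 1e9  # total sharded model state

for n_gpus in (96, 128):
    per_gpu_gb = model_state_gb / n_gpus
    print(f"{n_gpus} GPUs: ~{per_gpu_gb:.1f} GB/GPU of model state, "
          f"leaving ~{GPU_MEM_GB - per_gpu_gb:.1f} GB/GPU for activations, "
          f"communication buffers, and fragmentation")
```

By this crude count the model states themselves fit comfortably on 96 GPUs; the open question is whether the remaining headroom covers activations at the batch size and sequence length used in the experiment.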
