Discussion: how to apply this experiment to the llama2 70B model? #11

@ghost

Description

I am curious: what would be required to apply this method to the 70B-parameter version of the llama2 model?
On Reddit, I noticed you mentioned: "For training, these models barely fit in 128 80GB A100s using DeepSpeed and FA2."
Would the computer at OSC be enough? https://www.osc.edu/resources/technical_support/supercomputers/ascend
It has only 96 80GB A100 GPUs: is that enough to contribute to the state of the art (SoTA)?
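For what it's worth, here is a rough back-of-envelope sketch (my own assumptions, not taken from this repo) of how the per-GPU memory works out with 96 vs. 128 A100s, assuming ZeRO-3-style sharding of mixed-precision Adam state and ignoring activation memory (which is usually what makes a 70B run "barely fit" at long sequence lengths):

```python
# Rough estimate of sharded model-state memory for a 70B-parameter model.
# Assumptions (mine, not the repo's): mixed-precision Adam, ~16 bytes of
# model state per parameter (fp16 weights + fp16 grads + fp32 master
# weights + fp32 Adam first/second moments). Activations are not counted.

PARAMS = 70e9                         # llama2-70B parameter count
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4   # weights, grads, master weights, Adam m, Adam v
GPU_MEM_GB = 80                       # A100 80GB

model_state_gb = PARAMS * BYTES_PER_PARAM / 1e9  # total sharded model state

for n_gpus in (96, 128):
    per_gpu_gb = model_state_gb / n_gpus
    print(f"{n_gpus} GPUs: ~{per_gpu_gb:.1f} GB/GPU of model state, "
          f"leaving ~{GPU_MEM_GB - per_gpu_gb:.1f} GB/GPU for activations, "
          f"communication buffers, and fragmentation")
```

By this crude count the model states themselves fit comfortably on 96 GPUs; the open question is whether the remaining headroom covers activations at the batch size and sequence length used in the experiment.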
