[Codefuse Open Source Mini Training Camp] Support for gradient accumulation #9

@elvis-t9

Description

Add support for gradient accumulation to enable training with larger effective batch sizes on resource-constrained hardware. This matters for training high-quality embedding models when GPU memory is limited: users can simulate large-batch training by accumulating gradients over several smaller batches and running the optimizer step only once per accumulation cycle.
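To illustrate the request, here is a minimal, framework-free sketch of the idea. It is not this project's implementation. The toy model `y = w * x`, the helper names `grad_mse`, `sgd_step_full`, and `sgd_step_accumulated`, and the `accumulation_steps` parameter are all hypothetical. It assumes a loss that is a plain mean over independent examples and micro-batches of equal size; under those assumptions, one accumulated update matches one large-batch update.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def sgd_step_full(w, batch, lr):
    """One SGD step using the whole batch at once."""
    return w - lr * grad_mse(w, batch)


def sgd_step_accumulated(w, batch, lr, accumulation_steps):
    """One SGD step built from several micro-batch gradients.

    Assumption: len(batch) is divisible by accumulation_steps, so every
    micro-batch has the same size and the scaled sum is the full-batch mean.
    """
    assert len(batch) % accumulation_steps == 0
    micro_size = len(batch) // accumulation_steps
    accumulated = 0.0
    for i in range(accumulation_steps):
        micro = batch[i * micro_size:(i + 1) * micro_size]
        # Scale each micro-batch gradient so the sum is an average.
        accumulated += grad_mse(w, micro) / accumulation_steps
    # The parameter update happens once, after all micro-batches.
    return w - lr * accumulated


# Hypothetical data: eight points on the line y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_full = sgd_step_full(0.0, data, lr=0.01)
w_accum = sgd_step_accumulated(0.0, data, lr=0.01, accumulation_steps=4)
print(abs(w_full - w_accum) < 1e-9)
```

Only one micro-batch needs to be held in memory at a time, which is where the memory saving comes from. Losses that couple the examples within a batch do not satisfy the mean-over-independent-examples assumption, so this equivalence may not carry over to them unchanged.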
