Add support for Gradient Accumulation to enable training with larger effective batch sizes on resource-constrained hardware. This feature is crucial for training high-quality embedding models when GPU memory is limited: it lets users simulate large-batch training by accumulating gradients over several smaller batches before performing an optimizer step.
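
For illustration, here is a minimal sketch of the gradient-accumulation pattern in plain PyTorch. The model, data, and `accumulation_steps` below are hypothetical placeholders, not this library's API; they only show the accumulate-then-step logic the feature is based on:

```python
import torch

# Hypothetical setup: any model, optimizer, loss, and dataloader would do.
model = torch.nn.Linear(768, 768)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.MSELoss()
dataloader = [(torch.randn(8, 768), torch.randn(8, 768)) for _ in range(16)]

accumulation_steps = 4  # effective batch size = 8 (micro-batch) * 4 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the summed gradients match one pass over the full
    # effective batch, then accumulate into .grad without stepping yet.
    (loss / accumulation_steps).backward()

    # Step the optimizer only once every `accumulation_steps` micro-batches.
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by `accumulation_steps` keeps the accumulated gradient at the same scale as a single large-batch backward pass, so existing learning-rate settings carry over unchanged.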