
Interface with Hugging Face Accelerate for distributed training #11

Open
rosikand opened this issue Dec 29, 2022 · 2 comments

Comments

@rosikand
Owner

Create a new distributed_train function in torchplate.experiment.Experiment that interfaces with Hugging Face Accelerate for zero-overhead distributed training of PyTorch models. Avoid explicit .to(device) placements, as the accelerate library handles device placement for you. This function can be called even with a single GPU.
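
A rough sketch of what this could look like, following Accelerate's standard training-loop pattern. The free-function signature and loss computation here are illustrative only; in torchplate this would presumably be a method on Experiment that pulls the model, optimizer, and dataloader off self and gets the loss from the experiment's evaluate hook:

```python
import torch
from accelerate import Accelerator

def distributed_train(model, optimizer, trainloader, num_epochs=1):
    accelerator = Accelerator()
    # prepare() moves everything to the right device(s) and wraps the
    # model for DDP when launched with multiple processes, so no manual
    # .to(device) calls are needed. Works unchanged on a single GPU or CPU.
    model, optimizer, trainloader = accelerator.prepare(
        model, optimizer, trainloader
    )
    model.train()
    for _ in range(num_epochs):
        for x, y in trainloader:
            optimizer.zero_grad()
            # Illustrative loss; torchplate would get this from evaluate().
            loss = torch.nn.functional.cross_entropy(model(x), y)
            accelerator.backward(loss)  # replaces loss.backward()
            optimizer.step()

# Toy usage: runs as-is on one device; launching the same script with
# `accelerate launch` distributes it with no code changes.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(
    torch.randn(32, 10), torch.randint(0, 2, (32,))
)
loader = torch.utils.data.DataLoader(dataset, batch_size=8)
distributed_train(model, optimizer, loader)
```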

@rosikand
Owner Author

Optional parameters:

  • split_batches=True: if True, Accelerate splits each batch yielded by the dataloader across processes, so the configured batch size is the true global batch size; if False (the default), every process draws a full batch, making the effective batch size batch_size × num_processes. See the sketch below.
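
A minimal sketch of how the flag maps onto Accelerate (exposing it as a passthrough to the Accelerator constructor is an assumption about how distributed_train would be written):

```python
from accelerate import Accelerator

# With split_batches=True, a dataloader batch size of 64 stays 64
# globally: each of N processes sees 64 // N examples per step. With
# the default False, each process sees 64, for a global batch size
# of 64 * N.
accelerator = Accelerator(split_batches=True)
print(accelerator.num_processes)  # 1 when run on a single GPU/CPU
```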

@rosikand
Owner Author

Note: I think heavy edits will be needed to get this to interface with the metrics properly (see this function), since each process only sees its own shard of the data. The same is true for model serialization.
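
For reference, the two Accelerate idioms this would involve, sketched with a placeholder tensor and a toy model (how they slot into torchplate's metrics and serialization code is the open question):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Metrics: each process only sees its shard of the data, so per-batch
# statistics must be gathered across processes before reducing them.
local_correct = torch.tensor([3], device=accelerator.device)  # placeholder count
total_correct = accelerator.gather(local_correct).sum()

# Serialization: unwrap the DDP wrapper and save from the main process
# only, so N processes don't race to write N copies of the checkpoint.
model = accelerator.prepare(torch.nn.Linear(4, 2))
accelerator.wait_for_everyone()
unwrapped = accelerator.unwrap_model(model)
if accelerator.is_main_process:
    accelerator.save(unwrapped.state_dict(), "model.pt")
```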
