
Releases: lweitkamp/numpitron

Data Parallel

05 Jun 05:52
01dc9d0

NuMPItron now supports data parallel training. A model can be trained with any combination of tensor and data parallelism:

mpirun -n {1, 2, ...} python train_shakespeare.py \
    --tensor-parallel-size {1, 2, ...} \
    --data-parallel-size {1, 2, ...}

The global batch size argument (--batch-size) is split across the data-parallel devices, so each rank trains on its own slice of every batch.
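
As a minimal sketch of how such a split might work (assuming mpi4py and a hypothetical helper name, not NuMPItron's actual code):

from mpi4py import MPI

def local_batch_size(global_batch_size: int, data_parallel_size: int) -> int:
    # Each data-parallel rank trains on an equal slice of the global batch.
    assert global_batch_size % data_parallel_size == 0, \
        "global batch size must be divisible by the data-parallel size"
    return global_batch_size // data_parallel_size

comm = MPI.COMM_WORLD
# Assuming pure data parallelism here, the MPI world size is the data-parallel size.
dp_size = comm.Get_size()
print(f"rank {comm.Get_rank()} trains on {local_batch_size(64, dp_size)} samples per step")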

Tensor Parallel

27 May 20:13
fc5f52c

This release adds Tensor Parallel training using MPI. It should also handle dimensions that do not divide evenly across devices, as long as the tensor parallel size is less than the dimension being sharded.
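
A rough sketch of one way to shard an uneven dimension (a hypothetical helper, not the library's actual code): earlier ranks simply take one extra element each.

import numpy as np

def shard_slice(dim: int, rank: int, tp_size: int) -> slice:
    # The slice of a dimension of size `dim` owned by `rank`; requires tp_size <= dim.
    assert tp_size <= dim, "tensor parallel size must not exceed the dimension"
    base, remainder = divmod(dim, tp_size)
    start = rank * base + min(rank, remainder)
    stop = start + base + (1 if rank < remainder else 0)
    return slice(start, stop)

weight = np.arange(10)   # a dimension of size 10
for rank in range(3):    # tensor parallel size 3 -> shard sizes 4, 3, 3
    print(rank, weight[shard_slice(10, rank, 3)])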

To train with tensor parallelism across two devices, run:

mpirun -n 2 python train_shakespeare.py --tensor-parallel-size 2

You can also sample in tensor parallel mode:

mpirun -n 2 python sample.py --tensor-parallel-size 2

0.1.0

30 Mar 12:58
8538385
Pre-release

What's Changed

Full Changelog: https://github.com/lweitkamp/numpitron/commits/0.1.0