Releases · lweitkamp/numpitron
Data Parallel
NuMPItron now supports data parallel training. A model can be trained with a combination of tensor and data parallelism as follows:
```
mpirun -n {1, 2, ...} python train_shakespeare.py \
    --tensor-parallel-size {1, 2, ...} \
    --data-parallel-size {1, 2, ...}
```
The global batch size argument (`--batch-size`) is split between the devices; the total number of MPI processes (`-n`) should equal the product of the tensor and data parallel sizes.
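As a rough illustration of that split, the sketch below slices a global batch into per-rank shards using mpi4py. The variable names are hypothetical rather than numpitron's actual internals, and it assumes a pure data-parallel run in which the MPI world size equals the data-parallel size and divides the batch evenly.

```python
# Minimal sketch of splitting a global batch across data-parallel ranks.
# Illustrative only; numpitron's actual implementation may differ.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
world_size = comm.Get_size()  # assumed equal to the data-parallel size

global_batch_size = 64  # the value passed as --batch-size
assert global_batch_size % world_size == 0, "assume the batch divides evenly"
local_batch_size = global_batch_size // world_size

# Each rank trains on its own contiguous slice of the global batch.
global_batch = np.random.randn(global_batch_size, 128)
start = rank * local_batch_size
local_batch = global_batch[start : start + local_batch_size]
```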
Tensor Parallel
This release adds tensor parallel training using MPI. It should work, to some degree, with dimensions that are not evenly divisible by the tensor parallel size, as long as the tensor parallel size is smaller than the dimension being sharded.
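As a minimal sketch of why uneven dimensions can still work: `np.array_split` (unlike `np.split`) tolerates a dimension that the shard count does not divide evenly. The helper below is hypothetical and not numpitron's actual sharding code.

```python
import numpy as np

def shard_columns(weight: np.ndarray, tp_size: int, tp_rank: int) -> np.ndarray:
    """Return this rank's column shard of a weight matrix.

    np.array_split tolerates dimensions that tp_size does not divide
    evenly, which is why mildly uneven dimensions still work.
    """
    assert tp_size <= weight.shape[1], "tensor parallel size exceeds the dimension"
    return np.array_split(weight, tp_size, axis=1)[tp_rank]

# An (8, 10) weight over 4 ranks yields shards of width 3, 3, 2, 2.
weight = np.random.randn(8, 10)
shards = [shard_columns(weight, 4, r) for r in range(4)]
print([s.shape for s in shards])  # [(8, 3), (8, 3), (8, 2), (8, 2)]
```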
To launch a tensor parallel training run on two devices, use the following command:
```
mpirun -n 2 python train_shakespeare.py --tensor-parallel-size 2
```
You can also sample in tensor parallel mode:
```
mpirun -n 2 python sample.py --tensor-parallel-size 2
```
0.1.0
What's Changed
- add pyproject by @lweitkamp in #1
- add vscode folder to gitignore by @lweitkamp in #2
- Add Neural Network Library by @lweitkamp in #3
Full Changelog: https://github.com/lweitkamp/numpitron/commits/0.1.0