🚀 Examples v0.0.4
Examples v0.0.4 is released! We've been hard at work adding features, fixing bugs, and improving our starter code for training models using MosaicML's stack!
To get started, either clone or fork this repo and install whichever example[s] you're interested in. For example, to start training GPT-style large language models, just run:
```bash
git clone https://github.com/mosaicml/examples.git
cd examples                  # cd into the repo
pip install -e ".[llm]"     # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm              # cd into the specific example's folder
```
Available examples include `llm`, `stable-diffusion`, `bert`, `resnet-cifar`, `resnet-imagenet`, `deeplab`, `nemo`, and `gpt-neox`.
New Features
- Lots of improvements to our MosaicGPT example code, resulting in improved throughput and ease of use!
  - Updated throughput and MFU numbers (#271)
  - Various model architecture configuration options, including layer norm on keys and queries (#174), clipping of QKV (#197), omitting biases (#201), scaling the softmax (#209), more advanced weight initialization functions (#204, #220, #226), logit scaling (#221), and better defaults (#270)
  - MosaicGPT is now a HuggingFace `PreTrainedModel` (#243, #252, #256)
  - Support for PrefixLM and UL2 style training (#179, #189, #235, #248)
  - Refactored the different attention implementations to all have compatible state dicts (#240)
  - Added support for KV caching (#244)
  - Fused Cross Entropy loss function (#251)
  - Full support for ALiBi with `triton` and `torch` implementations of attention
  - Support for "within sequence" attention when packing sequences together (#266)
  - Useful callbacks and optimizers for resuming runs that encountered a loss spike (#246)
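For readers unfamiliar with ALiBi (mentioned above): rather than learned position embeddings, it adds a fixed, per-head linear bias to the attention logits. The sketch below is a minimal pure-Python illustration of that idea, not the repo's actual implementation; the helper names are hypothetical, and the real code operates on tensors.

```python
def alibi_slopes(n_heads):
    # Each head gets a fixed slope. For a power-of-two head count, the
    # slopes form the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^-8.
    start = 2 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    # Bias added to causal attention logits: for query position q attending
    # to key position k <= q, the bias is -slope * (q - k); future positions
    # get no bias here (they are masked out in causal attention anyway).
    slopes = alibi_slopes(n_heads)
    return [
        [[-slope * (q - k) if q >= k else 0.0 for k in range(seq_len)]
         for q in range(seq_len)]
        for slope in slopes
    ]
```

Because the bias depends only on the distance `q - k`, ALiBi lets a model extrapolate to sequence lengths longer than those seen in training.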
- A new stable diffusion finetuning example! (#85)
  We've added an example of how to finetune stable diffusion using Composer and the MosaicML platform. Check out the README for more information.
- Updated ONNX export (#283) and text generation (#277) example scripts
- Version upgrades (#175, #242, #273, #275): updated versions of PyTorch, Composer, and Streaming.
- Added an example of running GPT-NeoX on the MosaicML platform (#195)