🚀 Examples v0.0.4
Examples v0.0.4 is released! We've been hard at work adding features, fixing bugs, and improving our starter code for training models using MosaicML's stack!
To get started, either clone or fork this repo and install whichever example[s] you're interested in. For example, to start training GPT-style large language models, just run:
```bash
git clone https://github.com/mosaicml/examples.git
cd examples                  # cd into the repo
pip install -e ".[llm]"     # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm              # cd into the specific example's folder
```
Available examples include `llm`, `stable-diffusion`, `bert`, `resnet-cifar`, `resnet-imagenet`, `deeplab`, `nemo`, and `gpt-neox`.
New Features
- Lots of improvements to our MosaicGPT example code, resulting in improved throughput and ease of use!
  - Updated throughput and MFU numbers (#271)
  - Various model architecture configuration options, including layer norm on keys and queries (#174), clipping of QKV (#197), omitting biases (#201), scaling the softmax (#209), more advanced weight initialization functions (#204, #220, #226), logit scaling (#221), and better defaults (#270)
  - MosaicGPT is now a HuggingFace `PreTrainedModel` (#243, #252, #256)
  - Support for PrefixLM and UL2 style training (#179, #189, #235, #248)
  - Refactored the different attention implementations to all have compatible state dicts (#240)
  - Added support for KV caching (#244)
  - Fused Cross Entropy loss function (#251)
  - Full support for ALiBi with `triton` and `torch` implementations of attention
  - Support for "within sequence" attention when packing sequences together (#266)
  - Useful callbacks and optimizers for resuming runs that encountered a loss spike (#246)
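For readers unfamiliar with ALiBi (mentioned above): rather than learned position embeddings, it adds a fixed, per-head linear bias to the attention logits. The sketch below is a minimal pure-Python illustration of that idea, not the repo's actual implementation; the helper names are hypothetical, and the real code operates on tensors.

```python
def alibi_slopes(n_heads):
    # Each head gets a fixed slope. For a power-of-two head count, the
    # slopes form the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^-8.
    start = 2 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    # Bias added to causal attention logits: for query position q attending
    # to key position k <= q, the bias is -slope * (q - k); future positions
    # get no bias here (they are masked out in causal attention anyway).
    slopes = alibi_slopes(n_heads)
    return [
        [[-slope * (q - k) if q >= k else 0.0 for k in range(seq_len)]
         for q in range(seq_len)]
        for slope in slopes
    ]
```

Because the bias depends only on the distance `q - k`, ALiBi lets a model extrapolate to sequence lengths longer than those seen in training.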
- A new stable diffusion finetuning example! (#85)
  We've added an example of how to finetune stable diffusion using Composer and the MosaicML platform. Check out the README for more information.
- Updated ONNX export (#283) and text generation (#277) example scripts
- Version upgrades (#175, #242, #273, #275): updated versions of PyTorch, Composer, and Streaming.
- Added an example of running GPT-NeoX on the MosaicML platform (#195)