Releases: mosaicml/examples
v0.0.4
🚀 Examples v0.0.4
Examples v0.0.4 is released! We've been hard at work adding features, fixing bugs, and improving our starter code for training models using MosaicML's stack!
To get started, either clone or fork this repo and install whichever example[s] you're interested in. E.g., to get started training GPT-style Large Language Models, just:
```bash
git clone https://github.com/mosaicml/examples.git
cd examples  # cd into the repo
pip install -e ".[llm]"  # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm  # cd into the specific example's folder
```
Available examples include `llm`, `stable-diffusion`, `bert`, `resnet-cifar`, `resnet-imagenet`, `deeplab`, `nemo`, and `gpt-neox`.
New Features
- Lots of improvements to our MosaicGPT example code, resulting in new and improved throughput and ease of use!
  - Updated throughput and MFU numbers (#271)
  - Various model architecture configuration options, including layer norm on keys and queries (#174), clipping of QKV (#197), omitting biases (#201), scaling the softmax (#209), more advanced weight initialization functions (#204, #220, #226), logit scaling (#221), and better defaults (#270)
  - MosaicGPT is now a HuggingFace `PreTrainedModel` (#243, #252, #256); see the sketch after this list
  - Support for PrefixLM and UL2 style training (#179, #189, #235, #248)
  - Refactored the different attention implementations to all have compatible state dicts (#240)
  - Added support for KV caching (#244)
  - Fused Cross Entropy loss function (#251)
  - Full support for ALiBi with `triton` and `torch` implementations of attention
  - Support for "within sequence" attention when packing sequences together (#266)
  - Useful callbacks and optimizers for resuming runs that encountered a loss spike (#246)
- A new stable diffusion finetuning example! (#85)
  We've added an example of how to finetune Stable Diffusion using Composer and the MosaicML platform. Check out the README for more information.
- Updated ONNX export (#283) and text generation (#277) example scripts
- Version upgrades (#175, #242, #273, #275)
  Updated versions of PyTorch, Composer, and Streaming.
- Added an example of running GPT-NeoX on the MosaicML platform (#195)
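Since MosaicGPT is now a HuggingFace `PreTrainedModel`, the standard `save_pretrained` / `from_pretrained` / `generate` workflow should carry over. The sketch below is illustrative only: the import path, the config class name, and its constructor arguments are assumptions for illustration, not the exact API of this repo.

```python
# A minimal sketch, assuming MosaicGPT subclasses transformers.PreTrainedModel
# (#243, #252, #256) and that generation can use the new KV caching (#244).
# The import path and the config fields below are hypothetical placeholders.
from transformers import AutoTokenizer

from examples.llm.src.mosaic_gpt import MosaicGPT, MosaicGPTConfig  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Build a tiny model from a (hypothetical) config object.
config = MosaicGPTConfig(d_model=256, n_heads=4, n_layers=4, vocab_size=len(tokenizer))
model = MosaicGPT(config)

# Standard PreTrainedModel serialization round-trip.
model.save_pretrained("./mosaic-gpt-checkpoint")
model = MosaicGPT.from_pretrained("./mosaic-gpt-checkpoint")

# Greedy text generation; use_cache=True exercises the KV caching support.
inputs = tokenizer("MosaicML is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, use_cache=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For the repo's actual end-to-end versions of this workflow, see the text generation (#277) and ONNX export (#283) example scripts mentioned above.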
Deprecations and API changes
v0.0.3
🚀 Examples v0.0.3
Examples v0.0.3 is released! We've been hard at work adding features, fixing bugs, and improving our starter code for training models using MosaicML's stack!
To get started, either clone or fork this repo and install whichever example[s] you're interested in. E.g., to get started training GPT-style Large Language Models, just:
```bash
git clone https://github.com/mosaicml/examples.git
cd examples  # cd into the repo
pip install -e ".[llm]"  # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm  # cd into the specific example's folder
```
Available examples include `bert`, `cifar`, `llm`, `resnet`, and `deeplab`.
New Features
- Tooling for computing throughput and Model FLOPs Utilization (MFU) using MosaicML Cloud (#53, #56, #71, #117, #152)
  We've made it easier to benchmark throughput and MFU on our Large Language Model (LLM) stack. The `SpeedMonitor` callback has been extended to report MFU. It is on by default for our `MosaicGPT` examples, and can be easily added to your own code by defining `num_fwd_flops` for your model and adding the `SpeedMonitorMFU` callback to the `Trainer` (see the first sketch after this list). See the callback for the details!
  We've also used our MCLI SDK to easily measure throughput and MFU of our LLMs across a range of parameters. The tools and results are in our throughput folder. Stay tuned for an update with the latest numbers!
- Upgrade to the latest versions of Composer, Streaming, and Flash Attention (#54, #61, #118, #124)
  We've upgraded all our examples to use the latest versions of Composer, Streaming, and Flash Attention. This means speed improvements, new features, and deterministic, elastic, mid-epoch resumption, thanks to our Streaming library!
- The repo is now pip installable from source (#76, #90)
  The repo can now be easily installed for whichever example you are interested in using. For example, to install components for the llm example, navigate to the root and run `pip install -e ".[llm]"`. We will be putting the package on PyPI soon!
- Support for FSDP wrapping more HuggingFace models (#83, #106)
  We've added support for using FSDP to wrap more types of HuggingFace models, like BLOOM and OPT (see the second sketch after this list).
- In-Context Learning (ICL) evaluation metrics (#116)
  The ICL evaluation tools from Composer 0.12.1 are now available for measuring metrics on tasks like LAMBADA, HellaSwag, and PIQA for Causal LMs. See the `llm/icl_eval/` folder for templates. These ICL metrics can also be measured live during training with minimal overhead. Please see our blogpost for more details.
- Simple BERT finetuning example (#141)
  In addition to our example of finetuning BERT on the full suite of GLUE tasks, we've added an example of finetuning on a single sequence classification dataset. This should be a simpler entrypoint to finetuning BERT compared with all the bells and whistles of our GLUE example.
- NeMo Megatron example (#84, #138)
  We've added a simple example of how to get started running NeMo Megatron on MCloud!
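To make the MFU tooling above more concrete, here is a minimal sketch of attaching an MFU-reporting speed monitor to a Composer `Trainer`. The `SpeedMonitorMFU` import path and its constructor arguments are assumptions based on the description above, not verified signatures; only the standard Composer `Trainer` and `ComposerClassifier` APIs are used as-is.

```python
# A minimal sketch of wiring MFU reporting into a Composer Trainer, assuming a
# SpeedMonitorMFU callback as described in these notes. The import path and the
# constructor call for it are illustrative, not verified signatures.
import torch
from composer import Trainer
from composer.models import ComposerClassifier
from torch.utils.data import DataLoader, TensorDataset

from examples.common.speed_monitor_w_mfu import SpeedMonitorMFU  # hypothetical path

# Toy classifier and synthetic data, just so the sketch is self-contained.
model = ComposerClassifier(torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10)))
dataset = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
train_dataloader = DataLoader(dataset, batch_size=8)

# MFU needs the FLOPs of one forward pass for your model; the attribute name
# follows the release notes. 2 * params is only a rough per-sample approximation.
model.num_fwd_flops = 2 * sum(p.numel() for p in model.parameters())

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="10ba",
    callbacks=[SpeedMonitorMFU()],  # hypothetical constructor; see the callback for details
)
trainer.fit()
```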
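Similarly, for the FSDP item above, the second sketch below shows the general shape of wrapping a HuggingFace causal LM with Composer's `HuggingFaceModel` and enabling FSDP via the Trainer's `fsdp_config`. The specific config keys and the choice of OPT are illustrative; the PRs listed above are what make the model-specific FSDP wrapping work.

```python
# A minimal sketch of FSDP-wrapped training for a HuggingFace model with Composer,
# in the spirit of #83 / #106. The fsdp_config values shown are common Composer
# options; treat them as an illustrative starting point, not a prescribed recipe.
import torch
from composer import Trainer
from composer.models import HuggingFaceModel
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
hf_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = HuggingFaceModel(hf_model, tokenizer=tokenizer)

# Tiny toy dataset of tokenized sequences, just to make the sketch self-contained.
encoded = tokenizer(["MosaicML makes ML training efficient."] * 8,
                    return_tensors="pt", padding="max_length", max_length=32)
encoded["labels"] = encoded["input_ids"].clone()
dataset = [{k: v[i] for k, v in encoded.items()} for i in range(8)]
train_dataloader = DataLoader(dataset, batch_size=2)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="10ba",
    # FSDP requires multiple GPUs and a distributed launch (e.g. the composer launcher).
    fsdp_config={
        "sharding_strategy": "FULL_SHARD",
        "mixed_precision": "DEFAULT",
    },
)
trainer.fit()
```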
Deprecations
- 🚨 `group_method` argument for `StreamingTextDataset` replaced 🚨 (#128)
  In order to support the deterministic shuffle with elastic resumption, we could no longer concatenate text examples on the fly in the dataloader. This means that we have deprecated the `group_method` argument of `StreamingTextDataset`. In order to use concatenated text (which is a standard practice for pretraining LLMs), you can use the `convert_c4.py` script with the `--concat_tokens` option. This will pretokenize your dataset and pack sequences together up to the maximum sequence length so that your pretraining examples have no padding. To use the equivalent of the old `truncate` option, you can use `convert_c4.py` without the `--concat_tokens` option, and the dataloader will truncate or pad sequences to the maximum sequence length on the fly.