Releases: mosaicml/llm-foundry
v0.2.0
🚀 LLM Foundry v0.2.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs). It serves as the training codebase for the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease of use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
Training
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
Inference
The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:
- Converting a Composer checkpoint to an HF checkpoint folder
- Interactive Generation with HF models
- Interactive Chat with HF models
- Converting an HF model to ONNX
- Converting an HF MPT to FasterTransformer
- Running MPT with FasterTransformer
Major Features
LLM Foundry now uses Composer v0.15.0 and Streaming v0.5.1 as minimum requirements. For details on all the improvements, see the release notes for Composer and Streaming.
🆕 Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested torch.compile, but do not expect significant performance improvements.
⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry on NVIDIA H100 systems, be sure to use a docker image that has CUDA 11.8+ and PyTorch 2.0+.
For example, mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 from our dockerhub has been tested with NVIDIA H100 systems. No code changes should be required.
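As a rough illustration of the version requirement above, the sketch below parses an image tag shaped like the example and checks it against the CUDA 11.8+ and PyTorch 2.0+ minimums. The helper names and the parsing convention (the last digit after "cu" is taken as the CUDA minor version) are assumptions made for this example and are not part of LLM Foundry.

```python
import re

# Minimum versions quoted in the release notes for H100 systems.
MIN_TORCH = (2, 0)
MIN_CUDA = (11, 8)


def parse_image_tag(tag: str):
    """Return (torch_version, cuda_version) from a tag such as
    '2.0.1_cu118-python3.10-ubuntu20.04'.

    Assumes the '<torch>_cu<digits>-...' layout of the example tag; this
    layout is inferred from that single example, not from a documented scheme.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)_cu(\d{2,})-", tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag!r}")
    torch_version = tuple(int(part) for part in match.group(1).split("."))
    cuda_digits = match.group(2)
    # 'cu118' -> (11, 8): everything but the last digit is the major version.
    cuda_version = (int(cuda_digits[:-1]), int(cuda_digits[-1]))
    return torch_version, cuda_version


def meets_h100_requirements(tag: str) -> bool:
    """Check a tag against the PyTorch 2.0+ and CUDA 11.8+ minimums."""
    torch_version, cuda_version = parse_image_tag(tag)
    return torch_version[:2] >= MIN_TORCH and cuda_version >= MIN_CUDA


print(meets_h100_requirements("2.0.1_cu118-python3.10-ubuntu20.04"))  # prints True
```

A check like this only inspects the tag string; it does not confirm what is actually installed inside the image.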
📈 AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 docker image rocm/dev-ubuntu-20.04:5.4.3-complete, install PyTorch for ROCm 5.4, and install Flash Attention.
Modify your configuration settings:
- attn_impl=flash instead of the default triton
  - Note: ALiBi is currently not supported with attn_impl=flash.
- loss_fn=torch_crossentropy instead of the default fused_crossentropy.
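The two configuration changes above can be sketched as overrides applied to a settings dictionary. This is a minimal illustration only: the flat key layout, the "alibi" flag name, and the apply_overrides helper are assumptions for this example, and the real settings live in the LLM Foundry YAML configs.

```python
import copy

# Illustrative defaults; the flat layout and the 'alibi' key are assumptions,
# not a copy of any LLM Foundry YAML.
DEFAULT_MODEL_CFG = {
    "attn_impl": "triton",
    "loss_fn": "fused_crossentropy",
    "alibi": False,
}

# The two overrides named in the release notes for AMD MI250 systems.
AMD_OVERRIDES = {
    "attn_impl": "flash",
    "loss_fn": "torch_crossentropy",
}


def apply_overrides(cfg: dict, overrides: dict) -> dict:
    """Return a copy of cfg with overrides applied, rejecting unknown keys
    and the ALiBi + flash combination that the notes say is unsupported."""
    updated = copy.deepcopy(cfg)
    for key, value in overrides.items():
        if key not in updated:
            raise KeyError(f"unknown setting: {key!r}")
        updated[key] = value
    if updated["attn_impl"] == "flash" and updated["alibi"]:
        raise ValueError("ALiBi is currently not supported with attn_impl=flash")
    return updated


amd_cfg = apply_overrides(DEFAULT_MODEL_CFG, AMD_OVERRIDES)
print(amd_cfg["attn_impl"], amd_cfg["loss_fn"])  # prints: flash torch_crossentropy
```

Validating the combination up front surfaces the ALiBi restriction before a training run starts rather than partway through model construction.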
🚧 LoRA finetuning (Preview)
We have included a preview release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature; please send us any feedback! The API and support are subject to change.
🔎 Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
- Instead of model, use the models keyword and provide a list of models.
- tokenizer is now model-specific.
For example, to run the gauntlet of various eval tasks with mosaicml/mpt-7b:
    cd llm-foundry/scripts
    composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
This release also makes evaluation deterministic even across different numbers of GPUs.
For more details on all these changes, see #308.
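To make the shape of the breaking change concrete, here is a small sketch that moves a single-model eval config into the multi-model layout: the top-level model and tokenizer entries become one item in a models list. The exact field names inside each list item, and the migrate_eval_config helper itself, are assumptions for illustration; check the eval YAMLs in the repository for the real schema.

```python
def migrate_eval_config(old_cfg: dict) -> dict:
    """Convert a config with top-level 'model' and 'tokenizer' keys into one
    with a 'models' list whose entries carry their own tokenizer.

    The per-entry layout used here is hypothetical.
    """
    if "model" not in old_cfg:
        raise KeyError("expected a top-level 'model' entry")
    # Keep every unrelated setting (seed, batch size, ...) untouched.
    new_cfg = {k: v for k, v in old_cfg.items() if k not in ("model", "tokenizer")}
    entry = {"model": old_cfg["model"]}
    if "tokenizer" in old_cfg:
        # The tokenizer is now attached to the model it belongs to.
        entry["tokenizer"] = old_cfg["tokenizer"]
    new_cfg["models"] = [entry]
    return new_cfg


# Illustrative values only.
old = {
    "seed": 1,
    "model": {"name": "hf_causal_lm", "pretrained_model_name_or_path": "mosaicml/mpt-7b"},
    "tokenizer": {"name": "mosaicml/mpt-7b"},
}
new = migrate_eval_config(old)
print(sorted(new))         # prints ['models', 'seed']
print(len(new["models"]))  # prints 1
```

Evaluating several models then amounts to appending more entries to the models list, each with its own tokenizer.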
⏱️ Benchmarking Inference
To better support the deployment of LLMs, we have included an inference benchmarking suite, along with results across different hardware and other LLMs.
PR List
- hf dict cfg overrides by @vchiley in #90
- Add slack and license buttons to readme by @growlix in #98
- Add minimum mosaicml-streaming version by @hanlint in #110
- Update dataloader.py by @nelsontkq in #102
- Add features to hf_generate by @alextrott16 in #116
- Make mpt7b finetuning more obvious by @samhavens in #101
- Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by @alanxmay in #131
- Fix HF conversion script to upload to S3 after editing the files to be HF compatible by @dakinggg in #136
- Set pad_token_id to tokenizer.pad_token_id if not set on command line by @patrickhwood in #118
- Changed the keep_zip default to False to comply with StreamingDataset by @karan6181 in #150
- Add cloud upload to checkpoint conversion script by @dakinggg in #151
- Adds precision to eval by @mvpatel2000 in #148
- Update StreamingDataset defaults by @abhi-mosaic in #157
- Explain composer command by @hanlint in #164
- Remove pynvml by @hanlint in #165
- Adds a concrete finetuning example from a custom dataset by @alextrott16 in #156
- Remove health checker by @mvpatel2000 in #167
- Rename datasets to avoid hf conflict by @hanlint in #175
- Torch2 (#177) by @vchiley in #178
- Revert "Torch2 (#177) (#178)" by @dakinggg in #181
- clean up dataset conversion readme by @codestar12 in #168
- Convert MPT checkpoints to FT format by @dskhudia in #169
- Update README.md by @jacobfulano in #198
- Removed unused tokenizer_name config field by @dakinggg in #206
- Add community links to README by @hanlint in #182
- Add Tensorboard logger to yaml config by @hanlint in #166
- Update inference README by @abhi-mosaic in #204
- torch2 updt with hf fixes by @vchiley in #193
- Removing deprecated vocabulary size parameter from composer CE metrics by @sashaDoubov in #222
- Add `composer[...
v0.1.1
What's New
LLM Foundry is now on PyPI!
What's Changed
- Update README.md by @ejyuen in #72
- Update version by @dakinggg in #73
- Remove todo in workflow by @mvpatel2000 in #74
- Bump composer version by @vchiley in #84
- Fix pypi by @mvpatel2000 in #80
- Remove xentropy from pypi by @mvpatel2000 in #86
- Fix sed command for xentropy by @mvpatel2000 in #87
- Updates to prefixlm and t5 by @alextrott16 in #85
- Disable image for pypi by @mvpatel2000 in #97
New Contributors
Full Changelog: v0.1.0...v0.1.1
Announcing LLM Foundry and the MPT foundation series
🚀 LLM Foundry v0.1.0
This is the first release of MosaicML's LLM Foundry!
Our efficient code for training, evaluating, and deploying LLMs outgrew our examples repository, so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the top-level README and our blog post for more details on this announcement!
Model releases
In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the MosaicML platform, using Composer and Streaming. If you're interested in training your own models, or using these models with our optimized inference stack, please reach out!
- mpt-7b: This is our base 7-billion parameter model, trained for 1 trillion tokens. This model is released with an Apache-2.0 (commercial use permitted) license.
- mpt-7b-storywriter: All of the models use ALiBi to allow them to extrapolate to longer sequence lengths than they saw during training, but storywriter is our long context model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
- mpt-7b-instruct: This model is instruction finetuned on a dataset we also release, derived from Databricks' Dolly-15k and Anthropic’s Helpful and Harmless datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
- mpt-7b-chat: This model is trained to be able to chat by further training on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
Features
Training
We release fully featured code for efficiently training any HuggingFace LLM (including our optimized MPT) using FSDP, Composer, and Streaming. Seamlessly scale to multi-GPU and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights & Biases, and much more. See the README for more detailed instructions on getting started with pretraining and finetuning!
Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!
Evaluation
Our evaluation framework makes it easy to fully re-evaluate any HuggingFace model. We also include copies of the processed data for many popular benchmarks, to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In previous benchmarks, our setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling with multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.
Inference
MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models are subclassed from the HuggingFace PreTrainedModel base class, which means that they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard pipelines like model.generate(...), build HuggingFace Spaces (see some of ours here!), and more.
What about performance? With MPT’s optimized layers (including FlashAttention and low precision layernorm), the out-of-the-box performance of MPT-7B on GPUs when using model.generate(...) is 1.5x-2x faster than other 7B models like LLaMa-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.
Finally, for the best hosting experience, deploy your MPT models directly on MosaicML’s Inference service. Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the Inference blog post for more details!