🚀 LLM Foundry v0.2.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs). It is the training codebase behind the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
Training
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
Inference
The inference tutorials cover several new features we've added to improve integration with the HuggingFace and FasterTransformer libraries (a quick generation sketch follows the list):
- Converting a Composer checkpoint to an HF checkpoint folder
- Interactive Generation with HF models
- Interactive Chat with HF models
- Converting an HF model to ONNX
- Converting an HF MPT to FasterTransformer
- Running MPT with FasterTransformer
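As a quick sketch of the generation workflow, the following assumes `scripts/inference/hf_generate.py` with illustrative flag names (check the script's `--help` for the authoritative interface):

```bash
cd llm-foundry/scripts
# Flag names are assumptions; see hf_generate.py --help for the actual interface.
python inference/hf_generate.py \
  --name_or_path mosaicml/mpt-7b \
  --prompts "Here is a quick recipe for baking chocolate chip cookies:" \
  --max_new_tokens 100
```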
Major Features
LLM Foundry now requires Composer v0.15.0 and Streaming v0.5.1 as minimum versions. For all the improvements, see the Composer and Streaming release notes.
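If you pin dependencies yourself, a minimal sketch of the matching upgrade (Composer ships on PyPI as `mosaicml`):

```bash
pip install 'mosaicml>=0.15.0' 'mosaicml-streaming>=0.5.1'
```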
- 🆕 Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested `torch.compile`, but do not expect significant performance improvements.
- ⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry with NVIDIA H100 systems, be sure to use a Docker image with CUDA 11.8+ and PyTorch 2.0+. For example, `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` from our Docker Hub has been tested with NVIDIA H100 systems. No code changes should be required.
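For instance, a minimal sketch for pulling and launching that image (the `docker run` flags are illustrative):

```bash
# Pull the image that has been tested on H100 systems (CUDA 11.8, PyTorch 2.0.1)
docker pull mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04

# Start an interactive container with all GPUs visible
docker run --gpus all -it --rm \
  mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 /bin/bash
```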
- 📈 AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 Docker image `rocm/dev-ubuntu-20.04:5.4.3-complete`, install PyTorch for ROCm 5.4, and install Flash Attention. Then modify your configuration settings (sketched in YAML below):
- `attn_impl=flash` instead of the default `triton`
  - Note: ALiBi is currently not supported with `attn_impl=flash`.
- `loss_fn=torch_crossentropy` instead of the default `fused_crossentropy`
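As a minimal sketch, these overrides would land in a training YAML roughly as follows (the exact key paths, e.g. whether `attn_impl` sits under `attn_config`, depend on your config version):

```yaml
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: flash          # default: triton; ALiBi is not supported with flash
  loss_fn: torch_crossentropy # default: fused_crossentropy
```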
- 🚧 LoRA finetuning (Preview)
We have included a preview release of Low-Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature; please let us know any feedback! The API and support are subject to change.
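For orientation only, a hypothetical sketch of what a LoRA block in a finetuning YAML might look like; the key names below are illustrative assumptions, and the linked instructions are authoritative:

```yaml
# Hypothetical sketch; see the LoRA instructions for the actual schema.
lora:
  args:
    r: 16                     # rank of the low-rank update matrices
    lora_alpha: 32            # scaling factor applied to the update
    lora_dropout: 0.05
    target_modules: ['Wqkv']  # MPT's fused QKV projection
```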
- 🔎 Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
- Instead of `model`, use the `models` keyword and provide a list of models.
- `tokenizer` is now model-specific.

For example, to run the gauntlet of various eval tasks with `mosaicml/mpt-7b`:

```
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
```

This release also makes evaluation deterministic even on different numbers of GPUs.

For more details on all these changes, see #308.
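As a sketch, the new multi-model format in an eval YAML looks roughly like this (modeled on `eval/yamls/hf_eval.yaml`; exact fields may differ):

```yaml
models:
  - model_name: mosaicml/mpt-7b
    model:
      name: hf_causal_lm
      pretrained_model_name_or_path: mosaicml/mpt-7b
      init_device: cpu
    tokenizer:
      name: mosaicml/mpt-7b
      kwargs:
        model_max_length: 2048
  # Append additional entries to evaluate multiple models in one run.
```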
- ⏱️ Benchmarking Inference
To better support LLM deployment, we have included an inference benchmarking suite, with results across different hardware platforms and LLMs.
PR List
- hf dict cfg overrides by @vchiley in #90
- Add slack and license buttons to readme by @growlix in #98
- Add minimum `mosaicml-streaming` version by @hanlint in #110
- Update dataloader.py by @nelsontkq in #102
- Add features to hf_generate by @alextrott16 in #116
- Make mpt7b finetuning more obvious by @samhavens in #101
- Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by @alanxmay in #131
- Fix HF conversion script to upload to S3 after editing the files to be HF compatible by @dakinggg in #136
- Set pad_token_id to tokenizer.pad_token_id if not set on command line by @patrickhwood in #118
- Changed the keep_zip default to False to comply with StreamingDataset by @karan6181 in #150
- Add cloud upload to checkpoint conversion script by @dakinggg in #151
- Adds precision to eval by @mvpatel2000 in #148
- Update StreamingDataset defaults by @abhi-mosaic in #157
- Explain `composer` command by @hanlint in #164
- Remove `pynvml` by @hanlint in #165
- Adds a concrete finetuning example from a custom dataset by @alextrott16 in #156
- Remove health checker by @mvpatel2000 in #167
- Rename datasets to avoid hf conflict by @hanlint in #175
- Torch2 (#177) by @vchiley in #178
- Revert "Torch2 (#177) (#178)" by @dakinggg in #181
- clean up dataset conversion readme by @codestar12 in #168
- Convert MPT checkpoints to FT format by @dskhudia in #169
- Update README.md by @jacobfulano in #198
- Removed unused `tokenizer_name` config field by @dakinggg in #206
- Add community links to README by @hanlint in #182
- Add Tensorboard logger to yaml config by @hanlint in #166
- Update inference README by @abhi-mosaic in #204
- torch2 updt with hf fixes by @vchiley in #193
- Removing deprecated vocabulary size parameter from composer CE metrics by @sashaDoubov in #222
- Add `composer[libcloud]` dependency by @abhi-mosaic in #218
- Use $RUN_NAME rather than $COMPOSER_RUN_NAME by @abhi-mosaic in #209
- Fixing benchmark mcli example with proper path and image by @sashaDoubov in #219
- Update README.md - Slack Link by @ejyuen in #207
- Kv cache speed by @vchiley in #210
- Fix a race condition in ICL eval by @dakinggg in #235
- Add basic issue templates by @dakinggg in #252
- Add a script to run mpt with FasterTransformer by @dskhudia in #229
- Change mcli eval YAMLs to use `mixed_precision: FULL` by @abhi-mosaic in #255
- Bump Xentropy Version by @nik-mosaic in #261
- updt tritonpremlir to sm90 version by @vchiley in #260
- Add `mosaicml/llm-foundry` Docker workflow by @abhi-mosaic in #254
- Patch README for better visibility by @abhi-mosaic in #267
- Add support for `device_map` by @abhi-mosaic in #225
- Fix model init when using 1 GPU by @abhi-mosaic in #269
- Update README.md by @abhi-mosaic in #268
- Update mcp_pytest.py by @mvpatel2000 in #274
- Fastertransformer: replace config['mpt'] with config['gpt'] by @dwyatte in #272
- Add `device_map` support for `hf_generate.py` and `hf_chat.py` by @abhi-mosaic in #276
- Add shift_labels arg to HF wrappers by @dakinggg in #288
- Update README.md by @abhi-mosaic in #294
- Small formatting fix in eval README by @sashaDoubov in #285
- Default to debug level debug by @samhavens in #299
- Sam/chat v2 by @samhavens in #296
- Add `save_weights_only` as an option by @dakinggg in #301
- Adding Custom Embedding, Enabling us to initialize on Heterogeneous Devices by @bcui19 in #298
- Fix convert_dataset_hf.py hanging with excessive num_workers by @casperbh96 in #270
- Update README.md by @jacobfulano in #300
- Fix autocast dtype by @abhi-mosaic in #302
- Set eval shuffle to False by @eldarkurtic in #297
- Huggingface Mixed Initialization by @bcui19 in #303
- Added new community tutorial on MPT-7B-Instruct Fine Tuning by @VRSEN in #311
- Fix generate callback to work with precision context by @dakinggg in #322
- Allow MPT past the tied word embeddings error by @dakinggg in #323
- Refresh Mosaicml platform yamls by @aspfohl in #208
- hard set bias(alibi) precision by @vchiley in #329
- Create tasks_light.yaml by @jfrankle in #335
- Attn amp by @vchiley in #337
- Load on rank 0 only flag by @mvpatel2000 in #334
- Add mixed device by @mvpatel2000 in #342
- Better error messages for ckpt conversion script by @dskhudia in #320
- Add script to update hub code from foundry by @dakinggg in #338
- Upgrade to `mosaicml-streaming==0.5.x` by @abhi-mosaic in #292
- updt composer to 0.15.0 by @vchiley in #347
- updt yml by @vchiley in #349
- Fix bug with saving optimizer states with MonolithicCheckpointSaver Callback by @eracah in #310
- Add step to free up some disk space on the worker by @bandish-shah in #350
- Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later by @dakinggg in #348
- Revert "Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later" by @codestar12 in #354
- Remote JSONL IFT data by @samhavens in #275
- Add MPT-30B to README by @abhi-mosaic in #356
- Codeql on PRs by @mvpatel2000 in #352
- Add secrets check as part of pre-commit by @karan6181 in #360
- Onboarding tutorial and related improvements by @alextrott16 in #205
- fixed rmsnorm bug. Changed division to multiply since using torch.rsqrt by @vancoykendall in #372
- Adds max seq len filter before finetuning ds by @vchiley in #359
- Feature/peft compatible models by @danbider in #346
- Fix Typing (part 1) by @hanlint in #240
- improve hf_chat UI and readme by @samhavens in #351
- Update onnx by @vchiley in #385
- Model gauntlet by @bmosaicml in #308
- Add 30b IFT example yaml by @samhavens in #388
- Add benchmarks to inference README by @sashaDoubov in #393
- updt install instructions by @vchiley in #396
- update quickstart eval task by @vchiley in #395
- Correct small typo in README.md by @jacobfulano in #391
- make peft installs a extra_dep by @vchiley in #397
- add fn to clear tests after every test by @vchiley in #400
- propagate cache_limit in streaming ds by @vchiley in #402
- Fixing hf_generate bug to account for pre-tokenization by @ksreenivasan in #387
- Eval Quickstart by @samhavens in #398
- Clean up train README by @jacobfulano in #392
- Fix/bugbash002 by @danbider in #405
- add install for AMD beta support by @vchiley in #407
- updt dtype of causal mask by @vchiley in #408
- YAMLS for MPT runs inherit global max_seq_len in model config by @alextrott16 in #409
- Update mcli-hf-eval.yaml by @samhavens in #411
- Edit tutorial comments on PEFT / LoRA by @vchiley in #416
- rm peft from pypi package by @vchiley in #420
- Update tasks_light.yaml by @jfrankle in #422
New Contributors
- @nelsontkq made their first contribution in #102
- @samhavens made their first contribution in #101
- @alanxmay made their first contribution in #131
- @patrickhwood made their first contribution in #118
- @karan6181 made their first contribution in #150
- @dskhudia made their first contribution in #169
- @nik-mosaic made their first contribution in #261
- @dwyatte made their first contribution in #272
- @casperbh96 made their first contribution in #270
- @eldarkurtic made their first contribution in #297
- @VRSEN made their first contribution in #311
- @aspfohl made their first contribution in #208
- @jfrankle made their first contribution in #335
- @eracah made their first contribution in #310
- @bandish-shah made their first contribution in #350
- @vancoykendall made their first contribution in #372
- @danbider made their first contribution in #346
- @ksreenivasan made their first contribution in #387
Full Changelog: v0.1.1...v0.2.0