🚀 LLM Foundry v0.2.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs). It is the training codebase behind the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
Training
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
Inference
The inference tutorials cover several new features we've added to improve integration with the HuggingFace and FasterTransformer libraries (a quick generation sketch follows the list):
- Converting a Composer checkpoint to an HF checkpoint folder
- Interactive Generation with HF models
- Interactive Chat with HF models
- Converting an HF model to ONNX
- Converting an HF MPT to FasterTransformer
- Running MPT with FasterTransformer
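As a quick sketch of the generation workflow, the following assumes `scripts/inference/hf_generate.py` with illustrative flag names (check the script's `--help` for the authoritative interface):

```bash
cd llm-foundry/scripts
# Flag names are assumptions; see hf_generate.py --help for the actual interface.
python inference/hf_generate.py \
  --name_or_path mosaicml/mpt-7b \
  --prompts "Here is a quick recipe for baking chocolate chip cookies:" \
  --max_new_tokens 100
```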
Major Features
LLM Foundry now requires Composer v0.15.0 and Streaming v0.5.1 as minimum versions. For all the improvements, see the Composer and Streaming release notes.
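If you pin dependencies yourself, a minimal sketch of the matching upgrade (Composer ships on PyPI as `mosaicml`):

```bash
pip install 'mosaicml>=0.15.0' 'mosaicml-streaming>=0.5.1'
```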
- 🆕 Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested `torch.compile`, but do not expect significant performance improvements.
- ⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry with NVIDIA H100 systems, be sure to use a Docker image with CUDA 11.8+ and PyTorch 2.0+. For example, `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` from our Docker Hub has been tested with NVIDIA H100 systems. No code changes should be required.
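For instance, a minimal sketch for pulling and launching that image (the `docker run` flags are illustrative):

```bash
# Pull the image that has been tested on H100 systems (CUDA 11.8, PyTorch 2.0.1)
docker pull mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04

# Start an interactive container with all GPUs visible
docker run --gpus all -it --rm \
  mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 /bin/bash
```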
- 📈 AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 Docker image `rocm/dev-ubuntu-20.04:5.4.3-complete`, install PyTorch for ROCm 5.4, and install Flash Attention. Then modify your configuration settings (sketched in YAML below):
- `attn_impl=flash` instead of the default `triton`
  - Note: ALiBi is currently not supported with `attn_impl=flash`.
- `loss_fn=torch_crossentropy` instead of the default `fused_crossentropy`
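As a minimal sketch, these overrides would land in a training YAML roughly as follows (the exact key paths, e.g. whether `attn_impl` sits under `attn_config`, depend on your config version):

```yaml
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: flash          # default: triton; ALiBi is not supported with flash
  loss_fn: torch_crossentropy # default: fused_crossentropy
```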
- 🚧 LoRA finetuning (Preview)
We have included a preview release of Low-Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature; please let us know any feedback! The API and support are subject to change.
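For orientation only, a hypothetical sketch of what a LoRA block in a finetuning YAML might look like; the key names below are illustrative assumptions, and the linked instructions are authoritative:

```yaml
# Hypothetical sketch; see the LoRA instructions for the actual schema.
lora:
  args:
    r: 16                     # rank of the low-rank update matrices
    lora_alpha: 32            # scaling factor applied to the update
    lora_dropout: 0.05
    target_modules: ['Wqkv']  # MPT's fused QKV projection
```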
- 🔎 Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
- Instead of `model`, use the `models` keyword and provide a list of models.
- `tokenizer` is now model-specific.

For example, to run the gauntlet of various eval tasks with `mosaicml/mpt-7b`:

```
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
```

This release also makes evaluation deterministic even on different numbers of GPUs.

For more details on all these changes, see #308.
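As a sketch, the new multi-model format in an eval YAML looks roughly like this (modeled on `eval/yamls/hf_eval.yaml`; exact fields may differ):

```yaml
models:
  - model_name: mosaicml/mpt-7b
    model:
      name: hf_causal_lm
      pretrained_model_name_or_path: mosaicml/mpt-7b
      init_device: cpu
    tokenizer:
      name: mosaicml/mpt-7b
      kwargs:
        model_max_length: 2048
  # Append additional entries to evaluate multiple models in one run.
```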
- ⏱️ Benchmarking Inference
To better support LLM deployment, we have included an inference benchmarking suite, with results across different hardware platforms and LLMs.
PR List
- hf dict cfg overrides by @vchiley in #90
- Add slack and license buttons to readme by @growlix in #98
- Add minimum `mosaicml-streaming` version by @hanlint in #110
- Update dataloader.py by @nelsontkq in #102
- Add features to hf_generate by @alextrott16 in #116
- Make mpt7b finetuning more obvious by @samhavens in #101
- Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by @alanxmay in #131
- Fix HF conversion script to upload to S3 after editing the files to be HF compatible by @dakinggg in #136
- Set pad_token_id to tokenizer.pad_token_id if not set on command line by @patrickhwood in #118
- Changed the keep_zip default to False to comply with StreamingDataset by @karan6181 in #150
- Add cloud upload to checkpoint conversion script by @dakinggg in #151
- Adds precision to eval by @mvpatel2000 in #148
- Update StreamingDataset defaults by @abhi-mosaic in #157
- Explain `composer` command by @hanlint in #164
- Remove `pynvml` by @hanlint in #165
- Adds a concrete finetuning example from a custom dataset by @alextrott16 in #156
- Remove health checker by @mvpatel2000 in #167
- Rename datasets to avoid hf conflict by @hanlint in #175
- Torch2 (#177) by @vchiley in #178
- Revert "Torch2 (#177) (#178)" by @dakinggg in #181
- clean up dataset conversion readme by @codestar12 in #168
- Convert MPT checkpoints to FT format by @dskhudia in #169
- Update README.md by @jacobfulano in #198
- Removed unused `tokenizer_name` config field by @dakinggg in #206
- Add community links to README by @hanlint in #182
- Add Tensorboard logger to yaml config by @hanlint in #166
- Update inference README by @abhi-mosaic in #204
- torch2 updt with hf fixes by @vchiley in #193
- Removing deprecated vocabulary size parameter from composer CE metrics by @sashaDoubov in #222
- Add `composer[libcloud]` dependency by @abhi-mosaic in #218
- Use $RUN_NAME rather than $COMPOSER_RUN_NAME by @abhi-mosaic in #209
- Fixing benchmark mcli example with proper path and image by @sashaDoubov in #219
- Update README.md - Slack Link by @ejyuen in #207
- Kv cache speed by @vchiley in #210
- Fix a race condition in ICL eval by @dakinggg in #235
- Add basic issue templates by @dakinggg in #252
- Add a script to run mpt with FasterTransformer by @dskhudia in #229
- Change mcli eval YAMLs to use `mixed_precision: FULL` by @abhi-mosaic in #255
- Bump Xentropy Version by @nik-mosaic in #261
- updt tritonpremlir to sm90 version by @vchiley in #260
- Add `mosaicml/llm-foundry` Docker workflow by @abhi-mosaic in #254
- Patch README for better visibility by @abhi-mosaic in #267
- Add support for `device_map` by @abhi-mosaic in #225
- Fix model init when using 1 GPU by @abhi-mosaic in #269
- Update README.md by @abhi-mosaic in #268
- Update mcp_pytest.py by @mvpatel2000 in #274
- Fastertransformer: replace config['mpt'] with config['gpt'] by @dwyatte in #272
- Add `device_map` support for `hf_generate.py` and `hf_chat.py` by @abhi-mosaic in #276
- Add shift_labels arg to HF wrappers by @dakinggg in #288
- Update README.md by @abhi-mosaic in #294
- Small formatting fix in eval README by @sashaDoubov in #285
- Default to debug level debug by @samhavens in #299
- Sam/chat v2 by @samhavens in #296
- Add `save_weights_only` as an option by @dakinggg in #301
- Adding Custom Embedding, Enabling us to initialize on Heterogeneous Devices by @bcui19 in #298
- Fix convert_dataset_hf.py hanging with excessive num_workers by @casperbh96 in #270
- Update README.md by @jacobfulano in #300
- Fix autocast dtype by @abhi-mosaic in #302
- Set eval shuffle to False by @eldarkurtic in #297
- Huggingface Mixed Initialization by @bcui19 in #303
- Added new community tutorial on MPT-7B-Instruct Fine Tuning by @VRSEN in #311
- Fix generate callback to work with precision context by @dakinggg in #322
- Allow MPT past the tied word embeddings error by @dakinggg in #323
- Refresh Mosaicml platform yamls by @aspfohl in #208
- hard set bias(alibi) precision by @vchiley in #329
- Create tasks_light.yaml by @jfrankle in #335
- Attn amp by @vchiley in #337
- Load on rank 0 only flag by @mvpatel2000 in #334
- Add mixed device by @mvpatel2000 in #342
- Better error messages for ckpt conversion script by @dskhudia in #320
- Add script to update hub code from foundry by @dakinggg in #338
- Upgrade to `mosaicml-streaming==0.5.x` by @abhi-mosaic in #292
- updt composer to 0.15.0 by @vchiley in #347
- updt yml by @vchiley in #349
- Fix bug with saving optimizer states with MonolithicCheckpointSaver Callback by @eracah in #310
- Add step to free up some disk space on the worker by @bandish-shah in #350
- Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later by @dakinggg in #348
- Revert "Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later" by @codestar12 in #354
- Remote JSONL IFT data by @samhavens in #275
- Add MPT-30B to README by @abhi-mosaic in #356
- Codeql on PRs by @mvpatel2000 in #352
- Add secrets check as part of pre-commit by @karan6181 in #360
- Onboarding tutorial and related improvements by @alextrott16 in #205
- fixed rmsnorm bug. Changed division to multiply since using torch.rsqrt by @vancoykendall in #372
- Adds max seq len filter before finetuning ds by @vchiley in #359
- Feature/peft compatible models by @danbider in #346
- Fix Typing (part 1) by @hanlint in #240
- improve hf_chat UI and readme by @samhavens in #351
- Update onnx by @vchiley in #385
- Model gauntlet by @bmosaicml in #308
- Add 30b IFT example yaml by @samhavens in #388
- Add benchmarks to inference README by @sashaDoubov in #393
- updt install instructions by @vchiley in #396
- update quickstart eval task by @vchiley in #395
- Correct small typo in README.md by @jacobfulano in #391
- make peft installs a extra_dep by @vchiley in #397
- add fn to clear tests after every test by @vchiley in #400
- propagate cache_limit in streaming ds by @vchiley in #402
- Fixing hf_generate bug to account for pre-tokenization by @ksreenivasan in #387
- Eval Quickstart by @samhavens in #398
- Clean up train README by @jacobfulano in #392
- Fix/bugbash002 by @danbider in #405
- add install for AMD beta support by @vchiley in #407
- updt dtype of causal mask by @vchiley in #408
- YAMLS for MPT runs inherit global max_seq_len in model config by @alextrott16 in #409
- Update mcli-hf-eval.yaml by @samhavens in #411
- Edit tutorial comments on PEFT / LoRA by @vchiley in #416
- rm peft from pypi package by @vchiley in #420
- Update tasks_light.yaml by @jfrankle in #422
New Contributors
- @nelsontkq made their first contribution in #102
- @samhavens made their first contribution in #101
- @alanxmay made their first contribution in #131
- @patrickhwood made their first contribution in #118
- @karan6181 made their first contribution in #150
- @dskhudia made their first contribution in #169
- @nik-mosaic made their first contribution in #261
- @dwyatte made their first contribution in #272
- @casperbh96 made their first contribution in #270
- @eldarkurtic made their first contribution in #297
- @VRSEN made their first contribution in #311
- @aspfohl made their first contribution in #208
- @jfrankle made their first contribution in #335
- @eracah made their first contribution in #310
- @bandish-shah made their first contribution in #350
- @vancoykendall made their first contribution in #372
- @danbider made their first contribution in #346
- @ksreenivasan made their first contribution in #387
Full Changelog: v0.1.1...v0.2.0