v0.6.0
🚀 LLM Foundry v0.6.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
Configurable loss for chat-formatted data (#985)
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.
This can be specified in the `train_loader.dataset` section of your yaml as follows:
```yaml
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
```
See the docstring for a description of the options.
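For example, assuming the option values described in that docstring (`last`/`all` for responses, and `none`/`all`/`length>=XX` for prompts), a config that computes loss only on the final response of a multi-turn chat might look like this sketch:

```yaml
train_loader:
  dataset:
    ...
    # illustrative values; see the docstring for the full set of options
    target_prompts: none      # no loss on prompt tokens
    target_responses: last    # loss only on the final response of each chat
```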
OLMo support (#1016)
We've added support for the OLMo model from AI2.
To use OLMo, there are a few configuration parameters you need to set. First, you will need to install LLM Foundry with the extra package for OLMo (`pip install .[gpu,olmo]`).
Then you will need to adjust the tokenizer section of your config as follows:
```yaml
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
```
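The model section then points at the same checkpoint. A minimal sketch, assuming the standard `hf_causal_lm` wrapper (check the repo's example yamls for the exact fields your setup needs):

```yaml
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: allenai/OLMo-7B
  pretrained: true
  # OLMo's modeling code is loaded from the Hub, so remote code must be trusted
  trust_remote_code: true
```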
Token accuracy (#983)
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
Configurable activation checkpointing (#951)
Activation checkpointing for MPT is now more configurable, allowing finer-grained control over memory usage during training. See the docstring for more details.
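As a rough sketch of the shape this takes (the exact module names and block-selection syntax are documented in the MPT docstring, so treat these values as hypothetical):

```yaml
model:
  name: mpt_causal_lm
  ...
  # hypothetical target: checkpoint only the attention module of blocks 0-3
  activation_checkpointing_target:
    grouped_query_attention: [0, 1, 2, 3]
fsdp_config:
  # activation_checkpointing_target only takes effect when FSDP
  # activation checkpointing is enabled
  activation_checkpointing: true
```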
Finetuning with multiple streams, and pretokenized data (#933, #945, #946)
We've brought the finetuning dataloader up to speed with the pretraining dataloader, adding support for mixing multiple streams and for pretokenized finetuning data. See the yaml for a full example.
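A minimal sketch of what mixing streams in the finetuning dataloader can look like, with hypothetical stream names, paths, and proportions (the linked yaml has the real, complete example):

```yaml
train_loader:
  name: finetuning
  dataset:
    streams:
      stream_a:                 # hypothetical stream names and paths
        remote: s3://my-bucket/finetune-stream-a
        local: /tmp/finetune-stream-a
        proportion: 0.7
      stream_b:
        remote: s3://my-bucket/finetune-stream-b
        local: /tmp/finetune-stream-b
        proportion: 0.3
    shuffle: true
```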
Eval Gauntlet v0.3 (#824)
We've released v0.3 of our Eval Gauntlet. See the README for a full description.
Breaking changes and deprecations
Flash attention v1 removal (#1023)
Support for flash attention v1 has now been removed.
Extra BOS token removed (#1003)
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.
Deprecation of triton flash attention, prefixLM, and text denoising (#1007)
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.
What's Changed
- Gauntlet v0.3: Fix chain-of-thought tasks by @bmosaicml in #824
- Add finetuning streaming dataset conversion by @bigning in #933
- Add default signature to mlflow saved model by @dakinggg in #952
- allow te to use meta device with deferred init by @cli99 in #958
- Update TUTORIAL.md by @sdonoso in #957
- Update mcli yamls to use v0.5.0 by @irenedea in #959
- add finetuning with streaming dataset example by @bigning in #945
- Add fully configurable activation checkpointing by @cli99 in #951
- Use create_model_version instead of register_model by @dakinggg in #953
- Add streams support by @bigning in #946
- Fix typo by @irenedea in #966
- Fix eval.py with lora by @dakinggg in #965
- add memory snapshot to callbacks by @cli99 in #810
- Adding curriculum learning callback (experimental) by @snarayan21 in #954
- strengthened chat formatting validation by @milocress in #960
- Add new base images and remove fa1 images by @dakinggg in #970
- Add new ICL kwargs in eval.py and long_context yamls by @maxisawesome in #925
- Make composer pins consistent with each other by @dakinggg in #972
- Make turbo an optional dependency by @snarayan21 in #964
- Fix fewshot_random_seed default setting by @maxisawesome in #974
- Improve error msg when checking target_blocks overlap by @cli99 in #977
- Torch 2.2 upgrade - Part 1 by @dakinggg in #976
- Torch 2.2 - Part 2 by @dakinggg in #979
- PyTorch 2.2 - Part 3 by @dakinggg in #981
- Remove torch 2.1 from docker workflow by @dakinggg in #982
- Async callback: Don't skip checkpoints, reliably only launch async eval when the checkpoint is ready by @aspfohl in #813
- Token accuracy metrics by @dakinggg in #983
- Update readme to not mention 1.13_cu117 by @irenedea in #988
- Patch test, lock mcli version by @aspfohl in #990
- Bump gha timeouts by @aspfohl in #991
- Fix readme typo by @dakinggg in #993
- if condition in tie weights added by @megha95 in #989
- Bump Composer to 0.20 by @dakinggg in #995
- Trim examples ahead of time for auto packing by @irenedea in #994
- add oom observer callback by @cli99 in #932
- Use ci-testing repo for tests by @b-chu in #1000
- Make CodeEval respect device_eval_batch_size by @josejg in #956
- Remove try except around imports by @dakinggg in #1004
- Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental by @irenedea in #1007
- add magic filename for sharded state dicts by @milocress in #1001
- Bump CI/CD to v3 by @mvpatel2000 in #1009
- Fix evaluators actually pulling eval metrics by @mvpatel2000 in #1006
- Build torch 2.2.1 images by @dakinggg in #1010
- Add torch 2.2.1 tests by @dakinggg in #1011
- Bump min torch pin to 2.2.1 by @dakinggg in #1013
- Fix extra BOS token in front of response for some tokenizers by @dakinggg in #1003
- Bump min composer pin by @dakinggg in #1015
- Add default for eval interval by @irenedea in #987
- Add support for olmo by @dakinggg in #1016
- Add deeper support for multi-turn chats and loss-generating tokens in finetuning by @alextrott16 in #985
- Add explicit packing ratio of 1 for profiling by @irenedea in #1019
- Bump transformers to 4.38.2 by @dakinggg in #1018
- Making sure `MemoryMonitor` takes in kwargs by @snarayan21 in #1020
- Update readme for torch version 2.2.1 by @irenedea in #1021
- Add code import to train/eval scripts by @dakinggg in #1002
- Bump version in readme by @bmosaicml in #1022
- Bump version to 0.6.0 by @dakinggg in #1023
New Contributors
- @bigning made their first contribution in #933
- @sdonoso made their first contribution in #957
- @josejg made their first contribution in #956
Full Changelog: v0.5.0...v0.6.0