v0.6.0
🚀 LLM Foundry v0.6.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
Configurable loss for chat-formatted data (#985)
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.
This can be specified in the `train_loader.dataset` section of your yaml as follows:
```yaml
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
```
See the docstring for a description of the options.
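For example, assuming the option values described in that docstring (`last`/`all` for responses, and `none`/`all`/`length>=XX` for prompts), a config that computes loss only on the final response of a multi-turn chat might look like this sketch:

```yaml
train_loader:
  dataset:
    ...
    # illustrative values; see the docstring for the full set of options
    target_prompts: none      # no loss on prompt tokens
    target_responses: last    # loss only on the final response of each chat
```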
OLMo support (#1016)
We've added support for the OLMo model from AI2.
To use OLMo, there are a few configuration parameters you need to set. First, you will need to install LLM Foundry with the extra package for OLMo (`pip install .[gpu,olmo]`).
Then you will need to adjust the tokenizer section of your config as follows:
```yaml
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
```
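The model section then points at the same checkpoint. A minimal sketch, assuming the standard `hf_causal_lm` wrapper (check the repo's example yamls for the exact fields your setup needs):

```yaml
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: allenai/OLMo-7B
  pretrained: true
  # OLMo's modeling code is loaded from the Hub, so remote code must be trusted
  trust_remote_code: true
```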
Token accuracy (#983)
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
Configurable activation checkpointing (#951)
Activation checkpointing for MPT is now more configurable, allowing finer-grained control over memory usage during training. See the docstring for more details.
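As a rough sketch of the shape this takes (the exact module names and block-selection syntax are documented in the MPT docstring, so treat these values as hypothetical):

```yaml
model:
  name: mpt_causal_lm
  ...
  # hypothetical target: checkpoint only the attention module of blocks 0-3
  activation_checkpointing_target:
    grouped_query_attention: [0, 1, 2, 3]
fsdp_config:
  # activation_checkpointing_target only takes effect when FSDP
  # activation checkpointing is enabled
  activation_checkpointing: true
```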
Finetuning with multiple streams, and pretokenized data (#933, #945, #946)
We've brought the finetuning dataloader up to speed with the pretraining dataloader, adding support for mixing multiple streams and for pretokenized finetuning data. See the yaml for a full example.
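A minimal sketch of what mixing streams in the finetuning dataloader can look like, with hypothetical stream names, paths, and proportions (the linked yaml has the real, complete example):

```yaml
train_loader:
  name: finetuning
  dataset:
    streams:
      stream_a:                 # hypothetical stream names and paths
        remote: s3://my-bucket/finetune-stream-a
        local: /tmp/finetune-stream-a
        proportion: 0.7
      stream_b:
        remote: s3://my-bucket/finetune-stream-b
        local: /tmp/finetune-stream-b
        proportion: 0.3
    shuffle: true
```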
Eval Gauntlet v0.3 (#824)
We've released v0.3 of our Eval Gauntlet. See the README for a full description.
Breaking changes and deprecations
Flash attention v1 removal (#1023)
Support for flash attention v1 has now been removed.
Extra BOS token removed (#1003)
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.
Deprecation of triton flash attention, prefixLM, and text denoising (#1007)
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.
What's Changed
- Gauntlet v0.3: Fix chain-of-thought tasks by @bmosaicml in #824
- Add finetuning streaming dataset conversion by @bigning in #933
- Add default signature to mlflow saved model by @dakinggg in #952
- allow te to use meta device with deferred init by @cli99 in #958
- Update TUTORIAL.md by @sdonoso in #957
- Update mcli yamls to use v0.5.0 by @irenedea in #959
- add finetuning with streaming dataset example by @bigning in #945
- Add fully configurable activation checkpointing by @cli99 in #951
- Use create_model_version instead of register_model by @dakinggg in #953
- Add streams support by @bigning in #946
- Fix typo by @irenedea in #966
- Fix eval.py with lora by @dakinggg in #965
- add memory snapshot to callbacks by @cli99 in #810
- Adding curriculum learning callback (experimental) by @snarayan21 in #954
- strengthened chat formatting validation by @milocress in #960
- Add new base images and remove fa1 images by @dakinggg in #970
- Add new ICL kwargs in eval.py and long_context yamls by @maxisawesome in #925
- Make composer pins consistent with each other by @dakinggg in #972
- Make turbo an optional dependency by @snarayan21 in #964
- Fix fewshot_random_seed default setting by @maxisawesome in #974
- Improve error msg when checking target_blocks overlap by @cli99 in #977
- Torch 2.2 upgrade - Part 1 by @dakinggg in #976
- Torch 2.2 - Part 2 by @dakinggg in #979
- PyTorch 2.2 - Part 3 by @dakinggg in #981
- Remove torch 2.1 from docker workflow by @dakinggg in #982
- Async callback: Don't skip checkpoints, reliably only launch async eval when the checkpoint is ready by @aspfohl in #813
- Token accuracy metrics by @dakinggg in #983
- Update readme to not mention 1.13_cu117 by @irenedea in #988
- Patch test, lock mcli version by @aspfohl in #990
- Bump gha timeouts by @aspfohl in #991
- Fix readme typo by @dakinggg in #993
- if condition in tie weights added by @megha95 in #989
- Bump Composer to 0.20 by @dakinggg in #995
- Trim examples ahead of time for auto packing by @irenedea in #994
- add oom observer callback by @cli99 in #932
- Use ci-testing repo for tests by @b-chu in #1000
- Make CodeEval respect device_eval_batch_size by @josejg in #956
- Remove try except around imports by @dakinggg in #1004
- Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental by @irenedea in #1007
- add magic filename for sharded state dicts by @milocress in #1001
- Bump CI/CD to v3 by @mvpatel2000 in #1009
- Fix evaluators actually pulling eval metrics by @mvpatel2000 in #1006
- Build torch 2.2.1 images by @dakinggg in #1010
- Add torch 2.2.1 tests by @dakinggg in #1011
- Bump min torch pin to 2.2.1 by @dakinggg in #1013
- Fix extra BOS token in front of response for some tokenizers by @dakinggg in #1003
- Bump min composer pin by @dakinggg in #1015
- Add default for eval interval by @irenedea in #987
- Add support for olmo by @dakinggg in #1016
- Add deeper support for multi-turn chats and loss-generating tokens in finetuning by @alextrott16 in #985
- Add explicit packing ratio of 1 for profiling by @irenedea in #1019
- Bump transformers to 4.38.2 by @dakinggg in #1018
- Making sure `MemoryMonitor` takes in kwargs by @snarayan21 in #1020
- Update readme for torch version 2.2.1 by @irenedea in #1021
- Add code import to train/eval scripts by @dakinggg in #1002
- Bump version in readme by @bmosaicml in #1022
- Bump version to 0.6.0 by @dakinggg in #1023
New Contributors
- @bigning made their first contribution in #933
- @sdonoso made their first contribution in #957
- @josejg made their first contribution in #956
Full Changelog: v0.5.0...v0.6.0