Releases: mosaicml/llm-foundry
v0.10.0
🚀 LLM Foundry v0.10.0
New Features
Registry for ICL datasets (#1252)
ICL datasets have now been added as a registry.
Curriculum Learning Callback (#1256)
You can now switch dataloaders during training, which enables curriculum learning.
train_loader:
  <dataloader parameters>
callback:
  curriculum_learning:
  - duration: <number>tok
    train_loader: # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
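To make the schedule semantics concrete, here is a minimal sketch of how token-based durations could map a running token count to a schedule entry. The `stage_for_tokens` helper and the example durations are hypothetical and are not part of LLM Foundry. The sketch also assumes that the last entry stays active once all durations are exhausted.

```python
from itertools import accumulate


def stage_for_tokens(durations_tok, tokens_seen):
    """Return the index of the schedule entry active after `tokens_seen` tokens.

    `durations_tok` holds the per-stage durations in tokens, in schedule order.
    Assumption: once every duration is exhausted, the last stage stays active.
    """
    boundaries = list(accumulate(durations_tok))  # cumulative end of each stage
    for index, end in enumerate(boundaries):
        if tokens_seen < end:
            return index
    return len(durations_tok) - 1


# Hypothetical schedule: 1000 tokens, then 2000 tokens, then 500 tokens.
durations = [1000, 2000, 500]
print([stage_for_tokens(durations, t) for t in (0, 999, 1000, 2999, 3000, 10_000)])
```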
[Experimental] Interweave Attention Layers (#1299)
You can now override default block configs for certain layers, allowing for different sliding window sizes, reusing the previous layer's kv cache, etc.
model:
  ...
  (usual model configs)
  ...
  block_overrides:
    order:
    - name: default
    - order:
      - name: sliding_window_layer
      - name: sliding_window_layer_reuse
      - name: sliding_window_layer
      - repeat: 2
        name: sliding_window_layer_reuse
      - name: reuse_kv_layer
      repeat: 2
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
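One way to read a nested order list is to flatten it into one block name per layer and then turn each relative reuse_kv_layer_idx into an absolute layer index. The sketch below illustrates that reading only. `expand_order` and `resolve_reuse` are hypothetical helpers, not LLM Foundry functions, and the sketch assumes every named entry stands for exactly one layer (how `default` entries are sized in the real implementation is not modeled).

```python
def expand_order(order):
    """Flatten a nested `order` list into one block name per layer.

    Each entry has either `name` (a single block type) or `order` (a nested
    group), plus an optional `repeat` count that defaults to 1.
    """
    names = []
    for entry in order:
        repeat = entry.get("repeat", 1)
        if "order" in entry:
            group = expand_order(entry["order"])
        else:
            group = [entry["name"]]
        names.extend(group * repeat)
    return names


def resolve_reuse(names, relative_idx_by_name):
    """Map a layer index to the absolute index of the layer whose kv cache it reuses."""
    resolved = {}
    for layer_idx, name in enumerate(names):
        rel = relative_idx_by_name.get(name)
        if rel is None:
            continue  # this block type does not reuse another layer's cache
        target = layer_idx + rel
        if target < 0:
            raise ValueError(f"layer {layer_idx} points before the first layer")
        resolved[layer_idx] = target
    return resolved


# Hypothetical, shortened config for illustration.
order = [
    {"name": "default"},
    {
        "order": [
            {"name": "sliding_window_layer"},
            {"name": "sliding_window_layer_reuse"},
        ],
        "repeat": 2,
    },
]
layers = expand_order(order)
print(layers)
print(resolve_reuse(layers, {"sliding_window_layer_reuse": -1}))
```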
Bug fixes
What's Changed
- Bump Version to 0.10.0.dev0 by @KuuCi in #1255
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Update TE Dockerfile by @j316chuck in #1265
- Revert "Update TE Dockerfile (#1265)" by @j316chuck in #1266
- Revert to older TE version by @mvpatel2000 in #1267
- Bump Composer to version 0.23.2 by @dakinggg in #1269
- fix linting by @milocress in #1270
- Add torch 2.3.1 docker images by @dakinggg in #1275
- Make expandable segments on by default by @b-chu in #1278
- Adds CI for torch 2.3.1 by @dakinggg in #1281
- Update README.md to use variables by @milocress in #1282
- Add registry for ICL datasets by @sanjari-orb in #1252
- Fix typo in CI by @dakinggg in #1284
- Fix backwards compatibility for ICL arg by @dakinggg in #1286
- Fix packing + streaming + resumption by @dakinggg in #1277
- Dbfs HF by @KuuCi in #1214
- Bump mlflow to 2.13.2 by @KuuCi in #1285
- Add missing dependency group by @dakinggg in #1287
- Update Dockerfile with TE main by @j316chuck in #1273
- Fix TE HF checkpoint saving by @j316chuck in #1280
- added systemMetricsMonitor callback by @JackZ-db in #1260
- Extendability refactors by @dakinggg in #1290
- Small refactor for update batch size by @dakinggg in #1293
- Bump min composer version to 0.23.3 by @dakinggg in #1294
- Fix grad accum typing by @dakinggg in #1296
- Bump composer to 0.23.4 by @mvpatel2000 in #1297
- Allow passing in lbl_process_group directly by @dakinggg in #1298
- Add all transforms to train script by @dakinggg in #1300
- Add Retries to run_query by @KuuCi in #1302
- Bumping mlflow version to include buffering by @JackZ-db in #1303
- Ignore mosaicml logger for exception if excephook is active by @jjanezhang in #1301
- Add curriculum learning callback by @b-chu in #1256
- Avoid circular import in hf checkpointer by @dakinggg in #1304
- Remove codeql workflow by @dakinggg in #1305
- Update CI test to v0.0.8 by @KuuCi in #1306
- Upgrade ci testing to 0.0.8 by @dakinggg in #1307
- Bump ci-testing to 0.0.9 by @dakinggg in #1310
- Fix 4 gpu tests by @dakinggg in #1311
- Bump recommended images to 2.3.1 and remove 2.3.0 CI by @dakinggg in #1312
- Provide default seed value in TrainConfig, matching EvalConfig by @mvpatel2000 in #1315
- Refactor hf checkpointer for config transformations by @irenedea in #1318
- Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. by @ShashankMosaicML in #1299
- Add optional logging of text output to EvalOutputLogging by @sjawhar in #1283
New Contributors
- @sanjari-orb made their first contribution in #1252
- @JackZ-db made their first contribution in #1260
- @sjawhar made their first contribution in #1283
Full Changelog: v0.9.1...v0.10.0
v0.9.1
🚀 LLM Foundry v0.9.1
This is a minor patch release to bump the minimum version of mlflow to make sure to buffer writes (mosaicml/composer#3401)
What's Changed
Full Changelog: v0.9.0...v0.9.1
v0.9.0
🚀 LLM Foundry v0.9.0
New Features
More Token Encoding Types (#1254)
We've expanded the different ways to encode token IDs by allowing uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated conversion scripts accordingly.
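The space saving comes from the width of each stored token ID. The sketch below uses the standard library `struct` module as a stand-in for the ndarray types mentioned above. The helper names and the vocabulary size are hypothetical, chosen only to show the arithmetic.

```python
import struct


def smallest_uint_format(vocab_size):
    """Pick the narrowest unsigned struct format that can hold every token ID."""
    if vocab_size <= 2**16:
        return "H"  # uint16, 2 bytes per token
    if vocab_size <= 2**32:
        return "I"  # uint32, 4 bytes per token
    raise ValueError("vocab too large for uint32")


def encode_tokens(token_ids, vocab_size):
    """Serialize token IDs as little-endian fixed-width unsigned integers."""
    fmt = smallest_uint_format(vocab_size)
    return struct.pack(f"<{len(token_ids)}{fmt}", *token_ids)


tokens = [0, 17, 50_000, 123]
as_uint16 = encode_tokens(tokens, vocab_size=50_368)  # hypothetical vocab size
as_int64 = struct.pack(f"<{len(tokens)}q", *tokens)   # 8 bytes per token
print(len(as_uint16), len(as_int64))
```

For a vocabulary that fits in 16 bits, each token takes 2 bytes instead of 8, a fourfold reduction over 64-bit integers.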
Enforced Stricter Configs (#1254, #1225, #1202)
We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. In conjunction with numerous other PRs, we have stronger error handling to help users use LLM Foundry smoothly.
Previously, this was allowed:
parameters:
  train_dataloader:
    ...
    seed: ${global_seed}
    random_other_key_that's_not_in_the_dataloader_constructor # this is no longer allowed
  ...
  global_seed: 17 # this is also no longer allowed
But we've added a variables section. Please do this instead:
parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}
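To illustrate what a reference such as ${variables.global_seed} means, here is a toy resolver written with the standard library only. It is not how LLM Foundry resolves configs, and `lookup` and `resolve` are hypothetical names. It handles only values that consist of a single whole-string reference.

```python
import re

_PATTERN = re.compile(r"\$\{([^}]+)\}")


def lookup(root, dotted_key):
    """Follow a dotted path such as `variables.global_seed` through nested dicts."""
    node = root
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError for an unknown key
    return node


def resolve(node, root):
    """Recursively replace whole-string `${a.b}` references with the referenced value."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    if isinstance(node, str):
        match = _PATTERN.fullmatch(node)
        if match:
            return lookup(root, match.group(1))
    return node


config = {
    "variables": {"global_seed": 42},
    "train_dataloader": {"seed": "${variables.global_seed}"},
}
resolved = resolve(config, config)
print(resolved["train_dataloader"]["seed"])
```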
Chunked text to mds conversion (#1240)
We've updated our text-to-MDS conversion script to convert files to MDS in chunks. This avoids loading entire large files at once (which could cause OOMs) and drastically speeds up converting long sequences.
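The idea of chunked conversion can be sketched as a generator that reads a bounded amount of text at a time and emits fixed-length token samples as soon as enough tokens are buffered. This is an illustration only, not the conversion script itself: `iter_token_samples` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and trailing tokens that do not fill a sample are dropped.

```python
import io


def iter_token_samples(stream, chunk_chars, sample_len, tokenize=str.split):
    """Yield fixed-length token samples while reading `stream` chunk by chunk.

    Only one chunk of text plus a small token buffer is held in memory at a time.
    """
    buffer = []
    leftover = ""
    while True:
        chunk = stream.read(chunk_chars)
        if not chunk:
            break
        text = leftover + chunk
        # Hold back a trailing partial word so it is not split across chunks.
        cut = len(text) if text[-1].isspace() else text.rfind(" ") + 1
        leftover = text[cut:]
        buffer.extend(tokenize(text[:cut]))
        while len(buffer) >= sample_len:
            yield buffer[:sample_len]
            buffer = buffer[sample_len:]
    # Flush the final held-back word; an incomplete last sample is dropped.
    buffer.extend(tokenize(leftover))
    while len(buffer) >= sample_len:
        yield buffer[:sample_len]
        buffer = buffer[sample_len:]


text = io.StringIO("a b c d e f g h i j")
samples = list(iter_token_samples(text, chunk_chars=4, sample_len=3))
print(samples)
```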
Breaking Changes and Deprecations
What's Changed
- Bump version v0.9.0.dev0 by @milocress in #1181
- structuredconfig for train.py and eval.py by @milocress in #1051
- update version names by @milocress in #1185
- Refactoring attention by @ShashankMosaicML in #1182
- Checking if attention mask is present for ignoring pad tokens in ffn. by @ShashankMosaicML in #1188
- Bump python 3.11 version in setup.py by @j316chuck in #1189
- Docstring fix for curriculum learning callback by @snarayan21 in #1186
- Set ft dataloader name explicitly by @milocress in #1187
- Remove to_container by @dakinggg in #1190
- fix eval by @milocress in #1193
- Log exception on inactivity callback by @jjanezhang in #1194
- Pass FC type along for all FFN types by @dakinggg in #1196
- Streaming version bump to 0.7.6 by @snarayan21 in #1195
- Clearer error message for unknown example type by @milocress in #1202
- Added torch_dmoe defaults, bug fixes for 2D inputs by @snarayan21 in #1210
- log eval dataset misconfiguration by @milocress in #1179
- Using self.shift_labels instead of self.model.transformer.shift_label in the loss function. by @ShashankMosaicML in #1211
- Add fc to HF export by @dakinggg in #1209
- TransformerEngine Image Build by @mvpatel2000 in #1204
- Removed debugging code in tests by @dakinggg in #1213
- Make fc_type a dict to pass fc kwargs through by @snarayan21 in #1201
- Fix dmoe tests GPU OOM by @snarayan21 in #1216
- Update readme to clarify flash-attn and TE installs by @snarayan21 in #1219
- Modularize components of megablocks layer builder by @dakinggg in #1224
- Add user error superclass by @milocress in #1225
- Make config/class properties on ComposerMPTForCausalLM by @dakinggg in #1227
- Quick patch to check that Dataset Keys contain non-None Values by @KuuCi in #1228
- Modularize backbone class and block creation by @dakinggg in #1229
- Loss v len callback by @ShashankMosaicML in #1226
- Fixing the state.timestamp.batch.value issue in loss v len callback by @ShashankMosaicML in #1232
- Fix attr error for attention_classes when using act ckpt by @cli99 in #1230
- Fix tuple typing by @dakinggg in #1235
- Add example eval scripts for dbrx PT sizes by @aspfohl in #1218
- Configurable submesh by @dakinggg in #1236
- Add retries to downloads in convert_text_to_mds.py by @irenedea in #1238
- Move MLFlow dataset outside of log_config by @KuuCi in #1234
- add error when chat template fails by @milocress in #1222
- Make the exceptions serializable by @dakinggg in #1239
- Removing rich install by @jjanezhang in #1198
- Chunk file reads and tokenization for text to mds conversion by @irenedea in #1240
- Make HF conversion automatically add missing imports by @dakinggg in #1241
- Add logging to convert_text_to_mds.py script by @irenedea in #1243
- Update CODEOWNERS by @dakinggg in #1248
- Replacing icl_task_type question_answering with generation_task_with_answers in long context eval yamls. by @ShashankMosaicML in #1250
- Change TE docker image to enable te_shard_weight by @j316chuck in #1251
- Fix MPT HF conversion by @dakinggg in #1257
- Remove spurious warning by @dakinggg in #1258
- Adding more token encoding types by @snarayan21 in #1254
- Bump Composer to 0.23.0 by @KuuCi in #1259
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Bump composer to 0.23.2 by @dakinggg in #1269
Full Changelog: v0.8.0...v0.9.0
v0.8.0
🚀 LLM Foundry v0.8.0
New Features
Megablocks support (#1102)
Support for training optimized MoE models at large scale.
Check out the megablocks documentation for more information on building state of the art MoE models.
Expanded Registries (#1080, #1093, #1094, #1095, #1096, #1165)
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.
Check out the README for detailed instructions and code examples!
Support for ShareGPT chat format (#1098)
We now support the ShareGPT format for finetuning.
Breaking Changes and Deprecations
We have updated the minimum supported PyTorch version to torch 2.3 (#1152).
In Context Learning Code Evaluation (#1181)
We've removed the code_evaluation task from the allowed in context learning task types, and we've deleted the InContextLearningCodeEvaluationDataset and InContextLearningCodeEvalAccuracy classes.
Question-Answering
We've removed the question_answering task type. Please use the generation_task_with_answers task instead.
What's Changed
- Update README by @hanlint in #1069
- Expose more exception attributes by @jjanezhang in #1071
- Output eval logging batch by @maxisawesome in #961
- Add expandeable segments flag by @dakinggg in #1075
- Check the user provided eos / bos token id against the tokenizer eos / bos token id by @ShashankMosaicML in #1039
- Triton RMSNorm by @josejg in #1050
- Fix tiktoken vocab size by @dakinggg in #1081
- Doing the loss reduction in foundry instead of in the loss functions. by @ShashankMosaicML in #1079
- Decrease log verbosity with no bias by @mvpatel2000 in #1082
- Upgrade hf chat by @j316chuck in #1061
- Fixes for streaming and auto packing by @dakinggg in #1083
- Background mlflow model registration by @irenedea in #1078
- Update README.md to include DBRX blog under "Latest News" by @lupesko in #1085
- Decrease transformers file size for mlflow by @dakinggg in #1087
- log packing ratio progress by @milocress in #1070
- Bump HF version by @b-chu in #1091
- Fix typo in expandable_segments by @mammothb in #1088
- Bump transformers to 4.39.3 by @dakinggg in #1086
- Fix yaml typo by @dakinggg in #1092
- Fix for overriding nested configs by @dakinggg in #1089
- cleaned up HF/MPT conversion test by @milocress in #1048
- Update yamls for 0.7.0 by @dakinggg in #1097
- Norms registry by @dakinggg in #1080
- fixing evaluator microbatch size by @ShashankMosaicML in #1100
- Updating the streaming version in setup.py by @ShashankMosaicML in #1103
- MegaBlocks release by @mvpatel2000 in #1102
- Remove torch compile from GLU by @josejg in #1101
- Update config_moe_args.py by @vchiley in #1104
- Add remote code option to allow execution of DBRX tokenizer by @b-chu in #1106
- Fix overwriting FP8 act ckpt flag in the train script by @cli99 in #1107
- Support ShareGPT chat format by @samhavens in #1098
- FC layer registry by @dakinggg in #1093
- Attention layer registry by @dakinggg in #1094
- Dbrx finetune yaml requires save folder specified to enable autoresume by @mvpatel2000 in #1108
- Revert "Update config_moe_args.py" by @vchiley in #1111
- rm new_group todo by @vchiley in #1112
- Migrate ICL classes to foundry by @bmosaicml in #936
- FFN layer registry by @dakinggg in #1095
- Param init registry by @dakinggg in #1096
- Add missing init file by @dakinggg in #1113
- Update tests to not rely on mistral by @dakinggg in #1117
- Bump transformers to 4.40 by @dakinggg in #1118
- add .json to SUPPORTED_EXTENSIONS by @eitanturok in #1114
- Add option for subclasses to convert model and tokenizer in hf checkpointer by @dakinggg in #1121
- Bump Composer to 0.21.3 by @b-chu in #1122
- catch misconfigured hf dataset by @milocress in #1123
- Pin mlflow by @dakinggg in #1124
- Change main to a dev version by @dakinggg in #1126
- Fix deprecation versions by @dakinggg in #1129
- Clean up the publicly exported API by @dakinggg in #1128
- Fix HF checkpointer + mlflow bugs by @dakinggg in #1125
- Update JSONL sources in eval README by @emmanuel-ferdman in #1110
- Mlflow datasets by @KuuCi in #1119
- Strict key checking for dataset by @b-chu in #1131
- First initialize dist with gloo by @dakinggg in #1133
- Fix saving of generation_config for Llama-3 by @eldarkurtic in #1134
- Bump datasets version by @dakinggg in #1138
- Revert "First initialize dist with gloo (#1133)" by @dakinggg in #1139
- Barrier immediately after initialize dist with logs by @dakinggg in #1140
- Add new FT instructions by @b-chu in #1143
- Upgrade ci-testing by @mvpatel2000 in #1145
- Fix typos in callbacks with configs by @dakinggg in #1146
- Remove olmo as a dependency by @snarayan21 in #1148
- build inner model by @milocress in #1147
- fix DatasetConstants.splints default value to protect dictionary overwriting by @ivan-kud in #1144
- Bump flash attention version by @dakinggg in #1150
- Torch 2.3 part 1 - build the images by @dakinggg in #1149
- Torch 2.3 upgrade Part 2 - CI by @dakinggg in #1151
- Comment out 2.3 tests by @dakinggg in #1155
- Fix yaml lint by @dakinggg in #1156
- Move sentencepiece import by @aspfohl in #1157
- Bump composer version to 0.22.0 by @snarayan21 in #1160
- Uncomment GPU tests by @milocress in #1162
- Depend on coverage by @milocress in #1163
- fix dep group in torch 2.3 ci by @dakinggg in #1164
- Bump min torch version to 2.3.0 by @dakinggg in #1152
- Add line splitting and other linting by @b-chu in #1161
- refactoring dataloader into registries. by @ShashankMosaicML in #1165
- Migrate eval output logging to foundry by @maxisawesome in #1166
- Fix import and mocking by @dakinggg in #1169
- minor fix to llmfoundry.data.utils.get_text_collator by @ShashankMosaicML in #1170
- Fix config access for DBRX by @dakinggg in #1177
New Contributors
v0.7.0
🚀 LLM Foundry v0.7.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've made foundry more customizable and extensible!
New Features
Registerable Components (#975, #1043, #1052, #1057)
We've made key components of LLM Foundry registrable, such as models, loggers, and callbacks. You can use the registry to easily customize and extend your training workflows.
This means that you can register new options for these components, and then use them in your yaml config.
Check out the README for detailed instructions and code examples!
Breaking Changes and Deprecations
Deprecated Feature Removals (#1063)
We've removed support for deprecated features: triton attention, Prefix LMs, Llama attention patch, z-loss, and text denoising. These features were little used, and we removed them to focus on the core features that are heavily used.
If you were using these features please let us know how you were using them in a GitHub issue. We're happy to add things back that are in heavy usage.
What's Changed
- Fix typo in monolithic chkpt callback docs by @sashaDoubov in #1024
- Allow code-quality workflow to be callable by @b-chu in #1026
- Fix llama attention patch by @dakinggg in #1036
- Adds a decorator for experimental features by @dakinggg in #1038
- Finish 0.6.0 release by @dakinggg in #1040
- Remove reference to attn_impl: triton by @dakinggg in #1041
- Registry based config - Part 1 by @dakinggg in #975
- Deprecate attention patching for llama by @dakinggg in #1047
- Compile GLU by @josejg in #1049
- log details to metadata for run analytics by @angel-ruiz7 in #992
- Update README.md by @dennyglee in #1056
- Add chat schema example for mlflow by @dakinggg in #1054
- Metrics registry by @dakinggg in #1052
- LLM Foundry CLI (just registry) by @dakinggg in #1043
- Bump Composer to 0.21.1 by @jjanezhang in #1053
- Dataloaders registry by @dakinggg in #1044
- Fix multi model eval by @dakinggg in #1055
- Remove unnecessary test workflow by @dakinggg in #1058
- Fix peft llama test by @dakinggg in #1059
- Models registry by @dakinggg in #1057
- Remove under construction from registry by @dakinggg in #1060
- Custom Exceptions for Mosaic Logger by @jjanezhang in #1014
- Bump version to 0.7.0 by @irenedea in #1063
- Fix file filter by @dakinggg in #1067
- Fix context printing by @irenedea in #1068
New Contributors
- @angel-ruiz7 made their first contribution in #992
- @dennyglee made their first contribution in #1056
Full Changelog: v0.6.0...v0.7.0
v0.6.0
🚀 LLM Foundry v0.6.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
Configurable loss for chat-formatted data (#985)
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.
This can be specified in the train_loader.dataset section of your yaml as follows:
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
See the docstring for a description of the options.
Olmo support (#1016)
We've added support for the OLMo model from AI2.
To use OLMo, there are a few configuration parameters you need to set. First of all, you will need to install LLM Foundry with the extra package for OLMo (pip install .[gpu,olmo]).
Then you will need to adjust the tokenizer section of your config as follows:
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
Token accuracy (#983)
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
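As a rough illustration of what a token accuracy metric measures, the sketch below counts matching positions between predicted and target token IDs. `token_accuracy` is a hypothetical helper, not the metric class added in the PR. Skipping targets equal to -100 follows the common PyTorch ignore-index convention and is an assumption here.

```python
def token_accuracy(predicted_ids, target_ids, ignore_index=-100):
    """Fraction of non-ignored target positions where the prediction matches."""
    correct = 0
    counted = 0
    for predicted, target in zip(predicted_ids, target_ids):
        if target == ignore_index:
            continue  # masked positions (e.g. padding) do not count
        counted += 1
        correct += int(predicted == target)
    return correct / counted if counted else 0.0


# Three positions are scored (the last one is masked); two of them match.
print(token_accuracy([5, 7, 9, 2], [5, 8, 9, -100]))
```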
Configurable activation checkpointing (#951)
More configurable activation checkpointing for MPT allows finer-grained control over memory usage when training MPT. See the docstring for more details.
Finetuning with multiple streams, and pretokenized data (#933, #945, #946)
We've brought the finetuning dataloader up to speed with the pretraining dataloader to support mixing multiple streams, and pretokenizing finetuning data. See the yaml for a full example.
Eval Gauntlet v0.3 (#824)
We've released v0.3 of our Eval Gauntlet. See the README for a full description.
Breaking changes and deprecations
Flash attention v1 removal (#1023)
Support for flash attention v1 has now been removed.
Extra BOS token removed (#1003)
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.
Deprecation of triton flash attention, prefixLM, and text denoising (#1007)
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.
What's Changed
- Gauntlet v0.3: Fix chain-of-thought tasks by @bmosaicml in #824
- Add finetuning streaming dataset conversion by @bigning in #933
- Add default signature to mlflow saved model by @dakinggg in #952
- allow te to use meta device with deferred init by @cli99 in #958
- Update TUTORIAL.md by @sdonoso in #957
- Update mcli yamls to use v0.5.0 by @irenedea in #959
- add finutuning with streaming dataset example by @bigning in #945
- Add fully configurable activation checkpointing by @cli99 in #951
- Use create_model_version instead of register_model by @dakinggg in #953
- Add streams support by @bigning in #946
- Fix typo by @irenedea in #966
- Fix eval.py with lora by @dakinggg in #965
- add memory snapshot to callbacks by @cli99 in #810
- Adding curriculum learning callback (experimental) by @snarayan21 in #954
- strengthened chat formatting validation by @milocress in #960
- Add new base images and remove fa1 images by @dakinggg in #970
- Add new ICL kwargs in eval.py and long_context yamls by @maxisawesome in #925
- Make composer pins consistent with each other by @dakinggg in #972
- Make turbo an optional dependency by @snarayan21 in #964
- Fix fewshot_random_seed default setting by @maxisawesome in #974
- Improve error msg when checking target_blocks overlap by @cli99 in #977
- Torch 2.2 upgrade - Part 1 by @dakinggg in #976
- Torch 2.2 - Part 2 by @dakinggg in #979
- PyTorch 2.2 - Part 3 by @dakinggg in #981
- Remove torch 2.1 from docker workflow by @dakinggg in #982
- Async callback: Don't skip checkpoints, reliably only launch async eval when the checkpoint is ready by @aspfohl in #813
- Token accuracy metrics by @dakinggg in #983
- Update readme to not mention 1.13_cu117 by @irenedea in #988
- Patch test, lock mcli version by @aspfohl in #990
- Bump gha timeouts by @aspfohl in #991
- Fix readme typo by @dakinggg in #993
- if condition in tie weights added by @megha95 in #989
- Bump Composer to 0.20 by @dakinggg in #995
- Trim examples ahead of time for auto packing by @irenedea in #994
- add oom observer callback by @cli99 in #932
- Use ci-testing repo for tests by @b-chu in #1000
- Make CodeEval respect device_eval_batch_size by @josejg in #956
- Remove try except around imports by @dakinggg in #1004
- Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental by @irenedea in #1007
- add magic filename for sharded state dicts by @milocress in #1001
- Bump CI/CD to v3 by @mvpatel2000 in #1009
- Fix evaluators actually pulling eval metrics by @mvpatel2000 in #1006
- Build torch 2.2.1 images by @dakinggg in #1010
- Add torch 2.2.1 tests by @dakinggg in #1011
- Bump min torch pin to 2.2.1 by @dakinggg in #1013
- Fix extra BOS token in front of response for some tokenizers by @dakinggg in #1003
- Bump min composer pin by @dakinggg in #1015
- Add default for eval interval by @irenedea in #987
- Add support for olmo by @dakinggg in #1016
- Add deeper support for multi-turn chats and loss-generating tokens in finetuning by @alextrott16 in #985
- Add explicit packing ratio of 1 for profiling by @irenedea in #1019
- Bump transformers to 4.38.2 by @dakinggg in #1018
- Making sure MemoryMonitor takes in kwargs. by @snarayan21 in #1020
- Update readme for torch version 2.2.1 by @irenedea in #1021
- Add code import to train/eval scripts by @dakinggg in #1002
- Bump version in readme by @bmosaicml in #1022
- Bump version to 0.6.0 by @dakinggg in #1023
New Contributors
- @bigning made their first contribution in #933
- @sdonoso made their first contribution in #957
- @josejg made their first contribution in #956
Full Changelog: v0.5.0...v0.6.0
v0.5.0
🚀 LLM Foundry v0.5.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
LoRA Support (with FSDP!) (#886)
LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run train.py, adding peft_config arguments to the model section of the config .yaml, like so:
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
    - q_proj
    - k_proj
Read more about it in the tutorial.
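To give a sense of what the rank r in the config controls: LoRA trains two low-rank matrices per target weight, one of shape r x d_in and one of shape d_out x r, so it adds r * (d_in + d_out) trainable parameters per target module. The sketch below works through that arithmetic. The hidden size of 4096 is a hypothetical example value, not taken from any particular model config.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


hidden = 4096  # hypothetical projection size, for illustration only
full = hidden * hidden                            # parameters in the frozen weight
extra = lora_extra_params(hidden, hidden, r=16)   # parameters LoRA trains instead
print(extra, full, extra / full)
```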
ALiBi for Flash Attention (#820)
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
Chat Data for Finetuning (#884)
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.
Each sample requires a single key "messages" that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
- role: A string indicating the author of the message. Possible values are "system", "user", and "assistant".
- content: A string containing the text of the message.
We require that there must be at least one message with the role "assistant", and the last message in the "messages" array must have the role "assistant".
Here's an example .jsonl with chat data:
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
{ "role": "system", "content": "A conversation between a user and a helpful and honest assistant" },
{ "role": "user", "content": "Hi, MPT!" },
{ "role": "assistant", "content": "Hi, user!" },
{ "role": "user", "content": "Is multi-turn chat supported?"},
{ "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...
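The format rules above can be checked mechanically before training. The following is a minimal sketch of such a check using only the standard library. `validate_chat_sample` is a hypothetical helper and not the validation code LLM Foundry runs.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_sample(sample):
    """Raise ValueError if `sample` breaks the chat-format rules described above."""
    if set(sample) != {"messages"}:
        raise ValueError('sample must have exactly one key, "messages"')
    messages = sample["messages"]
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        if not {"role", "content"} <= set(message):
            raise ValueError(f"message needs role and content keys: {message}")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {message['role']}")
        if not isinstance(message["content"], str):
            raise ValueError("content must be a string")
    # A final assistant message also guarantees at least one assistant message.
    if messages[-1]["role"] != "assistant":
        raise ValueError("the last message must come from the assistant")


line = (
    '{"messages": [{"role": "user", "content": "Hi, MPT!"}, '
    '{"role": "assistant", "content": "Hi, user!"}]}'
)
validate_chat_sample(json.loads(line))
print("ok")
```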
Safe Load for HuggingFace Datasets (#798)
We now provide a safe_load option when loading HuggingFace datasets for finetuning.
This restricts loaded files to .jsonl, .csv, or .parquet extensions to prevent arbitrary code execution.
To use, set safe_load to true in your dataset configuration:
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
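The extension restriction can be pictured as a simple allow-list check over file names. The sketch below is illustrative only: `unsafe_files` is a hypothetical helper, and details such as case-insensitive matching are assumptions rather than documented behavior of safe_load.

```python
from pathlib import PurePosixPath

SAFE_EXTENSIONS = {".jsonl", ".csv", ".parquet"}


def unsafe_files(file_names):
    """Return the files whose extension is outside the allowed set."""
    return [
        name
        for name in file_names
        # Assumption: extensions are compared case-insensitively.
        if PurePosixPath(name).suffix.lower() not in SAFE_EXTENSIONS
    ]


print(unsafe_files(["train.jsonl", "data/val.parquet", "loader.py"]))
```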
New PyTorch, Composer, Streaming, and Transformers versions
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (mixtral in particular).
Deprecations
Support for Flash Attention v1 (#921)
Will be removed in v0.6.0.
Breaking Changes
Removed support for PyTorch versions before 2.1 (#787)
We no longer support PyTorch versions before 2.1.
Removed Deprecated Features (#948)
We've removed features that have been deprecated for at least one release.
What's Changed
- Small test fix to have right padding by @sashaDoubov in #757
- Release 040 back to main by @dakinggg in #758
- Bump composer version to 0.17.1 by @irenedea in #762
- Docker release on workflow_dispatch by @bandish-shah in #763
- Fix tiktoken wrapper by @dakinggg in #761
- enable param group configuration in llm-foundry by @vchiley in #760
- Add script for doing bulk generation against an endpoint by @aspfohl in #765
- Only strip object names when creating new output path by @irenedea in #766
- Add eval loader to eval script by @aspfohl in #742
- Support inputs_embeds by @samhavens in #687
- Better error message when test does not complete by @aspfohl in #769
- Add codeowners by @dakinggg in #770
- add single value support to activation_checkpointing_target by @cli99 in #772
- Reorganize tests to make them easier to find by @aspfohl in #768
- Add "completion" alias for response key by @dakinggg in #771
- Shashank/seq id flash attn by @ShashankMosaicML in #738
- Fix SIQA gold indices by @bmosaicml in #774
- Add missing load_weights_only to example yamls by @dakinggg in #776
- Patch flash attn in test to simulate environment without it installed by @dakinggg in #778
- Update .gitignore by @aspfohl in #781
- Disable mosaicml logger in foundry CI/CD by @mvpatel2000 in #788
- Chat fomating template changes by @rajammanabrolu in #784
- Remove tests and support for torch <2.1 by @dakinggg in #787
- Fix utf-8 decode errors in tiktoken wrapper by @dakinggg in #792
- Update gauntlet v0.2 to reflect results of calibration by @bmosaicml in #791
- Remove from mcli.sdk imports by @aspfohl in #793
- Auto packing fixes by @irenedea in #783
- Enable flag to not pass PAD tokens in ffwd by @bcui19 in #775
- Adding a fix for Cross Entropy Loss for long sequence lengths. by @ShashankMosaicML in #795
- Minor readme updates and bump min python version by @dakinggg in #799
- Enable GLU FFN type by @vchiley in #796
- clean up resolve_ffn_hidden_and_exp_ratio by @vchiley in #801
- Fix token counting to use attention mask instead of ids by @dakinggg in #802
- update openai wrapper to work with tiktoken interface and newest openai version by @bmosaicml in #794
- Fix openai not conditioned imports by @dakinggg in #806
- Make the ffn activation func configurable by @vchiley in #805
- Clean up the logs, bump datasets and transformers by @dakinggg in #804
- Fix remote path check for UC volumes by @irenedea in #809
- Expand options for MMLU. by @mansheej in #811
- Async eval callback by @aspfohl in #702
- Updating the Flash Attention version to fix cross entropy loss by @ShashankMosaicML in #812
- Remove redundant transposes for rope rotation by @ShashankMosaicML in #807
- Add generic flatten imports to HF checkpointer by @b-chu in #814
- Fix token counting to allow there to be no attention mask by @dakinggg in #818
- Default to using tokenizer eos and bos in convert_text_to_mds.py by @irenedea in #823
- Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by @irenedea in #825
- Bump turbo version to 0.0.7 by @mvpatel2000 in #827
- Align GLU implementation with LLaMa by @vchiley in #829
- Use sync_module_states: True when using HSDP by @abhi-mosaic in #830
- Update composer to 0.17.2 and streaming to 0.7.2 by @irenedea in #822
- zero bias conversion corrected by @megha95 in #624
- Bump einops version, which has improved support for torch compile by @sashaDoubov in #832
- Update README with links to ML HW resources by @abhi-mosaic in #833
- Add safe_load option to restrict HF dataset downloads to allowed file types by @irenedea in #798
- Adding support for alibi when using flash attention by @ShashankMosaicML in #820
- Shashank/new benchmarks by @ShashankMosaicML in #838
- Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by @dakinggg in #841
- Add use_tokenizer_eos option to convert text to mds script by @irene...
v0.4.0
🚀 LLM Foundry v0.4.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
Automatic sequence packing (#683)
You can now specify packing_ratio: auto under your finetuning dataset, to automatically profile and select a good packing ratio to efficiently pack your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
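To see why packing reduces padding, the sketch below compares the fraction of padded token slots with and without packing for a handful of sequence lengths. It uses first-fit decreasing bin packing as a stand-in, which is not necessarily the algorithm LLM Foundry uses. `padding_fraction` and the example lengths are hypothetical, and every length is assumed to be at most max_seq_len.

```python
def padding_fraction(lengths, max_seq_len, pack):
    """Fraction of token slots spent on padding, with or without packing."""
    if not pack:
        rows = [[n] for n in lengths]  # one sequence per padded row
    else:
        rows = []
        # First-fit decreasing: place each sequence in the first row with room.
        for n in sorted(lengths, reverse=True):
            for row in rows:
                if sum(row) + n <= max_seq_len:
                    row.append(n)
                    break
            else:
                rows.append([n])
    used = sum(lengths)
    return 1 - used / (len(rows) * max_seq_len)


lengths = [300, 700, 512, 512, 100, 900]
unpacked = padding_fraction(lengths, 1024, pack=False)
packed = padding_fraction(lengths, 1024, pack=True)
print(unpacked, packed)
```

In this example the six sequences fit into three rows of 1024 tokens, so about half of the token slots no longer go to padding.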
Flash Attention 2 (#651, #666, #672)
We now support using Flash Attention 2 both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the training instructions to learn how to use the different versions of Flash Attention.
New PyTorch, Composer, Streaming, and Transformers versions (#648, #672, #736)
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (codellama and mistral in particular).
Easy Databricks model deployment (#618)
We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an MLFlowLogger and a HuggingFaceCheckpointer for your run.
The MLFlowLogger should have a Unity Catalog model registry prefix in the form of catalog.schema. This specifies where to register your models to. For example,
loggers:
  mlflow:
    experiment_name: /Users/first.last@email.com/my_experiment_name
    tracking_uri: databricks
    model_registry_prefix: catalog.schema
    model_registry_uri: databricks-uc
The HuggingFaceCheckpointer should specify the name you want to register the model under. For example,
callbacks:
  hf_checkpointer:
    save_interval: 1ep # Save Hugging Face formatted checkpoints each epoch
    save_folder: s3://bucket/path/to/my/checkpoints
    mlflow_registered_model_name: my_model_name # Final model will be registered to catalog.schema.my_model_name
MPT model configurations
We've added a few new options when training with the MPT architecture in LLM Foundry.
- Rotary embeddings (#675)
- (Un)Tied word embeddings (#728)
- Fine grained activation checkpointing (#720)
Evaluation Improvements
We've released v0.1 of our Eval Gauntlet (#674, #748)! This adds many new benchmarks, chain-of-thought, and a new safety category. Check out the README for full details!
In addition, we've made a few improvements to our evaluation options, with more to come!
- Allow specifying multiple evaluation datasets to compute cross entropy and perplexity on during training (#603)
- Easier versions of the HumanEval dataset, which can be useful for comparing smaller models (#645)
- More options for averaging the results of the Eval Gauntlet (#640)
New pretraining benchmarks (#543)
Added H100 profiling results to our benchmarking table.
Quality of life improvements
- Improved Generate callback with more logging options. Use the Generate callback to log generations from your model over the course of training. (#631)
- Count number of tokens during training excluding padding tokens. Previously this count included padding tokens. (#676)
- Use the PyTorch profiler to profile your training runs. (#678)
- A convenience script for using the much faster Hugging Face snapshot_download to download models from the Hugging Face Hub. (#708)
- New AWS specific Docker images with LLM Foundry dependencies pre-installed. (#731)
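For the token-counting change listed above (#676), the difference between the two counts can be sketched as follows. This assumes a batch with a standard attention mask (1 for real tokens, 0 for padding); the helper name and the batch are hypothetical and not taken from the LLM Foundry code.

```python
from typing import List


def count_tokens(attention_mask: List[List[int]]) -> int:
    """Count real tokens: positions where the mask is 1 (padding positions are 0)."""
    return sum(sum(row) for row in attention_mask)


# Hypothetical batch of two sequences padded to length 6.
attention_mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

total_slots = sum(len(row) for row in attention_mask)  # count that includes padding
real_tokens = count_tokens(attention_mask)             # count that excludes padding
print(f"slots including padding: {total_slots}, real tokens: {real_tokens}")
```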
Experimental features
Inverse square root learning rate scheduler (#657)
We've added experimental support for the inverse square root learning rate scheduler.
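As a rough illustration, one common formulation of an inverse square root schedule warms up linearly and then decays in proportion to 1/sqrt(step). The sketch below is that textbook form under assumed parameter names; the scheduler added in #657 may differ in details such as scaling or cooldown.

```python
import math


def inv_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step).

    `step` is zero-indexed. This is a generic sketch, not LLM Foundry's scheduler.
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # After warmup, the rate shrinks with the inverse square root of the step count.
    return base_lr * math.sqrt(warmup_steps / (step + 1))


for step in (0, 49, 99, 399, 9999):
    print(step, inv_sqrt_lr(step, base_lr=1e-3, warmup_steps=100))
```

With this form, the learning rate halves each time the step count quadruples after warmup.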
Breaking changes
Updated Streaming defaults (#723)
We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the Streaming release notes for more details.
Removed support for PrefixLM for Bloom and OPT models (#704)
We occasionally remove unused experimental parts of the code base to focus on new features and better support for existing features, and we've removed support for PrefixLM applied to Bloom and OPT models in this release.
What's Changed
- Multi eval dataset logging by @snarayan21 in #603
- Merge release 0.3.0 back to main by @dakinggg in #635
- Add tmp path retention policy by @j316chuck in #641
- Add flag to disable train metrics by @mvpatel2000 in #642
- Update pins to latest version that were missed by @dakinggg in #646
- Fix overriding of rope_scaling config by @dakinggg in #644
- Add 2.1 images to docker workflow and tests by @dakinggg in #648
- Fixes to lion8b test for torch 2.1 by @dakinggg in #649
- Only log "changing autoresume" when actually changing by @aspfohl in #653
- Fix lion8b error correction with torch 2.1 by @dblalock in #656
- Clean up processes between distributed gpu tests by @j316chuck in #660
- Revert "Clean up processes between distributed gpu tests (#660)" by @j316chuck in #662
- Switch ordering of foundry gpu tests by @j316chuck in #665
- Change batch size on coding tasks to 1 to avoid OOM by @bmosaicml in #654
- Add images with flash attention 2 by @dakinggg in #651
- Fix yaml change by @dakinggg in #667
- Revert actions change by @dakinggg in #668
- Inverse Square Root LR Schedule by @mansheej in #657
- Add test suite for flash attention 2 by @dakinggg in #666
- Adding Simplified Coding Tasks by @mcarbin in #645
- Fix typo in image name by @dakinggg in #669
- Point to composer.callback.Generate by @aspfohl in #631
- Do not update past_key_values in place by @irenedea in #652
- Fix small typos in the eval readme by @maxisawesome in #671
- Convert to DataSpec and add token counts that include padding by @dakinggg in #676
- Add support for automatically registering models to UC at the end of training by @dakinggg in #618
- add load_strict_model_weights as an optional config parameter by @AllenHW in #655
- Small changes to HF repo update script by @dakinggg in #680
- Add profiler support in llm foundry by @j316chuck in #678
- Update_pretrain_benchmarks by @crinard in #543
- add |--...
v0.3.0
🚀 LLM Foundry v0.3.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Features
Llama-2 (#485, #520, #533)
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the attention_patch_type in your yaml like so:
model:
  ...
  attention_patch_type: triton
  ...
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
8-bit Lion (#514)
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from decoupled_lionw to decoupled_lionw_8b!
Checkpoint conversion (#526, #519, #594)
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and support for Faster Transformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the hf_checkpointer callback to your yaml like so:
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
Code evaluation (#587)
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
Transformer Engine support (#432)
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set fc_type='te' and/or ffn_config['ffn_type']='te_ln_mlp' and precision='amp_fp8'.
MLFlow (#475)
Adds support for using MLFlow as an experiment tracker. To enable, simply add mlflow to the loggers section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
Updated streaming version/defaults (#503, #573, #580, #602)
Updates to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
Grouped Query Attention (#492)
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set attn_config['attn_type']='grouped_query_attention' and attn_config['kv_n_heads'] to the desired number of kv heads.
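To make the relationship between the three attention variants concrete, the sketch below maps each query head to the key/value head it would share. It assumes the usual contiguous grouping and uses a hypothetical helper name; it illustrates the idea rather than the MPT implementation.

```python
from typing import List


def kv_head_for_query_head(q_head: int, n_heads: int, kv_n_heads: int) -> int:
    """Return the index of the kv head shared by query head `q_head`.

    Assumes query heads are split into `kv_n_heads` contiguous, equally sized groups.
    """
    if n_heads % kv_n_heads != 0:
        raise ValueError("n_heads must be divisible by kv_n_heads")
    group_size = n_heads // kv_n_heads
    return q_head // group_size


def head_mapping(n_heads: int, kv_n_heads: int) -> List[int]:
    """Mapping for all query heads, in order."""
    return [kv_head_for_query_head(h, n_heads, kv_n_heads) for h in range(n_heads)]


n_heads = 8
print("multi head (kv_n_heads=8):   ", head_mapping(n_heads, 8))
print("grouped query (kv_n_heads=2):", head_mapping(n_heads, 2))
print("multi query (kv_n_heads=1):  ", head_mapping(n_heads, 1))
```

Setting kv_n_heads equal to the number of query heads recovers Multi Head Attention, and setting it to 1 recovers Multi Query Attention; values in between trade kv cache size against quality.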
MPT quality of life improvements (#559, #599)
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Eval gauntlet during training, inference API eval wrapper (#501, #494)
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).
tiktoken support (#610)
We have enabled training with tiktoken tokenizers with a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
LoRA eval (#515)
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Finetuning API
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
What's Changed
- Release/v0.2.0 by @vchiley in #410
- Update README.md by @abhi-mosaic in #429
- Remove try catch in eval.py; make model_gauntlet optional in eval.py by @bmosaicml in #434
- Use torch.repeat instead of expand on key & value in Triton MQA to prevent NaNs with certain h_dims by @sashaDoubov in #442
- Update mcli-hf-generate.yaml by @vchiley in #456
- Add trust remote code for tokenizer in inference conversion script by @margaretqian in #446
- setup.py: replace composer with mosaicml by @guoyejun in #458
- Add linear layer and ffn config to enable TransformerEngine layers (with FP8) by @vchiley in #432
- use mono checkpoints by @samhavens in #448
- Update inference benchmark with recent HF changes by @sashaDoubov in #461
- Adding different device intialization to eval by @bcui19 in #466
- Fix missing import by @bcui19 in #470
- Autoresume default on by @mvpatel2000 in #467
- Support eval loader when finetuning from JSONL files in object stores by @samhavens in #469
- Fix ambiguous throughput in README by @abhi-mosaic in #476
- adds early stopping call back by @codestar12 in #488
- Update accelerate to 0.20.3 for LLaMa-2 support by @rishab-partha in #485
- If Alibi is on, we should turn learned_pos_emb to False by @bcui19 in #489
- Fix Local World Size by @rishab-partha in #495
- Increase lint CI timeout by @dakinggg in #498
- fix boolean for reentrant setting by @dakinggg in #500
- Adding pyright to pre-commit by @bcui19 in #477
- fix no bias assignment by @vchiley in #502
- Updated StreamingTextDataset to pass take in shuffle_block_size by @snarayan21 in #503
- Add MLFlow as a logger option by @aspfohl in #475
- Remove old optimizer logs by @mvpatel2000 in #509
- Updates GPU test timeout to use mcloud flag by @mvpatel2000 in #510
- Grouped Query Attention + Refactor Attn by @sashaDoubov in #492
- Fix training integration test by @j316chuck in #517
- Update max duration due to mcli api change by @mvpatel2000 in #523
- Fix typo in GQA comments by @sashaDoubov in https://github.com/mosaic...
v0.2.0
🚀 LLM Foundry v0.2.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs). LLM Foundry serves as the efficient training codebase for the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
Training
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
Inference
The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:
- Converting a Composer checkpoint to an HF checkpoint folder
- Interactive Generation with HF models
- Interactive Chat with HF models
- Converting an HF model to ONNX
- Converting an HF MPT to FasterTransformer
- Running MPT with FasterTransformer
Major Features
LLM Foundry now uses Composer v0.15.0 and Streaming v0.5.1 as minimum requirements. For details on all the improvements, see the release notes for Composer and Streaming.
- 🆕 Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested torch.compile, but do not expect significant performance improvements.
- ⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry with NVIDIA H100 systems, be sure to use a docker image that has CUDA 11.8+ and PyTorch 2.0+ versions.
For example, mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 from our dockerhub has been tested with NVIDIA H100 systems. No code changes should be required.
- 📈 AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 docker image rocm/dev-ubuntu-20.04:5.4.3-complete, then install PyTorch for ROCm 5.4 and install Flash Attention.
Modify your configuration settings:
  - attn_impl=flash instead of the default triton
    - Note: ALiBi is currently not supported with attn_impl=flash.
  - loss_fn=torch_crossentropy instead of the default fused_crossentropy.
- 🚧 LoRA finetuning (Preview)
We have included a preview release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature, please let us know any feedback! The API and support are subject to change.
- 🔎 Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
  - Instead of model, use the models keyword and provide a list of models.
  - tokenizer is now model-specific.
For example, to run the gauntlet of various eval tasks with mosaicml/mpt-7b:
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
This release also makes evaluation deterministic even on different numbers of GPUs.
For more details on all these changes, see #308
- ⏱️ Benchmarking Inference
To better support the deployment of LLMs, we have included an inference benchmarking suite and results across different hardware and other LLMs.
PR List
- hf dict cfg overrides by @vchiley in #90
- Add slack and license buttons to readme by @growlix in #98
- Add minimum mosaicml-streaming version by @hanlint in #110
- Update dataloader.py by @nelsontkq in #102
- Add features to hf_generate by @alextrott16 in #116
- Make mpt7b finetuning more obvious by @samhavens in #101
- Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by @alanxmay in #131
- Fix HF conversion script to upload to S3 after editing the files to be HF compatible by @dakinggg in #136
- Set pad_token_id to tokenizer.pad_token_id if not set on command line by @patrickhwood in #118
- Changed the keep_zip default to False to comply with StreamingDataset by @karan6181 in #150
- Add cloud upload to checkpoint conversion script by @dakinggg in #151
- Adds precision to eval by @mvpatel2000 in #148
- Update StreamingDataset defaults by @abhi-mosaic in #157
- Explain composer command by @hanlint in #164
- Remove pynvml by @hanlint in #165
- Adds a concrete finetuning example from a custom dataset by @alextrott16 in #156
- Remove health checker by @mvpatel2000 in #167
- Rename datasets to avoid hf conflict by @hanlint in #175
- Torch2 (#177) by @vchiley in #178
- Revert "Torch2 (#177) (#178)" by @dakinggg in #181
- clean up dataset conversion readme by @codestar12 in #168
- Convert MPT checkpoints to FT format by @dskhudia in #169
- Update README.md by @jacobfulano in #198
- Removed unused tokenizer_name config field by @dakinggg in #206
- Add community links to README by @hanlint in #182
- Add Tensorboard logger to yaml config by @hanlint in #166
- Update inference README by @abhi-mosaic in #204
- torch2 updt with hf fixes by @vchiley in #193
- Removing deprecated vocabulary size parameter from composer CE metrics by @sashaDoubov in #222
- Add `composer[...