🚀 LLM Foundry v0.5.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
LoRA Support (with FSDP!) (#886)
LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run `train.py`, adding `peft_config` arguments to the `model` section of the config `.yaml`, like so:
```yaml
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
      - q_proj
      - k_proj
```
Read more about it in the tutorial.
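For orientation, here is a minimal sketch of roughly what such a `peft_config` corresponds to when built directly with the PEFT library. The base checkpoint name is an illustrative placeholder, and the `target_modules` names depend on how your model's attention layers are named:

```python
# Sketch only: constructing an equivalent adapter configuration directly with PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor applied to the update
    lora_dropout=0.05,                      # dropout on the LoRA branch
    target_modules=["q_proj", "k_proj"],    # module names vary by architecture
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```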
ALiBi for Flash Attention (#820)
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
```yaml
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
```
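If you want to confirm locally that your flash-attn build is new enough before enabling this, a quick check along these lines works (a sketch, not part of LLM Foundry):

```python
# Sketch: verify the installed flash-attn version meets the ALiBi requirement (>= 2.4.2).
from packaging.version import Version
import flash_attn

if Version(flash_attn.__version__) < Version("2.4.2"):
    raise RuntimeError(
        f"flash-attn {flash_attn.__version__} is too old; "
        "ALiBi with attn_impl: flash requires flash-attn >= 2.4.2"
    )
```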
Chat Data for Finetuning (#884)
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.
Each sample requires a single key `"messages"` that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
- `role`: A string indicating the author of the message. Possible values are `"system"`, `"user"`, and `"assistant"`.
- `content`: A string containing the text of the message.
There must be at least one message with the role `"assistant"`, and the last message in the `"messages"` array must have the role `"assistant"`.
Here's an example `.jsonl` with chat data:

```jsonl
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
    { "role": "system", "content": "A conversation between a user and a helpful and honest assistant" },
    { "role": "user", "content": "Hi, MPT!" },
    { "role": "assistant", "content": "Hi, user!" },
    { "role": "user", "content": "Is multi-turn chat supported?" },
    { "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
  ]}
...
```
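The automatic formatting relies on the tokenizer's chat template. As a rough sketch of what that application looks like with the Hugging Face API (the tokenizer checkpoint below is an illustrative placeholder; any tokenizer that defines a chat template will do):

```python
# Sketch: how a "messages" array is rendered to text via a Hugging Face chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder

messages = [
    {"role": "user", "content": "Hi, MPT!"},
    {"role": "assistant", "content": "Hi, user!"},
]

# Render the conversation to a single training string using the tokenizer's template.
formatted = tokenizer.apply_chat_template(messages, tokenize=False)
print(formatted)
```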
Safe Load for HuggingFace Datasets (#798)
We now provide a `safe_load` option when loading HuggingFace datasets for finetuning. This restricts loaded files to `.jsonl`, `.csv`, or `.parquet` extensions to prevent arbitrary code execution.
To use, set `safe_load` to `true` in your dataset configuration:
```yaml
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
```
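Conceptually, this acts as a filename-extension allowlist applied before downloaded files are used. A simplified sketch of the idea (not the actual LLM Foundry implementation):

```python
# Sketch: the kind of extension allowlist safe_load applies to downloaded dataset files.
from pathlib import Path

ALLOWED_EXTENSIONS = {".jsonl", ".csv", ".parquet"}

def check_safe_files(file_paths: list[str]) -> None:
    """Raise if any downloaded file has an extension outside the allowlist."""
    for path in file_paths:
        if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(
                f"{path} has a disallowed extension; with safe_load only "
                f"{sorted(ALLOWED_EXTENSIONS)} files are permitted."
            )

check_safe_files(["train.jsonl", "eval.parquet"])  # passes
# check_safe_files(["loader.py"])  # would raise ValueError
```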
New PyTorch, Composer, Streaming, and Transformers versions
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (Mixtral in particular).
Deprecations
Support for Flash Attention v1 (#921)
Support for Flash Attention v1 will be removed in v0.6.0.
Breaking Changes
Removed support for PyTorch versions before 2.1 (#787)
We no longer support PyTorch versions before 2.1.
Removed Deprecated Features (#948)
We've removed features that have been deprecated for at least one release.
What's Changed
- Small test fix to have right padding by @sashaDoubov in #757
- Release 040 back to main by @dakinggg in #758
- Bump composer version to 0.17.1 by @irenedea in #762
- Docker release on workflow_dispatch by @bandish-shah in #763
- Fix tiktoken wrapper by @dakinggg in #761
- enable param group configuration in llm-foundry by @vchiley in #760
- Add script for doing bulk generation against an endpoint by @aspfohl in #765
- Only strip object names when creating new output path by @irenedea in #766
- Add eval loader to eval script by @aspfohl in #742
- Support inputs_embeds by @samhavens in #687
- Better error message when test does not complete by @aspfohl in #769
- Add codeowners by @dakinggg in #770
- add single value support to activation_checkpointing_target by @cli99 in #772
- Reorganize tests to make them easier to find by @aspfohl in #768
- Add "completion" alias for response key by @dakinggg in #771
- Shashank/seq id flash attn by @ShashankMosaicML in #738
- Fix SIQA gold indices by @bmosaicml in #774
- Add missing load_weights_only to example yamls by @dakinggg in #776
- Patch flash attn in test to simulate environment without it installed by @dakinggg in #778
- Update .gitignore by @aspfohl in #781
- Disable mosaicml logger in foundry CI/CD by @mvpatel2000 in #788
- Chat fomating template changes by @rajammanabrolu in #784
- Remove tests and support for torch <2.1 by @dakinggg in #787
- Fix utf-8 decode errors in tiktoken wrapper by @dakinggg in #792
- Update gauntlet v0.2 to reflect results of calibration by @bmosaicml in #791
- Remove from mcli.sdk imports by @aspfohl in #793
- Auto packing fixes by @irenedea in #783
- Enable flag to not pass PAD tokens in ffwd by @bcui19 in #775
- Adding a fix for Cross Entropy Loss for long sequence lengths. by @ShashankMosaicML in #795
- Minor readme updates and bump min python version by @dakinggg in #799
- Enable GLU FFN type by @vchiley in #796
- clean up resolve_ffn_hidden_and_exp_ratio by @vchiley in #801
- Fix token counting to use attention mask instead of ids by @dakinggg in #802
- update openai wrapper to work with tiktoken interface and newest openai version by @bmosaicml in #794
- Fix openai not conditioned imports by @dakinggg in #806
- Make the ffn activation func configurable by @vchiley in #805
- Clean up the logs, bump datasets and transformers by @dakinggg in #804
- Fix remote path check for UC volumes by @irenedea in #809
- Expand options for MMLU. by @mansheej in #811
- Async eval callback by @aspfohl in #702
- Updating the Flash Attention version to fix cross entropy loss by @ShashankMosaicML in #812
- Remove redundant transposes for rope rotation by @ShashankMosaicML in #807
- Add generic flatten imports to HF checkpointer by @b-chu in #814
- Fix token counting to allow there to be no attention mask by @dakinggg in #818
- Default to using tokenizer eos and bos in convert_text_to_mds.py by @irenedea in #823
- Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by @irenedea in #825
- Bump turbo version to 0.0.7 by @mvpatel2000 in #827
- Align GLU implementation with LLaMa by @vchiley in #829
- Use `sync_module_states: True` when using HSDP by @abhi-mosaic in #830
- Update composer to 0.17.2 and streaming to 0.7.2 by @irenedea in #822
- zero bias conversion corrected by @megha95 in #624
- Bump einops version, which has improved support for torch compile by @sashaDoubov in #832
- Update README with links to ML HW resources by @abhi-mosaic in #833
- Add safe_load option to restrict HF dataset downloads to allowed file types by @irenedea in #798
- Adding support for alibi when using flash attention by @ShashankMosaicML in #820
- Shashank/new benchmarks by @ShashankMosaicML in #838
- Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by @dakinggg in #841
- Add use_tokenizer_eos option to convert text to mds script by @irenedea in #843
- Disable Environment Variable Resolution by @irenedea in #845
- Bump pre-commit version by @b-chu in #847
- Fix typo kwargs=>hf_kwargs by @irenedea in #853
- Remove foundry time wrangling by @aspfohl in #855
- Minor cleanups by @mvpatel2000 in #858
- Read UC delta table by @XiaohanZhangCMU in #773
- Remove fused layernorm (deprecated in composer) by @mvpatel2000 in #859
- Remove hardcoded combined.jsonl with a flag by @XiaohanZhangCMU in #861
- Bump to turbo v8 by @mvpatel2000 in #828
- Always initialize dist by @mvpatel2000 in #864
- Logs upload URI by @milocress in #850
- Delta to JSONL conversion script cleanup and bug fix by @nancyhung in #868
- Fix MLFlowLogger mock in tests by @jerrychen109 in #872
- [XS] Fix delta conversion script regex bug by @nancyhung in #877
- Precompute flash attention padding info by @ShashankMosaicML in #880
- Add GQA to init.py by @mvpatel2000 in #882
- fsdp wrap refac by @vchiley in #883
- Update model download utils to support ORAS by @jerrychen109 in #881
- Update license by @b-chu in #887
- Fix tiktoken tokenizer add_generation_prompt by @irenedea in #890
- Upgrade `datasets` version by @dakinggg in #892
- Bump transformers version to support Mixtral by @dakinggg in #894
- Add `tokenizer-only` flag to only download tokenizers from HF or oras by @irenedea in #895
- Foundational Model API eval wrapper by @aspfohl in #849
- Add better error for non-empty local output folder in convert_text_to_mds.py by @irenedea in #891
- Allow bool input for loggers by @ngcgarcia in #897
- Enable QK Group Norm by @vchiley in #869
- Workflow should not have leading ./ by @mvpatel2000 in #905
- Add new GC option by @dakinggg in #907
- No symlinks at all for HF download by @jerrychen109 in #908
- Adds support for chat formatted finetuning input data. by @milocress in #884
- Add flag to enable/disable param upload by @ngcgarcia in #912
- Add support for eval_loader & eval_subset_num_batches in async callback by @aspfohl in #834
- Add the model license file for mlflow by @dakinggg in #915
- Warn instead of error on tokenizer-only download with http by @jerrychen109 in #904
- Fix fmapi_chat for instruct models and custom tokenizers by @aspfohl in #914
- Make yamllint consistent with Composer by @b-chu in #918
- Create HF checkpointer model on meta device by @dakinggg in #916
- Tiktoken chat format fix by @rajammanabrolu in #893
- fix dash issue by @milocress in #919
- Fix yaml linting by @b-chu in #920
- Adding deprecation warning for Flash Attention 1 and user warning against using Triton attention. by @ShashankMosaicML in #921
- Add rich formatting to tracebacks by @jjanezhang in #927
- Fix docker workflow caching by @irenedea in #930
- Remove .ci folder and move FILE_HEADER by @irenedea in #931
- Throw error when no EOS by @KuuCi in #922
- Bump composer to 0.19 by @dakinggg in #934
- Update eval_gauntlet_callback.py with math.log2 by @Skylion007 in #821
- Switch to the Composer integration of LoRA (works with FSDP) by @dakinggg in #886
- Refactoring the `add_metrics_to_eval_loaders` function to accept a list of metric names instead of a dictionary of metrics by @ShashankMosaicML in #938
- Fix an extra call to load state dict and type cast in hf checkpointer by @dakinggg in #939
- Fixing the gen_attention_mask_in_length function to handle the case when sequence id is -1 due to attention masking by @ShashankMosaicML in #940
- Update lora docs by @dakinggg in #941
- Bump FAv2 setup.py by @mvpatel2000 in #942
- Retrieve license information when local files are provided for a pretrained model by @jerrychen109 in #943
- Add and use VersionedDeprecationWarning by @irenedea in #944
- Bump llm-foundry version to 0.5.0 by @irenedea in #948
New Contributors
- @megha95 made their first contribution in #624
- @milocress made their first contribution in #850
- @nancyhung made their first contribution in #868
- @ngcgarcia made their first contribution in #897
- @KuuCi made their first contribution in #922
- @Skylion007 made their first contribution in #821
Full Changelog: v0.4.0...v0.5.0