🚀 LLM Foundry v0.5.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
LoRA Support (with FSDP!) (#886)
LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run `train.py`, adding `peft_config` arguments to the `model` section of the config `.yaml`, like so:
```yaml
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
      - q_proj
      - k_proj
```
Read more about it in the tutorial.
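For orientation, here is a minimal sketch of roughly what such a `peft_config` corresponds to when built directly with the PEFT library. The base checkpoint name is an illustrative placeholder, and the `target_modules` names depend on how your model's attention layers are named:

```python
# Sketch only: constructing an equivalent adapter configuration directly with PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor applied to the update
    lora_dropout=0.05,                      # dropout on the LoRA branch
    target_modules=["q_proj", "k_proj"],    # module names vary by architecture
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```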
ALiBi for Flash Attention (#820)
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
```yaml
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
```
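If you want to confirm locally that your flash-attn build is new enough before enabling this, a quick check along these lines works (a sketch, not part of LLM Foundry):

```python
# Sketch: verify the installed flash-attn version meets the ALiBi requirement (>= 2.4.2).
from packaging.version import Version
import flash_attn

if Version(flash_attn.__version__) < Version("2.4.2"):
    raise RuntimeError(
        f"flash-attn {flash_attn.__version__} is too old; "
        "ALiBi with attn_impl: flash requires flash-attn >= 2.4.2"
    )
```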
Chat Data for Finetuning (#884)
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.
Each sample requires a single key `"messages"` that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
- `role`: A string indicating the author of the message. Possible values are `"system"`, `"user"`, and `"assistant"`.
- `content`: A string containing the text of the message.
There must be at least one message with the role `"assistant"`, and the last message in the `"messages"` array must have the role `"assistant"`.
Here's an example `.jsonl` with chat data:

```jsonl
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
    { "role": "system", "content": "A conversation between a user and a helpful and honest assistant" },
    { "role": "user", "content": "Hi, MPT!" },
    { "role": "assistant", "content": "Hi, user!" },
    { "role": "user", "content": "Is multi-turn chat supported?" },
    { "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
  ]}
...
```
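The automatic formatting relies on the tokenizer's chat template. As a rough sketch of what that application looks like with the Hugging Face API (the tokenizer checkpoint below is an illustrative placeholder; any tokenizer that defines a chat template will do):

```python
# Sketch: how a "messages" array is rendered to text via a Hugging Face chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder

messages = [
    {"role": "user", "content": "Hi, MPT!"},
    {"role": "assistant", "content": "Hi, user!"},
]

# Render the conversation to a single training string using the tokenizer's template.
formatted = tokenizer.apply_chat_template(messages, tokenize=False)
print(formatted)
```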
Safe Load for HuggingFace Datasets (#798)
We now provide a `safe_load` option when loading HuggingFace datasets for finetuning. This restricts loaded files to `.jsonl`, `.csv`, or `.parquet` extensions to prevent arbitrary code execution.
To use, set `safe_load` to `true` in your dataset configuration:
```yaml
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
```
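Conceptually, this acts as a filename-extension allowlist applied before downloaded files are used. A simplified sketch of the idea (not the actual LLM Foundry implementation):

```python
# Sketch: the kind of extension allowlist safe_load applies to downloaded dataset files.
from pathlib import Path

ALLOWED_EXTENSIONS = {".jsonl", ".csv", ".parquet"}

def check_safe_files(file_paths: list[str]) -> None:
    """Raise if any downloaded file has an extension outside the allowlist."""
    for path in file_paths:
        if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(
                f"{path} has a disallowed extension; with safe_load only "
                f"{sorted(ALLOWED_EXTENSIONS)} files are permitted."
            )

check_safe_files(["train.jsonl", "eval.parquet"])  # passes
# check_safe_files(["loader.py"])  # would raise ValueError
```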
New PyTorch, Composer, Streaming, and Transformers versions
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (Mixtral in particular).
Deprecations
Support for Flash Attention v1 (#921)
Support for Flash Attention v1 will be removed in v0.6.0.
Breaking Changes
Removed support for PyTorch versions before 2.1 (#787)
We no longer support PyTorch versions before 2.1.
Removed Deprecated Features (#948)
We've removed features that have been deprecated for at least one release.
What's Changed
- Small test fix to have right padding by @sashaDoubov in #757
- Release 040 back to main by @dakinggg in #758
- Bump composer version to 0.17.1 by @irenedea in #762
- Docker release on workflow_dispatch by @bandish-shah in #763
- Fix tiktoken wrapper by @dakinggg in #761
- enable param group configuration in llm-foundry by @vchiley in #760
- Add script for doing bulk generation against an endpoint by @aspfohl in #765
- Only strip object names when creating new output path by @irenedea in #766
- Add eval loader to eval script by @aspfohl in #742
- Support inputs_embeds by @samhavens in #687
- Better error message when test does not complete by @aspfohl in #769
- Add codeowners by @dakinggg in #770
- add single value support to activation_checkpointing_target by @cli99 in #772
- Reorganize tests to make them easier to find by @aspfohl in #768
- Add "completion" alias for response key by @dakinggg in #771
- Shashank/seq id flash attn by @ShashankMosaicML in #738
- Fix SIQA gold indices by @bmosaicml in #774
- Add missing load_weights_only to example yamls by @dakinggg in #776
- Patch flash attn in test to simulate environment without it installed by @dakinggg in #778
- Update .gitignore by @aspfohl in #781
- Disable mosaicml logger in foundry CI/CD by @mvpatel2000 in #788
- Chat fomating template changes by @rajammanabrolu in #784
- Remove tests and support for torch <2.1 by @dakinggg in #787
- Fix utf-8 decode errors in tiktoken wrapper by @dakinggg in #792
- Update gauntlet v0.2 to reflect results of calibration by @bmosaicml in #791
- Remove from mcli.sdk imports by @aspfohl in #793
- Auto packing fixes by @irenedea in #783
- Enable flag to not pass PAD tokens in ffwd by @bcui19 in #775
- Adding a fix for Cross Entropy Loss for long sequence lengths. by @ShashankMosaicML in #795
- Minor readme updates and bump min python version by @dakinggg in #799
- Enable GLU FFN type by @vchiley in #796
- clean up resolve_ffn_hidden_and_exp_ratio by @vchiley in #801
- Fix token counting to use attention mask instead of ids by @dakinggg in #802
- update openai wrapper to work with tiktoken interface and newest openai version by @bmosaicml in #794
- Fix openai not conditioned imports by @dakinggg in #806
- Make the ffn activation func configurable by @vchiley in #805
- Clean up the logs, bump datasets and transformers by @dakinggg in #804
- Fix remote path check for UC volumes by @irenedea in #809
- Expand options for MMLU. by @mansheej in #811
- Async eval callback by @aspfohl in #702
- Updating the Flash Attention version to fix cross entropy loss by @ShashankMosaicML in #812
- Remove redundant transposes for rope rotation by @ShashankMosaicML in #807
- Add generic flatten imports to HF checkpointer by @b-chu in #814
- Fix token counting to allow there to be no attention mask by @dakinggg in #818
- Default to using tokenizer eos and bos in convert_text_to_mds.py by @irenedea in #823
- Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by @irenedea in #825
- Bump turbo version to 0.0.7 by @mvpatel2000 in #827
- Align GLU implementation with LLaMa by @vchiley in #829
- Use `sync_module_states: True` when using HSDP by @abhi-mosaic in #830
- Update composer to 0.17.2 and streaming to 0.7.2 by @irenedea in #822
- zero bias conversion corrected by @megha95 in #624
- Bump einops version, which has improved support for torch compile by @sashaDoubov in #832
- Update README with links to ML HW resources by @abhi-mosaic in #833
- Add safe_load option to restrict HF dataset downloads to allowed file types by @irenedea in #798
- Adding support for alibi when using flash attention by @ShashankMosaicML in #820
- Shashank/new benchmarks by @ShashankMosaicML in #838
- Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by @dakinggg in #841
- Add use_tokenizer_eos option to convert text to mds script by @irenedea in #843
- Disable Environment Variable Resolution by @irenedea in #845
- Bump pre-commit version by @b-chu in #847
- Fix typo kwargs=>hf_kwargs by @irenedea in #853
- Remove foundry time wrangling by @aspfohl in #855
- Minor cleanups by @mvpatel2000 in #858
- Read UC delta table by @XiaohanZhangCMU in #773
- Remove fused layernorm (deprecated in composer) by @mvpatel2000 in #859
- Remove hardcoded combined.jsonl with a flag by @XiaohanZhangCMU in #861
- Bump to turbo v8 by @mvpatel2000 in #828
- Always initialize dist by @mvpatel2000 in #864
- Logs upload URI by @milocress in #850
- Delta to JSONL conversion script cleanup and bug fix by @nancyhung in #868
- Fix MLFlowLogger mock in tests by @jerrychen109 in #872
- [XS] Fix delta conversion script regex bug by @nancyhung in #877
- Precompute flash attention padding info by @ShashankMosaicML in #880
- Add GQA to init.py by @mvpatel2000 in #882
- fsdp wrap refac by @vchiley in #883
- Update model download utils to support ORAS by @jerrychen109 in #881
- Update license by @b-chu in #887
- Fix tiktoken tokenizer add_generation_prompt by @irenedea in #890
- Upgrade `datasets` version by @dakinggg in #892
- Bump transformers version to support Mixtral by @dakinggg in #894
- Add `tokenizer-only` flag to only download tokenizers from HF or oras by @irenedea in #895
- Foundational Model API eval wrapper by @aspfohl in #849
- Add better error for non-empty local output folder in convert_text_to_mds.py by @irenedea in #891
- Allow bool input for loggers by @ngcgarcia in #897
- Enable QK Group Norm by @vchiley in #869
- Workflow should not have leading ./ by @mvpatel2000 in #905
- Add new GC option by @dakinggg in #907
- No symlinks at all for HF download by @jerrychen109 in #908
- Adds support for chat formatted finetuning input data. by @milocress in #884
- Add flag to enable/disable param upload by @ngcgarcia in #912
- Add support for eval_loader & eval_subset_num_batches in async callback by @aspfohl in #834
- Add the model license file for mlflow by @dakinggg in #915
- Warn instead of error on tokenizer-only download with http by @jerrychen109 in #904
- Fix fmapi_chat for instruct models and custom tokenizers by @aspfohl in #914
- Make yamllint consistent with Composer by @b-chu in #918
- Create HF checkpointer model on meta device by @dakinggg in #916
- Tiktoken chat format fix by @rajammanabrolu in #893
- fix dash issue by @milocress in #919
- Fix yaml linting by @b-chu in #920
- Adding deprecation warning for Flash Attention 1 and user warning against using Triton attention. by @ShashankMosaicML in #921
- Add rich formatting to tracebacks by @jjanezhang in #927
- Fix docker workflow caching by @irenedea in #930
- Remove .ci folder and move FILE_HEADER by @irenedea in #931
- Throw error when no EOS by @KuuCi in #922
- Bump composer to 0.19 by @dakinggg in #934
- Update eval_gauntlet_callback.py with math.log2 by @Skylion007 in #821
- Switch to the Composer integration of LoRA (works with FSDP) by @dakinggg in #886
- Refactoring the `add_metrics_to_eval_loaders` function to accept a list of metric names instead of a dictionary of metrics by @ShashankMosaicML in #938
- Fix an extra call to load state dict and type cast in hf checkpointer by @dakinggg in #939
- Fixing the gen_attention_mask_in_length function to handle the case when sequence id is -1 due to attention masking by @ShashankMosaicML in #940
- Update lora docs by @dakinggg in #941
- Bump FAv2 setup.py by @mvpatel2000 in #942
- Retrieve license information when local files are provided for a pretrained model by @jerrychen109 in #943
- Add and use VersionedDeprecationWarning by @irenedea in #944
- Bump llm-foundry version to 0.5.0 by @irenedea in #948
New Contributors
- @megha95 made their first contribution in #624
- @milocress made their first contribution in #850
- @nancyhung made their first contribution in #868
- @ngcgarcia made their first contribution in #897
- @KuuCi made their first contribution in #922
- @Skylion007 made their first contribution in #821
Full Changelog: v0.4.0...v0.5.0