
v0.4.0

@dakinggg released this 22 Nov 03:45

🚀 LLM Foundry v0.4.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Automatic sequence packing (#683)

You can now specify packing_ratio: auto under your finetuning dataset to automatically profile and select a good packing ratio, efficiently packing your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
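
As a sketch, a finetuning dataloader config using this option might look like the following (the dataset name and sequence length are illustrative, not from this release):

```yaml
train_loader:
  name: finetuning
  dataset:
    hf_name: my-org/my-instruct-data   # hypothetical dataset
    max_seq_len: 2048
    packing_ratio: auto                # profile and choose a packing ratio automatically
    shuffle: true
```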

Flash Attention 2 (#651, #666, #672)

We now support using Flash Attention 2 both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the training instructions to learn how to use the different versions of Flash Attention.
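
For MPT, the attention implementation is selected through the model's attention config. A hedged sketch (key names follow the MPT config conventions, but check the training instructions for your version):

```yaml
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: flash   # use the flash-attn kernels instead of torch attention
```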

New PyTorch, Composer, Streaming, and Transformers versions (#648, #672, #736)

As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (CodeLlama and Mistral in particular).

Easy Databricks model deployment (#618)

We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an MLFlowLogger and a HuggingFaceCheckpointer for your run.

The MLFlowLogger should have a Unity Catalog model registry prefix in the form catalog.schema, which specifies where your models will be registered. For example,

loggers:
    mlflow:
        experiment_name: /Users/first.last@email.com/my_experiment_name
        tracking_uri: databricks
        model_registry_prefix: catalog.schema
        model_registry_uri: databricks-uc

The HuggingFaceCheckpointer should specify the name you want to register the model under. For example,

callbacks:
    hf_checkpointer:
        save_interval: 1ep # Save Hugging Face formatted checkpoints each epoch
        save_folder: s3://bucket/path/to/my/checkpoints
        mlflow_registered_model_name: my_model_name # Final model will be registered to catalog.schema.my_model_name

MPT model configurations

We've added a few new options when training with the MPT architecture in LLM Foundry.

  • Rotary embeddings (#675)
  • (Un)Tied word embeddings (#728)
  • Fine grained activation checkpointing (#720)
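
Sketched together in one model config, these options might look like the following. The exact key names (especially for fine-grained activation checkpointing) are assumptions here; consult the MPT configuration docs for your version:

```yaml
model:
  name: mpt_causal_lm
  tie_word_embeddings: false       # untie the input and output embeddings
  attn_config:
    rope: true                     # rotary position embeddings
  activation_checkpointing_target: MPTBlock  # assumed key: checkpoint only selected modules
```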

Evaluation Improvements

We've released v0.1 of our Eval Gauntlet (#674, #748)! This adds many new benchmarks, chain-of-thought prompting, and a new safety category. Check out the README for full details!

In addition, we've made a few improvements to our evaluation options, with more to come!

  • Allow specifying multiple evaluation datasets to compute cross entropy and perplexity on during training (#603)
  • Easier versions of the HumanEval dataset, which can be useful for comparing smaller models (#645)
  • More options for averaging the results of the Eval Gauntlet (#640)
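
For example, specifying multiple evaluation datasets can be sketched as a list of eval loaders, each with its own label for the logged metrics (paths and labels below are hypothetical):

```yaml
eval_loader:
  - label: c4_val
    name: text
    dataset:
      remote: s3://bucket/c4/val      # hypothetical path
      split: val
  - label: wiki_val
    name: text
    dataset:
      remote: s3://bucket/wiki/val    # hypothetical path
      split: val
```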

New pretraining benchmarks (#543)

Added H100 profiling results to our benchmarking table.

Quality of life improvements

  • Improved Generate callback with more logging options. Use the Generate callback to log generations from your model over the course of training. (#631)
  • The token count logged during training now excludes padding tokens; previously it included them. (#676)
  • Use the PyTorch profiler to profile your training runs. (#678)
  • A convenience script for using the much faster Hugging Face snapshot_download to download models from the Hugging Face Hub. (#708)
  • New AWS specific Docker images with LLM Foundry dependencies pre-installed. (#731)

Experimental features

Inverse square root learning rate scheduler (#657)

We've added experimental support for the inverse square root learning rate scheduler.
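
The schedule itself is simple to state: after a warmup period, the learning rate decays proportionally to 1/sqrt(step). A minimal standalone sketch of the general formula (this is not LLM Foundry's exact implementation, which has additional knobs):

```python
import math

def inv_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # At step == warmup_steps the decay factor is 1, so the curve is continuous.
    return base_lr * math.sqrt(warmup_steps / step)

# Example: base LR 1e-3 with 100 warmup steps.
print(inv_sqrt_lr(0, 1e-3, 100))     # start of warmup
print(inv_sqrt_lr(100, 1e-3, 100))   # full LR at end of warmup
print(inv_sqrt_lr(400, 1e-3, 100))   # decayed by sqrt(100/400) = 0.5
```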

Breaking changes

Updated Streaming defaults (#723)

We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the Streaming release notes for more details.

Removed support for PrefixLM for Bloom and OPT models (#704)

We occasionally remove unused experimental parts of the code base in order to focus on new features and better support for existing ones. In this release, we've removed support for PrefixLM applied to Bloom and OPT models.

Full Changelog: v0.3.0...v0.4.0