Releases: mosaicml/llm-foundry
v0.14.1
New Features
Use log_model for registering models (#1544)
Instead of calling the MLflow register API directly, we now use the intended log_model API, which both logs the model to the MLflow run artifacts and registers it to Unity Catalog.
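For reference, here is a minimal sketch of the user-facing side of this flow: registration is driven by the hf_checkpointer callback, and the Unity Catalog destination comes from mlflow_registered_model_name. The folder, interval, and model name below are placeholder values.
callbacks:
  hf_checkpointer:
    save_folder: s3://my-bucket/checkpoints/huggingface   # placeholder path
    save_interval: 1ep                                    # placeholder interval
    # three-part Unity Catalog name the final checkpoint is registered under
    mlflow_registered_model_name: my_catalog.my_schema.my_model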
What's Changed
- Catch delta table not found error by @milocress in #1625
- Add Mlflow 403 PL UserError by @dakinggg in #1623
- Catches when data prep cluster fails to start by @milocress in #1628
- add another cluster connection failure wrapper by @milocress in #1630
- Use log_model API to register the model by @nancyhung and @dakinggg in #1544
Full Changelog: v0.14.0...v0.14.1
v0.14.0
New Features
Load Checkpoint Callback (#1570)
We added support for Composer's LoadCheckpoint callback, which loads a checkpoint at a specified event. This enables use cases like loading model base weights with peft.
callbacks:
  load_checkpoint:
    load_path: /path/to/your/weights
Breaking Changes
Accumulate over Tokens in a Batch for Training Loss (#1618, #1610, #1595)
We added a new flag, accumulate_train_batch_on_tokens, which specifies whether training loss is accumulated over the number of tokens in a batch rather than the number of samples. It is true by default. This will slightly change loss curves for models trained with padding. The old behavior can be recovered by explicitly setting this flag to False.
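As a minimal sketch, assuming the flag is a top-level key in the train YAML that is passed through to the Composer Trainer, the old behavior would be restored with:
# revert to sample-based loss accumulation (pre-0.14.0 behavior)
accumulate_train_batch_on_tokens: false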
Default Run Name (#1611)
If no run name is provided, we now default to Composer's randomly generated run names. (Previously, we defaulted to "llm" as the run name.)
What's Changed
- Update mcli examples to use 0.13.0 by @irenedea in #1594
- Pass accumulate_train_batch_on_tokens through to composer by @dakinggg in #1595
- Loosen MegaBlocks version pin by @mvpatel2000 in #1597
- Add configurability for hf checkpointer register timeout by @dakinggg in #1599
- Loosen MegaBlocks to <1.0 by @mvpatel2000 in #1598
- Finetuning dataloader validation tweaks by @mvpatel2000 in #1600
- Bump onnx from 1.16.2 to 1.17.0 by @dependabot in #1604
- Remove TE from dockerfile and instead add as optional dependency by @snarayan21 in #1605
- Data prep on multiple GPUs by @eitanturok in #1576
- Add env var for configuring the maximum number of processes to use for dataset processing by @irenedea in #1606
- Updated error message for cluster check by @nancyhung in #1602
- Use fun default composer run names by @irenedea in #1611
- Ensure log messages are properly formatted again by @snarayan21 in #1614
- Add UC not enabled error for delta to json conversion by @irenedea in #1613
- Use a temporary directory for downloading finetuning dataset files by @irenedea in #1608
- Bump composer version to 0.26.0 by @irenedea in #1616
- Add loss generating token counts by @dakinggg in #1610
- Change accumulate_train_batch_on_tokens default to True by @dakinggg in #1618
- Bump version to 0.15.0.dev0 by @irenedea in #1621
- Add load checkpoint callback by @irenedea in #1570
Full Changelog: v0.13.0...v0.14.0
v0.13.0
🚀 LLM Foundry v0.13.0
🛠️ Bug Fixes & Cleanup
PyTorch 2.4 Checkpointing (#1569, #1581, #1583)
Resolved issues related to checkpointing for Curriculum Learning (CL) callbacks.
🔧 Dependency Updates
Bumped tiktoken from 0.4.0 to 0.8.0 (#1572)
Updated onnxruntime from 1.19.0 to 1.19.2 (#1590)
What's Changed
- Update mcli yamls by @dakinggg in #1552
- Use allenai/c4 instead of c4 dataset by @eitanturok in #1554
- Tensor Parallelism by @eitanturok in #1521
- Insufficient Permissions Error when trying to access table by @KuuCi in #1555
- Add NoOp optimizer by @snarayan21 in #1560
- Deterministic GCRP Errors by @KuuCi in #1559
- Simplify CL API by @b-chu in #1510
- Reapply #1389 by @dakinggg in #1561
- Add dataset swap callback by @b-chu in #1536
- Add error to catch more unknown example types by @milocress in #1562
- Add FileExtensionNotFoundError by @b-chu in #1564
- Add InvalidConversationError by @b-chu in #1565
- Release docker img by @KuuCi in #1547
- Revert FT dataloader changes from #1561, keep #1564 by @snarayan21 in #1566
- Cleanup TP by @eitanturok in #1556
- Changes for dataset swap callback by @gupta-abhay in #1569
- Do not consider run_name when auto-detecting autoresume by @irenedea in #1571
- Allow parameters with requires_grad=False in meta init by @sashaDoubov in #1567
- Bump tiktoken from 0.4.0 to 0.8.0 by @dependabot in #1572
- Add extensions to FinetuningFileNotFoundError by @b-chu in #1578
- Handle long file names in convert text to mds by @irenedea in #1579
- Set streaming log level by @mvpatel2000 in #1582
- Fix pytorch checkpointing for CL callback by @b-chu in #1581
- Fix pytorch checkpointing for CL callback by @b-chu in #1583
- Error if filtered dataset contains 0 examples by @irenedea in #1585
- Change cluster errors from NetworkError to UserError by @irenedea in #1586
- Do not autoresume if a default name is set, only on user defined ones by @irenedea in #1588
- Bump onnxruntime from 1.19.0 to 1.19.2 by @dependabot in #1590
- Make FinetuningStreamingDataset parameters more flexible by @XiaohanZhangCMU in #1580
- Add build callback tests by @irenedea in #1577
- Bump version to 0.14.0.dev0 by @irenedea in #1587
- Fix typo in eval code by using 'fsdp' instead of 'fsdp_config' by @irenedea in #1593
Full Changelog: v0.12.0...v0.13.0
v0.12.0
🚀 LLM Foundry v0.12.0
New Features
PyTorch 2.4 (#1505)
This release updates LLM Foundry to the PyTorch 2.4 release, bringing with it support for the new features and optimizations in PyTorch 2.4.
Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)
Numerous improvements to the extensibility of the modeling and data loading code, enabling easier reuse for subclassing and extending. Please see the linked PRs for more details on each change.
Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)
Various improved error messages, making debugging user errors more clear.
Sliding window in torch attention (#1455)
We've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.
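A rough config sketch of enabling this: the field names (attn_impl, sliding_window_size) are assumed to match the existing attn_config options, and the window size is a placeholder.
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: torch            # reference (non-flash) attention implementation
    sliding_window_size: 1024   # placeholder window size; -1 disables the sliding window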
Bug fixes
Extra BOS token for llama 3.1 with completion data (#1476)
A bug resulted in an extra BOS token being added between the prompt and response during finetuning. This is fixed so that the prompt and response supplied by the user are concatenated without any extra tokens between them.
What's Changed
- Add test for logged_config transforms by @b-chu in #1441
- Bump version to 0.12.0.dev0. by @irenedea in #1447
- Update pytest-codeblocks requirement from <0.17,>=0.16.1 to >=0.16.1,<0.18 by @dependabot in #1445
- Bump coverage[toml] from 7.4.4 to 7.6.1 by @dependabot in #1442
- Enabled generalizing build_inner_model in ComposerHFCausalLM by @gupta-abhay in #1450
- Update llm foundry version in mcli yamls by @irenedea in #1451
- merge to main by @XiaohanZhangCMU in #865
- allow embedding resizing passed through by @jdchang1 in #1449
- Update packaging requirement from <23,>=21 to >=21,<25 by @dependabot in #1444
- Update pytest requirement from <8,>=7.2.1 to >=7.2.1,<9 by @dependabot in #1443
- Implement ruff rules enforcing PEP 585 by @snarayan21 in #1453
- Adding sliding window attn to scaled_multihead_dot_product_attention by @ShashankMosaicML in #1455
- Add user error for UnicodeDecodeError in convert text to mds by @irenedea in #1457
- Fix log_config by @josejg in #1432
- Add EnvironmentLogger Callback by @josejg in #1350
- Update mosaicml/ci-testing to 0.1.2 by @irenedea in #1458
- Correct error message for inference wrapper by @josejg in #1459
- Update CI tests to v0.1.2 by @KuuCi in #1466
- Bump onnxruntime from 1.18.1 to 1.19.0 by @dependabot in #1461
- Update tenacity requirement from <9,>=8.2.3 to >=8.2.3,<10 by @dependabot in #1460
- Simple change to enable mapping functions for ft constructor by @gupta-abhay in #1468
- use default eval interval from composer by @milocress in #1369
- Consistent Naming EnvironmentLoggingCallback by @josejg in #1470
- Register NaN Monitor Callback by @josejg in #1471
- Add train subset num batches by @mvpatel2000 in #1472
- Parent class hf models by @jdchang1 in #1467
- Remove extra bos for prompt/response data with llama3.1 by @dakinggg in #1476
- Add prepare fsdp back by @dakinggg in #1477
- Add date_string when applying tokenizer chat template by @snarayan21 in #1474
- Make sample tokenization extensible by @gupta-abhay in #1478
- Use Streaming version 0.8.1 by @snarayan21 in #1479
- Bump hf-transfer from 0.1.3 to 0.1.8 by @dependabot in #1480
- fix hf checkpointer by @milocress in #1489
- Fix device mismatch when running hf.generate by @ShashankMosaicML in #1486
- Bump composer to 0.24.1 + FSDP config device_mesh deprecation by @snarayan21 in #1487
- master_weights_dtype not supported by ComposerHFCausalLM.__init__() by @eldarkurtic in #1485
- Detect loss spikes and high losses during training by @joyce-chen-uni in #1473
- Enable passing in external position ids by @gupta-abhay in #1493
- Align logged attributes for errors and run metadata in kill_loss_spike_callback.py by @joyce-chen-uni in #1494
- tokenizer is never built when converting finetuning dataset by @eldarkurtic in #1496
- Removing error message for reusing kv cache with torch attn by @ShashankMosaicML in #1497
- Fix formatting of loss spike & high loss error messages by @joyce-chen-uni in #1498
- Enable cross attention layers by @gupta-abhay in #1495
- Update to ci-testing 0.2.0 by @dakinggg in #1500
- [WIP] Torch 2.4 in docker images by @snarayan21 in #1491
- [WIP] Only torch 2.4.0 compatible by @snarayan21 in #1505
- Update mlflow requirement from <2.16,>=2.14.1 to >=2.14.1,<2.17 by @dependabot in #1506
- Update ci-testing to 0.2.2 by @dakinggg in #1503
- Allow passing key_value_states for x-attn through MPT Block by @gupta-abhay in #1511
- Fix cross attention for blocks by @gupta-abhay in #1512
- Put 2.3 image back in release examples by @dakinggg in #1513
- Sort callbacks so that CheckpointSaver goes before HuggingFaceCheckpointer by @irenedea in #1515
- Raise MisconfiguredDatasetError from original error by @irenedea in #1519
- Peft fsdp by @dakinggg in #1520
- Raise DatasetTooSmall exception if canonical nodes is less than num samples by @irenedea in #1518
- Add permissions check for delta table reading by @irenedea in #1522
- Add HuggingFaceCheckpointer option for only registering final checkpoint by @irenedea in #1516
- Replace FSDP args by @KuuCi in #1517
- enable correct padding_idx for embedding layers by @gupta-abhay in #1527
- Revert "Replace FSDP args" by @KuuCi in #1533
- Delete unneeded inner base model in PEFT HF Checkpointer by @snarayan21 in #1532
- Add deprecation warning to fsdp_config by @KuuCi in #1530
- Fix reuse kv cache for torch attention by @ShashankMosaicML in #1539
- Error on text dataset file not found by @milocress in #1534
- Make ICL tasks not required for eval by @snarayan21 in #1540
- Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. by @ShashankMosaicML in #1374
- Register mosaic logger by @dakinggg in #1542
- Hfcheckpointer optional generation config by @KuuCi in #1543
- Bump composer version to 0.25.0 by @dakinggg in #1546
- Bump streaming version to 0.9.0 by @dakinggg in #1550
- Bump version to 0.13.0.dev0 by @dakinggg in #1549
- Add proper user error for accessing schema by @KuuCi in #1548
- Validate Cluster Access Mode by @KuuCi in #1551
New Contributors
- @jdchang1 made their first contribution in #1449
- @joyce-chen-uni made their first contribution in #1473
Full Changelog: v0.11.0...v0.12.0
v0.11.0
🚀 LLM Foundry v0.11.0
New Features
LLM Foundry CLI Commands (#1337, #1345, #1348, #1354)
We've added CLI commands for our commonly used scripts.
For example, instead of calling `composer llm-foundry/scripts/train.py parameters.yaml`, you can now run `composer -c llm-foundry train parameters.yaml`.
Docker Images Contain All Optional Dependencies (#1431)
LLM Foundry Docker images now have all optional dependencies.
Support for Llama3 Rope Scaling (#1391)
To use it, you can add the following to your parameters:
model:
  name: mpt_causal_lm
  attn_config:
    rope: true
    ...
    rope_impl: hf
    rope_theta: 500000
    rope_hf_config:
      type: llama3
      ...
Tokenizer Registry (#1386)
We now have a tokenizer registry so you can easily add custom tokenizers.
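As a rough sketch of how a registered tokenizer could then be referenced from a train config (the tokenizer name below is hypothetical; the registration itself is done in Python against the llmfoundry.registry.tokenizers registry):
tokenizer:
  name: my_custom_tokenizer   # hypothetical name registered via llmfoundry.registry.tokenizers
  kwargs:
    model_max_length: 4096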
LoadPlanner and SavePlanner Registries (#1358)
We now have LoadPlanner and SavePlanner registries so you can easily add custom checkpoint loading and saving logic.
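As a purely illustrative sketch, assuming registered planners are selected by name from the train config (the save_planner/load_planner keys and planner names below are assumptions, not confirmed config fields):
save_planner:
  name: my_save_planner   # hypothetical registered SavePlanner
load_planner:
  name: my_load_planner   # hypothetical registered LoadPlanner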
Faster Auto-packing (#1435)
The auto packing startup is now much faster. To use auto packing with finetuning datasets, you can add `packing_ratio: auto` to your config like so:
train_loader:
  name: finetuning
  dataset:
    ...
    packing_ratio: auto
What's Changed
- Extra serverless by @XiaohanZhangCMU in #1320
- Fixing sequence_id =-1 bug, adding tests by @ShashankMosaicML in #1324
- Registry docs update by @dakinggg in #1323
- Add dependabot by @dakinggg in #1322
- HUGGING_FACE_HUB_TOKEN -> HF_TOKEN by @dakinggg in #1321
- Bump version by @b-chu in #1326
- Relax hf hub pin by @dakinggg in #1314
- Error if metadata matches existing keys by @dakinggg in #1313
- Update transformers requirement from <4.41,>=4.40 to >=4.42.3,<4.43 by @dependabot in #1327
- Bump einops from 0.7.0 to 0.8.0 by @dependabot in #1328
- Bump onnxruntime from 1.15.1 to 1.18.1 by @dependabot in #1329
- Bump onnx from 1.14.0 to 1.16.1 by @dependabot in #1331
- Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. by @ShashankMosaicML in #1332
- Fix registry for callbacks with configs by @mvpatel2000 in #1333
- Adding a child class of hf's rotary embedding to make hf generate work on multiple gpus. by @ShashankMosaicML in #1334
- Add a config arg to just save an hf checkpoint by @dakinggg in #1335
- Deepcopy config in callbacks_with_config by @mvpatel2000 in #1336
- Avoid HF race condition by @dakinggg in #1338
- Nicer error message for undefined symbol by @dakinggg in #1339
- Bump sentencepiece from 0.1.97 to 0.2.0 by @dependabot in #1342
- Removing logging exception through update run metadata by @jjanezhang in #1292
- [MCLOUD-4910] Escape UC names during data prep by @naren-loganathan in #1343
- Add CLI for train.py by @KuuCi in #1337
- Add fp32 to the set of valid inputs to attention layer by @j316chuck in #1347
- Log all extraneous_keys in one go for ease of development by @josejg in #1344
- Fix MLFlow Save Model for TE by @j316chuck in #1353
- Add flag for saving only composer checkpoint by @irenedea in #1356
- Expose flag for should_save_peft_only by @irenedea in #1357
- Command utils + train by @KuuCi in #1361
- Readd Clear Resolver by @KuuCi in #1365
- Add Eval to Foundry CLI by @KuuCi in #1345
- Enhanced Logging for convert_delta_to_json and convert_text_to_mds by @vanshcsingh in #1366
- Add convert_dataset_hf to CLI by @KuuCi in #1348
- Add missing init by @KuuCi in #1368
- Make ICL dataloaders build lazily by @josejg in #1359
- Add option to unfuse Wqkv by @snarayan21 in #1367
- Add convert_dataset_json to CLI by @KuuCi in #1349
- Add convert_text_to_mds to CLI by @KuuCi in #1352
- Fix hf dataset hang on small dataset by @dakinggg in #1370
- Add LoadPlanner and SavePlanner registries by @irenedea in #1358
- Load config on rank 0 first by @dakinggg in #1371
- Add convert_finetuning_dataset to CLI by @KuuCi in #1354
- Allow for transforms on the model before MLFlow registration by @snarayan21 in #1372
- Allow flash attention up to 3 by @dakinggg in #1377
- Update accelerate requirement from <0.26,>=0.25 to >=0.32.1,<0.33 by @dependabot in #1341
- update runners by @KevDevSha in #1360
- Allow for multiple workers when autopacking by @b-chu in #1375
- Allow train.py-like config for eval.py by @josejg in #1351
- Fix load and save planner config logic by @irenedea in #1385
- Do dtype conversion in torch hook to save memory by @irenedea in #1384
- Get a shared file system safe signal file name by @dakinggg in #1381
- Add transformation method to hf_causal_lm by @irenedea in #1383
- [kushalkodnad/tokenizer-registry] Introduce new registry for tokenizers by @kushalkodn-db in #1386
- Bump transformers version to 4.43.1 by @dakinggg in #1388
- Add convert_delta_to_json to CLI by @KuuCi in #1355
- Revert "Use utils to get shared fs safe signal file name (#1381)" by @dakinggg in #1389
- Avoid race condition in convert text to mds script by @dakinggg in #1390
- Refactor loss function for ComposerMPTCausalLM by @irenedea in #1387
- Revert "Allow for multiple workers when autopacking (#1375)" by @dakinggg in #1392
- Bump transformers to 4.43.2 by @dakinggg in #1393
- Support rope scaling by @milocress in #1391
- Removing the extra LlamaRotaryEmbedding import by @ShashankMosaicML in #1394
- Dtensor oom by @dakinggg in #1395
- Condition the meta initialization for hf_causal_lm on pretrain by @irenedea in #1397
- Fix license link in readme by @dakinggg in #1398
- Enable passing epsilon when building norm layers by @gupta-abhay in #1399
- Add pre register method for mlflow by @dakinggg in #1396
- add it by @dakinggg in #1400
- Remove orig params default by @dakinggg in #1401
- Add spin_dataloaders flag by @dakinggg in #1405
- Remove curriculum learning error when duration less than saved timestamp by @b-chu in #1406
- Set pretrained model name correctly, if provided, in HF Checkpointer by @snarayan21 in #1407
- Enable QuickGelu Function for CLIP models by @gupta-abhay in #1408
- Bump streaming version to v0.8.0 by @mvpatel2000 in #1411
- Kevin/ghcr build by @KevDevSha in #1413
- Update accelerate requirement from <0.33,>=0.25 to >=0.25,<0.34 by @dependabot in #1403
- Update huggingface-hub requirement from <0.24,>=0.19.0 to >=0.19.0,<0.25 by @dependabot in #1379
- Make Pytest log in color in Github Action by @eitanturok in https://github.com/mosaicml/llm-fo...