Highlights
Today we release torchtune v0.4.0 with some exciting new additions! Notable ones include full support for activation offloading, recipes for Llama3.2V 90B and its QLoRA variants, new documentation, and the Qwen2.5 model family!
Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations to the CPU while they are not in use. Right before the GPU needs the activations for a microbatch's backward pass, they are prefetched back from the CPU. Enabling this functionality is as easy as setting the following options in your config:
enable_activation_checkpointing: True
enable_activation_offloading: True
In experiments with Llama3 8B, activation offloading used roughly 24% less memory while incurring a performance slowdown of less than 1%.
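If you'd rather not edit a config file, the same flags can be passed as command-line overrides. A minimal sketch using the Llama3 8B LoRA single-device recipe as an example (any recipe that supports these flags works the same way):
# Enable checkpointing + offloading via key=value overrides
tune run lora_finetune_single_device --config llama3/8B_lora_single_device enable_activation_checkpointing=True enable_activation_offloading=True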
Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which outperforms the 11B version across common benchmarks. Because the model is so much larger, we also added the ability to fine-tune it using QLoRA and FSDP2.
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"
# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
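If you want to tweak hyperparameters beyond what the stock config provides, you can copy it locally with tune cp and point the run at your copy. A minimal sketch; the local filename is just an example:
# Make a local, editable copy of the built-in config
tune cp llama3_2_vision/90B_qlora ./my_90B_qlora.yaml
# Launch with the edited copy
tune run --nproc_per_node 4 lora_finetune_distributed --config ./my_90B_qlora.yaml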
Qwen2.5 model family has landed (#1863)
We added builders for Qwen2.5, the latest generation of the Qwen model family! In their own words: "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
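Other Qwen2.5 sizes follow the same download-then-finetune pattern. The 7B config name below is an assumption, so run tune ls to confirm the exact config names shipped with your install:
# List all built-in recipes and configs to confirm names
tune ls
# Example with a larger Qwen2.5 model (config name assumed; verify with tune ls)
tune download Qwen/Qwen2.5-7B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/7B_lora_single_device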
New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out in the torchtune documentation, and see the sketch below for a quick taste.
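In short: any component in a config can be swapped for your own by pointing _component_ at an importable path, and a local recipe file can be passed straight to tune run. The module, recipe, and config names below are hypothetical:
# In my_config.yaml, swap in your own model builder (hypothetical module path)
model:
  _component_: my_project.models.my_custom_model
# Run a local recipe file with that config
tune run my_recipe.py --config my_config.yaml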
What's Changed
- Fix PackedDataset bug for seq_len > 2 * max_seq_len setting. by @mirceamironenco in #1697
- Bump version 0.3.1 by @joecummings in #1720
- Add error propagation to distributed run. by @mirceamironenco in #1719
- Update fusion layer counting logic for Llama 3.2 weight conversion by @ebsmothers in #1722
- Resizable image positional embeddings by @felipemello1 in #1695
- Unpin numpy by @ringohoffman in #1728
- Add HF Checkpoint Format Support for Llama Vision by @pbontrager in #1727
- config changes by @felipemello1 in #1733
- Fix custom imports for both distributed and single device by @RdoubleA in #1731
- Pin urllib3<2.0.0 to fix eleuther eval errors by @RdoubleA in #1738
- Fixing recompiles in KV-cache + compile by @SalmanMohammadi in #1663
- Fix CLIP pos embedding interpolation to work on DTensors by @ebsmothers in #1739
- Bump version to 0.4.0 by @RdoubleA in #1748
- [Feat] Activation offloading for distributed lora recipe by @Jackmin801 in #1645
- Add LR Scheduler to single device full finetune by @user074 in #1350
- Custom recipes use slash path by @RdoubleA in #1760
- Adds repr to Message by @thomasjpfan in #1757
- Fix save adapter weights only by @ebsmothers in #1764
- Set drop_last to always True by @RdoubleA in #1761
- Remove nonexistent flag for acc offloading in memory_optimizations.rst by @janeyx99 in #1772
- [BUGFIX] Adding sequence truncation to `max_seq_length` in eval recipe by @SalmanMohammadi in #1773
- Add ROCm "support" by @joecummings in #1765
- [BUG] Include system prompt in Phi3 by default by @joecummings in #1778
- Fixing quantization in eval recipe by @SalmanMohammadi in #1777
- Delete deprecated ChatDataset and InstructDataset by @joecummings in #1781
- Add split argument to required builders and set it default value to "train" by @krammnic in #1783
- Fix quantization with generate by @SalmanMohammadi in #1784
- Fix typo in multimodal_datasets.rst by @krammnic in #1787
- Make AlpacaToMessage public. by @krammnic in #1785
- Fix misleading attn_dropout docstring by @ebsmothers in #1792
- Add filter_fn to all generic dataset classes and builders API by @krammnic in #1789
- Set dropout in SDPA to 0.0 when not in training mode by @ebsmothers in #1803
- Skip entire header for llama3 decode by @RdoubleA in #1656
- Remove unused bsz variable by @zhangtemplar in #1805
- Adding `max_seq_length` to vision eval config by @SalmanMohammadi in #1802
- Add check that there is no PackedDataset while building ConcatDataset by @krammnic in #1796
- Add possibility to pack in _wikitext.py by @krammnic in #1807
- Add evaluation configs under qwen2 dir by @joecummings in #1809
- Fix eos_token problem in all required models by @krammnic in #1806
- Deprecating `TiedEmbeddingTransformerDecoder` by @SalmanMohammadi in #1815
- Torchao version check changes/BC import of TensorCoreTiledLayout by @ebsmothers in #1812
- 1810 move gemma evaluation by @malinjawi in #1819
- Consistent type checks for prepend and append tags. by @krammnic in #1824
- Move schedulers to training from modules. by @krammnic in #1801
- Update EleutherAI Eval Harness to v0.4.5 by @joecummings in #1800
- 1810 Add evaluation configs under phi3 dir by @Harthi7 in #1822
- Create CITATION.cff by @joecummings in #1756
- fixed error message for GatedRepoError by @DawiAlotaibi in #1832
- 1810 Move mistral evaluation by @Yousof-kayal in #1829
- More consistent trace names. by @krammnic in #1825
- fbcode using TensorCoreLayout by @jerryzh168 in #1834
- Remove pad_max_tiles in CLIP by @pbontrager in #1836
- Remove pad_max_tiles in CLIP inference by @lucylq in #1853
- Add `vqa_dataset`, update docs by @krammnic in #1820
- Add offloading tests and fix obscure edge case by @janeyx99 in #1860
- Toggling KV-caches by @SalmanMohammadi in #1763
- Caching doc nits by @SalmanMohammadi in #1876
- LoRA typo fix + bias=True by @felipemello1 in #1881
- Correct `torchao` check for `TensorCoreTiledLayout` by @joecummings in #1886
- Kd_loss avg over tokens by @moussaKam in #1885
- Support Optimizer-in-the-backward by @mori360 in #1833
- Remove deprecated `GemmaTransformerDecoder` by @SalmanMohammadi in #1892
- Add PromptTemplate examples by @SalmanMohammadi in #1891
- Temporarily disable building Python 3.13 version of torchtune by @joecummings in #1896
- Block on Python 3.13 version by @joecummings in #1898
- [bug] fix sharding multimodal by @felipemello1 in #1889
- QLoRA with bias + Llama 3.2 Vision QLoRA configs by @ebsmothers in #1726
- Block on Python 3.13 version by @joecummings in #1899
- Normalize CE loss by total number of (non-padding) tokens by @ebsmothers in #1875
- nit: remove (nightly) in recipes by @krammnic in #1882
- Expose packed: False, set log_peak_memory_stats: True, set compile: False by @krammnic in #1872
- Remove ChatFormat, InstructTemplate, old message converters by @RdoubleA in #1895
- Make TensorCoreTiledLayout import more robust by @andrewor14 in #1912
- [ez] Fix README download example by @RdoubleA in #1915
- [docs] Custom components page by @RdoubleA in #1910
- Update imports after QAT was moved out of prototype by @andrewor14 in #1883
- Updating memory optimization overview by @SalmanMohammadi in #1916
- Patch github link in torchtune docs header by @ebsmothers in #1914
- Llama 3.2 Vision - 90B by @felipemello1 in #1880
- Fixing DoRA docs, adding to mem opt tutorial by @SalmanMohammadi in #1918
- Add KD distributed recipe by @lindawangg in #1631
- add missing doc by @felipemello1 in #1924
- [FIX] MM Eval Mask Sizes by @pbontrager in #1920
- Activation offloading for fullfinetuning + fix tied embedding by @felipemello1 in #1847
- Qwen2.5 by @calvinpelletier in #1863
- Restore backward after each batch for grad accum by @ebsmothers in #1917
- Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by @mirceamironenco in #1909
New Contributors
- @ringohoffman made their first contribution in #1728
- @Jackmin801 made their first contribution in #1645
- @user074 made their first contribution in #1350
- @krammnic made their first contribution in #1783
- @zhangtemplar made their first contribution in #1805
- @malinjawi made their first contribution in #1819
- @Harthi7 made their first contribution in #1822
- @DawiAlotaibi made their first contribution in #1832
- @Yousof-kayal made their first contribution in #1829
- @moussaKam made their first contribution in #1885
- @mori360 made their first contribution in #1833
Full Changelog: v0.3.1...v0.4.0