Highlights
Today we release torchtune v0.4.0 with some exciting new additions! Notable ones include full support for activation offloading, recipes for Llama3.2V 90B and its QLoRA variants, new documentation, and the Qwen2.5 model family!
Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations to the CPU while they are not in use. Right before the GPU needs the activations for a microbatch's backward pass, they are prefetched back from the CPU. Enabling this functionality is as easy as setting the following options in your config:
enable_activation_checkpointing: True
enable_activation_offloading: True
In experiments with Llama3 8B, activation offloading used roughly 24% less memory while incurring a performance slowdown of less than 1%.
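If you'd rather not edit a config file, the same flags can be passed as command-line overrides. A minimal sketch using the Llama3 8B LoRA single-device recipe as an example (any recipe that supports these flags works the same way):
# Enable checkpointing + offloading via key=value overrides
tune run lora_finetune_single_device --config llama3/8B_lora_single_device enable_activation_checkpointing=True enable_activation_offloading=True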
Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which outperforms the 11B version across common benchmarks. Because the model is so much larger, we also added the ability to fine-tune it using QLoRA and FSDP2.
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"
# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
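If you want to tweak hyperparameters beyond what the stock config provides, you can copy it locally with tune cp and point the run at your copy. A minimal sketch; the local filename is just an example:
# Make a local, editable copy of the built-in config
tune cp llama3_2_vision/90B_qlora ./my_90B_qlora.yaml
# Launch with the edited copy
tune run --nproc_per_node 4 lora_finetune_distributed --config ./my_90B_qlora.yaml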
Qwen2.5 model family has landed (#1863)
We added builders for Qwen2.5, the latest generation of the Qwen model family! In their own words: "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
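Other Qwen2.5 sizes follow the same download-then-finetune pattern. The 7B config name below is an assumption, so run tune ls to confirm the exact config names shipped with your install:
# List all built-in recipes and configs to confirm names
tune ls
# Example with a larger Qwen2.5 model (config name assumed; verify with tune ls)
tune download Qwen/Qwen2.5-7B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/7B_lora_single_device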
New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out in the torchtune documentation, and see the sketch below for a quick taste.
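In short: any component in a config can be swapped for your own by pointing _component_ at an importable path, and a local recipe file can be passed straight to tune run. The module, recipe, and config names below are hypothetical:
# In my_config.yaml, swap in your own model builder (hypothetical module path)
model:
  _component_: my_project.models.my_custom_model
# Run a local recipe file with that config
tune run my_recipe.py --config my_config.yaml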
What's Changed
- Fix PackedDataset bug for seq_len > 2 * max_seq_len setting. by @mirceamironenco in #1697
- Bump version 0.3.1 by @joecummings in #1720
- Add error propagation to distributed run. by @mirceamironenco in #1719
- Update fusion layer counting logic for Llama 3.2 weight conversion by @ebsmothers in #1722
- Resizable image positional embeddings by @felipemello1 in #1695
- Unpin numpy by @ringohoffman in #1728
- Add HF Checkpoint Format Support for Llama Vision by @pbontrager in #1727
- config changes by @felipemello1 in #1733
- Fix custom imports for both distributed and single device by @RdoubleA in #1731
- Pin urllib3<2.0.0 to fix eleuther eval errors by @RdoubleA in #1738
- Fixing recompiles in KV-cache + compile by @SalmanMohammadi in #1663
- Fix CLIP pos embedding interpolation to work on DTensors by @ebsmothers in #1739
- Bump version to 0.4.0 by @RdoubleA in #1748
- [Feat] Activation offloading for distributed lora recipe by @Jackmin801 in #1645
- Add LR Scheduler to single device full finetune by @user074 in #1350
- Custom recipes use slash path by @RdoubleA in #1760
- Adds repr to Message by @thomasjpfan in #1757
- Fix save adapter weights only by @ebsmothers in #1764
- Set drop_last to always True by @RdoubleA in #1761
- Remove nonexistent flag for acc offloading in memory_optimizations.rst by @janeyx99 in #1772
- [BUGFIX] Adding sequence truncation to `max_seq_length` in eval recipe by @SalmanMohammadi in #1773
- Add ROCm "support" by @joecummings in #1765
- [BUG] Include system prompt in Phi3 by default by @joecummings in #1778
- Fixing quantization in eval recipe by @SalmanMohammadi in #1777
- Delete deprecated ChatDataset and InstructDataset by @joecummings in #1781
- Add split argument to required builders and set it default value to "train" by @krammnic in #1783
- Fix quantization with generate by @SalmanMohammadi in #1784
- Fix typo in multimodal_datasets.rst by @krammnic in #1787
- Make AlpacaToMessage public. by @krammnic in #1785
- Fix misleading attn_dropout docstring by @ebsmothers in #1792
- Add filter_fn to all generic dataset classes and builders API by @krammnic in #1789
- Set dropout in SDPA to 0.0 when not in training mode by @ebsmothers in #1803
- Skip entire header for llama3 decode by @RdoubleA in #1656
- Remove unused bsz variable by @zhangtemplar in #1805
- Adding `max_seq_length` to vision eval config by @SalmanMohammadi in #1802
- Add check that there is no PackedDataset while building ConcatDataset by @krammnic in #1796
- Add possibility to pack in _wikitext.py by @krammnic in #1807
- Add evaluation configs under qwen2 dir by @joecummings in #1809
- Fix eos_token problem in all required models by @krammnic in #1806
- Deprecating `TiedEmbeddingTransformerDecoder` by @SalmanMohammadi in #1815
- Torchao version check changes/BC import of TensorCoreTiledLayout by @ebsmothers in #1812
- 1810 move gemma evaluation by @malinjawi in #1819
- Consistent type checks for prepend and append tags. by @krammnic in #1824
- Move schedulers to training from modules. by @krammnic in #1801
- Update EleutherAI Eval Harness to v0.4.5 by @joecummings in #1800
- 1810 Add evaluation configs under phi3 dir by @Harthi7 in #1822
- Create CITATION.cff by @joecummings in #1756
- fixed error message for GatedRepoError by @DawiAlotaibi in #1832
- 1810 Move mistral evaluation by @Yousof-kayal in #1829
- More consistent trace names. by @krammnic in #1825
- fbcode using TensorCoreLayout by @jerryzh168 in #1834
- Remove pad_max_tiles in CLIP by @pbontrager in #1836
- Remove pad_max_tiles in CLIP inference by @lucylq in #1853
- Add `vqa_dataset`, update docs by @krammnic in #1820
- Add offloading tests and fix obscure edge case by @janeyx99 in #1860
- Toggling KV-caches by @SalmanMohammadi in #1763
- Caching doc nits by @SalmanMohammadi in #1876
- LoRA typo fix + bias=True by @felipemello1 in #1881
- Correct `torchao` check for `TensorCoreTiledLayout` by @joecummings in #1886
- Kd_loss avg over tokens by @moussaKam in #1885
- Support Optimizer-in-the-backward by @mori360 in #1833
- Remove deprecated `GemmaTransformerDecoder` by @SalmanMohammadi in #1892
- Add PromptTemplate examples by @SalmanMohammadi in #1891
- Temporarily disable building Python 3.13 version of torchtune by @joecummings in #1896
- Block on Python 3.13 version by @joecummings in #1898
- [bug] fix sharding multimodal by @felipemello1 in #1889
- QLoRA with bias + Llama 3.2 Vision QLoRA configs by @ebsmothers in #1726
- Block on Python 3.13 version by @joecummings in #1899
- Normalize CE loss by total number of (non-padding) tokens by @ebsmothers in #1875
- nit: remove (nightly) in recipes by @krammnic in #1882
- Expose packed: False, set log_peak_memory_stats: True, set compile: False by @krammnic in #1872
- Remove ChatFormat, InstructTemplate, old message converters by @RdoubleA in #1895
- Make TensorCoreTiledLayout import more robust by @andrewor14 in #1912
- [ez] Fix README download example by @RdoubleA in #1915
- [docs] Custom components page by @RdoubleA in #1910
- Update imports after QAT was moved out of prototype by @andrewor14 in #1883
- Updating memory optimization overview by @SalmanMohammadi in #1916
- Patch github link in torchtune docs header by @ebsmothers in #1914
- Llama 3.2 Vision - 90B by @felipemello1 in #1880
- Fixing DoRA docs, adding to mem opt tutorial by @SalmanMohammadi in #1918
- Add KD distributed recipe by @lindawangg in #1631
- add missing doc by @felipemello1 in #1924
- [FIX] MM Eval Mask Sizes by @pbontrager in #1920
- Activation offloading for fullfinetuning + fix tied embedding by @felipemello1 in #1847
- Qwen2.5 by @calvinpelletier in #1863
- Restore backward after each batch for grad accum by @ebsmothers in #1917
- Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by @mirceamironenco in #1909
New Contributors
- @ringohoffman made their first contribution in #1728
- @Jackmin801 made their first contribution in #1645
- @user074 made their first contribution in #1350
- @krammnic made their first contribution in #1783
- @zhangtemplar made their first contribution in #1805
- @malinjawi made their first contribution in #1819
- @Harthi7 made their first contribution in #1822
- @DawiAlotaibi made their first contribution in #1832
- @Yousof-kayal made their first contribution in #1829
- @moussaKam made their first contribution in #1885
- @mori360 made their first contribution in #1833
Full Changelog: v0.3.1...v0.4.0