Using `FormattedCheckpointFiles` in configs #2147

SalmanMohammadi · 2024-12-10T21:36:55Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Please link to any issues this PR addresses.

closes #2073

A super duper sanity check, since we didn't have an example of the `pytorch_model-{}-of-{}.bin` format:

from torchtune.training.checkpointing._utils import FormattedCheckpointFiles
fmt = FormattedCheckpointFiles(filename_format="pytorch_model-{}-of-{}.bin", max_filename="00015")
fmt.build_checkpoint_filenames()
['pytorch_model-00001-of-00015.bin',
 'pytorch_model-00002-of-00015.bin',
 'pytorch_model-00003-of-00015.bin',
 'pytorch_model-00004-of-00015.bin',
 'pytorch_model-00005-of-00015.bin',
 'pytorch_model-00006-of-00015.bin',
 'pytorch_model-00007-of-00015.bin',
 'pytorch_model-00008-of-00015.bin',
 'pytorch_model-00009-of-00015.bin',
 'pytorch_model-00010-of-00015.bin',
 'pytorch_model-00011-of-00015.bin',
 'pytorch_model-00012-of-00015.bin',
 'pytorch_model-00013-of-00015.bin',
 'pytorch_model-00014-of-00015.bin',
 'pytorch_model-00015-of-00015.bin']
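
For readers without torchtune installed, the expansion above can be reproduced with a short standalone sketch. This is not torchtune's implementation: the helper name `expand_shard_filenames` is hypothetical, and it assumes the format string has exactly two `{}` slots and that `max_filename` is a zero-padded shard count.

```python
def expand_shard_filenames(filename_format: str, max_filename: str) -> list[str]:
    """Expand a two-slot format into one filename per checkpoint shard.

    Hypothetical stand-in for illustration only; assumes `max_filename`
    is a zero-padded integer string such as "00015".
    """
    num_files = int(max_filename)  # "00015" -> 15 shards
    width = len(max_filename)      # reuse the padding width of max_filename
    return [
        filename_format.format(str(i).zfill(width), max_filename)
        for i in range(1, num_files + 1)
    ]


filenames = expand_shard_filenames("pytorch_model-{}-of-{}.bin", "00015")
print(filenames[0])    # pytorch_model-00001-of-00015.bin
print(filenames[-1])   # pytorch_model-00015-of-00015.bin
print(len(filenames))  # 15
```

The padding width is taken from `max_filename` itself, so the same sketch would cover other shard counts as long as both slots use the same width.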

pytorch-bot · 2024-12-10T21:36:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2147

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 921ccde with merge base 5370e0d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

felipemello1

Thanks for doing it! I looked at it twice and everything looks good. But we are so close to a release that I am afraid there may be some error if we don't test it. We can either: I will approve it, but assuming that you didn't try these configs, I am not sure if it should make it to this release.

SalmanMohammadi · 2024-12-11T04:44:50Z

> Thanks for doing it! I looked at it twice and everything looks good. But we are so close to a release that I am afraid there may be some error if we don't test it. We can either: I will approve it, but assuming that you didn't try these configs, I am not sure if it should make it to this release.

As you wish. I thought about testing it but it's a lot of effort for a minor change.

joecummings · 2024-12-11T10:11:06Z

Should merge this after the release branch is cut @felipemello1

* Llama 3.3 70B (pytorch#2124)
* Llama 3.3 readme updates (pytorch#2125)
* update configs (pytorch#2107)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* Reduce logging output for distributed KD (pytorch#2120)
* Support Early Exit Loss and/or Layer Dropout (pytorch#1076)
  Co-authored-by: ebsmothers <ebs@meta.com>
* Update checkpointing directory (pytorch#2074)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
  Co-authored-by: vancoyendall <vancoykendall@gmail.com>
* pass correct arg (pytorch#2127)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* update configs (pytorch#2128)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* fix qat_lora_test (pytorch#2131)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* guard ckpt imports (pytorch#2133)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* [bug fix] add parents=True (pytorch#2136)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* [bug fix] re-add model (pytorch#2135)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* Update save sizes into GiB (pytorch#2143)
* [bug fix] remove config download when source is kaggle (pytorch#2144)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* [fix] remove "with_suffix" (pytorch#2146)
  Co-authored-by: Felipe Mello <felipemello@fb.com>
* DoRA fixes (pytorch#2139)
  Co-authored-by: Mircea Mironenco <5738815+mirceamironenco@users.noreply.github.com>
* [Fix] Llama 3.2 Vision decoder_trainable flag fixed (pytorch#2150)
* Small readme, config updates (pytorch#2157)
* Using `FormattedCheckpointFiles` in configs (pytorch#2147)
* Move ``get_world_size_and_rank`` to utils (pytorch#2155)
* Faster intermediate checkpoints with DCP async save in TorchTune (pytorch#2006)
  Co-authored-by: Saurabh Mishra <msaurabh@fb.com>
* torchdata integration - multi-dataset and streaming support (pytorch#1929)
* Allow higher version of lm-eval (pytorch#2165)
* Using `FormattedCheckpointFiles` in configs... round 2 (pytorch#2167)
* [EZ] Fix set_torch_num_threads in multi-node. (pytorch#2164)

---------

Co-authored-by: Philip Bontrager <pbontrager@gmail.com>
Co-authored-by: ebsmothers <ebs@meta.com>
Co-authored-by: Felipe Mello <fmellomascarenhas@gmail.com>
Co-authored-by: Felipe Mello <felipemello@fb.com>
Co-authored-by: Joe Cummings <jrcummings27@gmail.com>
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: vancoyendall <vancoykendall@gmail.com>
Co-authored-by: Mircea Mironenco <5738815+mirceamironenco@users.noreply.github.com>
Co-authored-by: salman <salman.mohammadi@outlook.com>
Co-authored-by: Saurabh Mishra <msaurabh@meta.com>
Co-authored-by: Saurabh Mishra <msaurabh@fb.com>
Co-authored-by: Andrew Ho <andrew.kenneth.ho@gmail.com>
Co-authored-by: Eugen Hotaj <eugen_hotaj_91@hotmail.com>

SalmanMohammadi added 2 commits December 10, 2024 21:26

updating configs

b061aba

woopsiedaisy

921ccde

SalmanMohammadi requested review from joecummings and felipemello1 December 10, 2024 21:36

facebook-github-bot added the CLA Signed label Dec 10, 2024

felipemello1 approved these changes Dec 11, 2024

joecummings merged commit cdaece1 into pytorch:main Dec 13, 2024
17 checks passed

SalmanMohammadi deleted the 2073 branch December 16, 2024 12:01

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

Using FormattedCheckpointFiles in configs (pytorch#2147)

7350af6

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

Using FormattedCheckpointFiles in configs (pytorch#2147)

a6f1cd2
