Remove lr_scheduler requirement in lora_dpo_single_device #1991

Open · wants to merge 3 commits into main
Conversation

@thomasjpfan (Contributor) commented Nov 12, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Towards #1751

For testing, I initially wanted to monkeypatch the recipes module, but it was not designed to be imported from tests:

# This file mainly exists because we want to ensure that `recipes` aren't
# importable *from the tests*.
So for this PR, I added a new integration test that creates a new config without an lr_scheduler and runs tune run for one epoch.
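
For reference, here is a minimal sketch of that integration test, following the existing recipe-test pattern of monkeypatching sys.argv and invoking the tune CLI via runpy. The TUNE_PATH import, the config-writing helper, and the overrides are assumptions for illustration; the real test lives alongside the other lora_dpo_single_device recipe tests.

import runpy
import sys

import pytest

from tests.common import TUNE_PATH  # assumed helper pointing at the tune CLI entrypoint


class TestLoRADPOSingleDeviceRecipe:
    @pytest.mark.integration_test
    def test_lr_scheduler_optional(self, tmpdir, monkeypatch):
        # Write out a copy of the recipe config with the lr_scheduler section
        # removed; this helper is hypothetical and stands in for the config
        # construction done inside the real test.
        config_path = write_lora_dpo_config_without_lr_scheduler(tmpdir)
        cmd = f"""
        tune run lora_dpo_single_device
            --config {config_path}
            output_dir={tmpdir}
            epochs=1
            max_steps_per_epoch=2
        """.split()
        monkeypatch.setattr(sys, "argv", cmd)
        # The tune CLI exits via SystemExit when the run completes, as in the
        # other recipe integration tests.
        with pytest.raises(SystemExit, match=""):
            runpy.run_path(TUNE_PATH, run_name="__main__")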

Changelog

What are the changes made in this PR?

  • Removes lr_scheduler requirement in lora_dpo_single_device

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings

@RdoubleA (Contributor) left a comment

Thanks for making this update; I had a few comments on the implementation.

@@ -243,7 +243,7 @@ def setup(self, cfg: DictConfig) -> None:
         # Learning rate scheduler can only be set up after number of steps
         # has been computed
         self._lr_scheduler = self._setup_lr_scheduler(
-            cfg_lr_scheduler=cfg.lr_scheduler,
+            cfg_lr_scheduler=cfg.get("lr_scheduler", None),
Contributor:

It may be less indirection to just check directly whether lr_scheduler exists and set it to None here, instead of calling the setup method only to have it return None:

cfg_lr_scheduler = cfg.get("lr_scheduler", None)
self._lr_scheduler = self._setup_lr_scheduler(...) if cfg_lr_scheduler is not None else None

Contributor:

That makes sense to me, Rafi, but on second thought I like the idea of handling everything inside _setup_lr_scheduler, including the log_info. What do you think?
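
For concreteness, a rough sketch of that shape, keeping both the scheduler setup and the log_info inside _setup_lr_scheduler. The signature mirrors the existing method; the log messages and type hints are illustrative, and `log` / `config` are assumed to be the recipe's existing logger and torchtune.config.

    def _setup_lr_scheduler(
        self,
        cfg_lr_scheduler: Optional[DictConfig],
        num_training_steps: int,
        last_epoch: int,
    ) -> Optional[LRScheduler]:
        # With no scheduler configured, log it here and fall back to the
        # optimizer's constant learning rate.
        if cfg_lr_scheduler is None:
            log.info("No learning rate scheduler configured; using constant learning rate.")
            return None
        lr_scheduler = config.instantiate(
            cfg_lr_scheduler,
            self._optimizer,
            num_training_steps=num_training_steps,
            last_epoch=last_epoch,
        )
        log.info("Learning rate scheduler is initialized.")
        return lr_scheduler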

@RdoubleA (Contributor) Nov 14, 2024

Sure, either works, no strong opinions.

@@ -191,3 +191,49 @@ def test_save_and_load_merged_weights(self, tmpdir, monkeypatch):
         llama2_model.load_state_dict(sd)
         merged_ckpt_out = llama2_model(inputs)
         torch.testing.assert_close(baseline_out, merged_ckpt_out, rtol=1e-5, atol=1e-5)
+
+    @pytest.mark.integration_test
+    def test_lr_scheduler_optional(self, tmpdir, monkeypatch):
Contributor:

This is a great test, but it seems expensive just to check that the lr_scheduler is not created. Maybe you could just call the recipe's setup() method with a toy model and check that self._lr_scheduler == None?

@thomasjpfan (Contributor, Author) Nov 13, 2024

I initially wanted to go with this approach, but I do not think we are allowed to import anything in recipes (i.e. LoRADPORecipeSingleDevice) from tests. Specifically, running from recipes.lora_dpo_single_device import LoRADPORecipeSingleDevice raises this error:

raise ModuleNotFoundError(
    "The torchtune recipes directory isn't a package and you should not import anything from here. "
    "Refer to our docs for detailed instructions on how to use recipes: "
    "https://pytorch.org/torchtune/main/deep_dives/recipe_deepdive.html"
)

Contributor:

I am OK with not adding a test for it. Any thoughts, @ebsmothers?

Contributor:

Yeah, let's just remove it.

@felipemello1 (Contributor)

Thanks for the PR, @thomasjpfan! I left a couple of comments. After Rafi and Evan reply, I am good with merging it. PS: after the changes for lora_dpo_single_device are done, would you be willing to add this to the other recipes too? It's OK if the answer is no. We can do it as a follow-up :)

@thomasjpfan (Contributor, Author)

> PS: after the changes for lora_dpo_single_device are done, would you be willing to add this to the other recipes too? It's OK if the answer is no. We can do it as a follow-up :)

I'll be happy to add it to the other recipes as a follow-up. I started off with one recipe to see how much test coverage is required.

@RdoubleA (Contributor)

@thomasjpfan let's go ahead and remove the test and this is good to go!

@felipemello1 (Contributor) commented Nov 14, 2024

Just before we approve it, do you mind running it with the wandb logger (or any other logger) and capturing the learning rate with and without the scheduler, as a sanity check?

You can append this to your CLI command:

tune run <...> \
    metric_logger=torchtune.training.metric_logging.WandBLogger \
    max_steps_per_epoch=10 \
    epochs=1 \
    lr_scheduler._component_=torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup \
    lr_scheduler.num_warmup_steps=5

tune run <...> \
    metric_logger=torchtune.training.metric_logging.WandBLogger \
    max_steps_per_epoch=10 \
    epochs=1 \
    ~lr_scheduler._component_ \
    ~lr_scheduler.num_warmup_steps
