Conversation

aporialiao (Contributor)

Summary:

# Main Changes

1. Enable unit test with an adaptive optimizer `Adagrad`
   - Previously the optimizer state was tested with `SGD`, whose state is static throughout training, so the test didn't actually check whether optimizer state was stored. Using `Adagrad` instead exposed that the previous implementation did not properly store optimizer state (a small illustration follows this list).
2. Properly store optimizer state in `update_optimizer_state`
   - Append the optimizer tensors as inputs to the all2all call, then parse the output tensors to store the right tensors.
   - Optimizer tensors that did not need to be sent to a new rank are persisted and resaved.
   - After the new lookups are created, use `load_state_dict` to load the saved optimizer state into the current optimizers (see the `load_state_dict` sketch below).
3. Helpers & other small changes
   - Add a helper to compare optimizer tensors in unit tests (see the comparison-helper sketch below).
   - Update `DMP` reshard optimizer saving to use the same FQN.
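A minimal sketch of why the switch to `Adagrad` matters, using plain `torch.optim` rather than TorchRec's fused optimizers (so this is illustrative, not the actual code path): vanilla `SGD` carries no per-parameter state, while `Adagrad` accumulates a per-parameter `sum` tensor that has to survive resharding.

```python
import torch

param = torch.nn.Parameter(torch.randn(4, 8))
sgd = torch.optim.SGD([param], lr=0.1)          # no momentum -> no per-parameter state
adagrad = torch.optim.Adagrad([param], lr=0.1)  # keeps a running sum of squared grads

param.grad = torch.randn_like(param)
sgd.step()
adagrad.step()

print(sgd.state_dict()["state"])      # typically {} -- plain SGD has no state to lose
print(adagrad.state_dict()["state"])  # {0: {'step': ..., 'sum': tensor(...)}}
```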
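And a hedged sketch of the persist-then-reload step: the real change goes through TorchRec's sharded/fused optimizers inside `update_optimizer_state`, so this uses plain `torch.optim.Adagrad` and made-up variable names purely to illustrate that `load_state_dict` carries the saved state into the optimizer created for the new lookups.

```python
import torch

# Optimizer state accumulated before the reshard.
old_param = torch.nn.Parameter(torch.randn(4, 8))
old_opt = torch.optim.Adagrad([old_param], lr=0.1)
old_param.grad = torch.randn_like(old_param)
old_opt.step()                          # 'sum' is now non-trivial
saved_state = old_opt.state_dict()      # persisted locally (or received via all2all)

# After resharding, new lookups mean a freshly constructed optimizer.
new_param = torch.nn.Parameter(old_param.detach().clone())
new_opt = torch.optim.Adagrad([new_param], lr=0.1)
new_opt.load_state_dict(saved_state)    # saved state loaded into the new optimizer

assert torch.equal(new_opt.state_dict()["state"][0]["sum"],
                   saved_state["state"][0]["sum"])
```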
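Finally, a rough sketch of what a "compare optimizer tensors" test helper can look like; the actual helper added in this diff lives in the TorchRec test utilities and its name and signature may differ.

```python
import torch

def assert_opt_state_equal(expected: dict, actual: dict) -> None:
    """Walk two optimizer `state` dicts and assert matching keys and close tensors."""
    assert expected.keys() == actual.keys(), f"{expected.keys()} != {actual.keys()}"
    for key, exp_state in expected.items():
        act_state = actual[key]
        assert exp_state.keys() == act_state.keys(), f"state keys differ for {key}"
        for name, exp_val in exp_state.items():
            act_val = act_state[name]
            if isinstance(exp_val, torch.Tensor):
                torch.testing.assert_close(act_val, exp_val)
            else:
                assert exp_val == act_val, f"{name}: {exp_val} != {act_val}"
```

For instance, `assert_opt_state_equal(saved_state["state"], new_opt.state_dict()["state"])` would pass for the `load_state_dict` sketch above.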

Differential Revision: D75565054

@facebook-github-bot added the CLA Signed label on Jun 6, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D75565054
