Fix pytorch checkpointing for CL callback #1581

Merged
merged 1 commit into from
Oct 10, 2024
Contributor

@b-chu b-chu commented Oct 10, 2024

Adds a dataclass to store the state for the CL callback. This PR is similar to the fix here.

Context:
With the PyTorch 2.4 upgrade, DCP flattens all state dict elements that are instances of typing.Mapping or lists before saving. During loading, however, we expect Mappings or lists rather than flattened elements, so the saved and loaded state no longer match.

Save: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/2834569508095648/runs/c5989c4fec75413595c029e70240cae5?o=7395834863327820
Load: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/2834569508095648/runs/91dd369dc38b4576b44e6c0059825c2b?o=7395834863327820
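To illustrate why wrapping the state in a dataclass helps, here is a minimal sketch. It assumes a hypothetical `CallbackState` dataclass and a toy `toy_flatten` helper that only mimics "recurse into Mappings and lists"; neither is the actual code in this PR or the real DCP implementation. The point is that a dataclass instance is neither a Mapping nor a list, so a flattener of this kind treats it as a single leaf.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class CallbackState:
    """Hypothetical container for the callback's state (names are assumptions)."""
    schedule: list[dict[str, Any]]
    schedule_index: int


def toy_flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Toy stand-in for a flattener: recurses into Mappings and lists only."""
    if isinstance(obj, Mapping):
        items = [(str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(str(i), v) for i, v in enumerate(obj)]
    else:
        # Anything else (including a dataclass instance) is kept as one leaf.
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else key
        flat.update(toy_flatten(value, child))
    return flat


schedule = [
    {"duration": "1ep", "dataset": "a"},
    {"duration": "2ep", "dataset": "b"},
]

# Raw nested state: every Mapping/list level is flattened into separate keys.
flat_raw = toy_flatten({"schedule": schedule, "schedule_index": 1})

# Wrapped state: the dataclass stays intact under a single key.
flat_wrapped = toy_flatten(
    {"state": CallbackState(schedule=schedule, schedule_index=1)}
)
```

In this sketch the raw state is split into one key per nested element, while the wrapped state round-trips as a single object whose fields can be read back directly on load.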

Contributor

@j316chuck j316chuck left a comment

Can we add a test run in the PR description to make sure things work? Other than that, this looks good to me ✅

Contributor

@bigning bigning left a comment

Please also add a save/load test.

Contributor Author

b-chu commented Oct 10, 2024

Save and load run added

@b-chu b-chu merged commit 1654827 into main Oct 10, 2024
9 checks passed
b-chu added a commit that referenced this pull request Oct 10, 2024