
Implementing configurable checkpointing. #263

Merged: 4 commits merged into jpata:main on Oct 30, 2023
Conversation

@erwulff (Collaborator) commented on Oct 29, 2023

  • Checkpoint the PyTorch training at a regular frequency given by checkpoint_freq, specified in the config or on the command line with --checkpoint-freq.
  • Progress messages now show train or valid instead of train=True or train=False.
  • Saving and loading of checkpoints are now separate functions. A checkpoint stores the model, optionally the optimizer, and optionally a dict named extra_state that can carry any additional information; extra_state currently only contains the epoch number at which the checkpoint was saved. A rough sketch of such functions follows below the list.
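As an illustration of what the new checkpoint helpers might look like, here is a minimal sketch. The function names save_checkpoint and load_checkpoint, the checkpoint dictionary layout, and the toy usage at the end are assumptions made for this example; only checkpoint_freq, extra_state, and the model/optimizer contents come from the PR description.

```python
import torch

def save_checkpoint(path, model, optimizer=None, extra_state=None):
    # Hypothetical helper name; the PR only says saving was factored into a function.
    # A checkpoint holds the model weights, optionally the optimizer state, and an
    # optional extra_state dict (currently just the epoch number, per the description).
    checkpoint = {"model_state_dict": model.state_dict()}
    if optimizer is not None:
        checkpoint["optimizer_state_dict"] = optimizer.state_dict()
    if extra_state is not None:
        checkpoint["extra_state"] = extra_state
    torch.save(checkpoint, path)

def load_checkpoint(path, model, optimizer=None):
    # Hypothetical counterpart: restore the model (and optimizer, if given) and
    # return extra_state so the caller can resume from the saved epoch.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    if optimizer is not None and "optimizer_state_dict" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint.get("extra_state", {})

# Toy usage, with the kind of periodic check the checkpoint_freq option implies:
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
checkpoint_freq = 5
for epoch in range(1, 11):
    # ... one training epoch would run here ...
    if checkpoint_freq and epoch % checkpoint_freq == 0:
        save_checkpoint(f"checkpoint-{epoch}.pth", model, optimizer, extra_state={"epoch": epoch})
extra = load_checkpoint("checkpoint-5.pth", model, optimizer)
start_epoch = extra.get("epoch", 0)
```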

erwulff and others added 4 commits on October 29, 2023 at 13:43
- save checkpoints of model, optimizer and a dict containing extra info
- progress bar now displays correct epoch number
- refactor saving and loading checkpoints into functions
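The command-line side of the new option could look roughly like the following. This is a sketch only: it assumes argparse is used and that the CLI flag overrides the config value; the actual flag handling in the repository may differ.

```python
import argparse

# Sketch of how --checkpoint-freq could be exposed on the command line.
# The flag name comes from the PR description; the parser setup and the
# precedence over the config value are assumptions for illustration.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--checkpoint-freq",
    type=int,
    default=None,
    help="save a checkpoint every N epochs; omit to use the value from the config",
)
args = parser.parse_args()

config = {"checkpoint_freq": 10}  # placeholder for the loaded training config
checkpoint_freq = args.checkpoint_freq if args.checkpoint_freq is not None else config["checkpoint_freq"]
```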
@jpata merged commit 4b00985 into jpata:main on Oct 30, 2023
10 checks passed
farakiko pushed a commit to farakiko/particleflow that referenced this pull request on Jan 23, 2024:

* feat: checkpoint at a given frequency
  - save checkpoints of model, optimizer and a dict containing extra info
  - progress bar now displays correct epoch number
  - refactor saving and loading checkpoints into functions
* update flatiron scripts
* add authors