
Add resuming from specific checkpoint #516

Merged
merged 7 commits into from
Nov 30, 2019

Conversation

dreamgonfly
Contributor

Before submitting

  • [x] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
  • [x] Did you read the contributor guideline?
  • [x] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

Fixes #515

Did you have fun?

I solved my problem with this feature.


checkpoint_path = Path(checkpoint_path)
if not checkpoint_path.exists():
    return did_restore
Member

You can simply return True/False; there is no need for an extra variable.
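The suggested refactor can be sketched as follows (a minimal, hypothetical version of the function under review, not the PR's actual code; only the early-exit check is shown, with the loading logic elided):

```python
from pathlib import Path


def restore_state_from_checkpoint(checkpoint_path):
    # Return True/False directly instead of tracking a did_restore variable.
    checkpoint_path = Path(checkpoint_path)
    if not checkpoint_path.exists():
        return False
    # ... load and apply the checkpoint state here ...
    return True
```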

Contributor Author

Okay. I'll change the code accordingly.

Contributor Author

I simplified the code, and this function no longer exists.

@@ -93,6 +96,18 @@ def restore_state_if_checkpoint_exists(self, model):

        return did_restore

    def restore_state_from_checkpoint(self, checkpoint_path):
        did_restore = False
Member

Add documentation describing what data type checkpoint_path is.

Contributor Author

I removed this function. Please review the updated code :)

did_restore = False

checkpoint_path = Path(checkpoint_path)
if not checkpoint_path.exists():
Member

This does not work for str as it is defined: :param resume_from_checkpoint: str or os.PathLike object.

Contributor Author

checkpoint_path is what torch.load expects as input. It can be a file-like object or a str containing a file name.
https://pytorch.org/docs/stable/torch.html?highlight=torch%20load#torch.load
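For the str question specifically: pathlib.Path accepts both a plain str and any os.PathLike object, so a single exists() check covers both documented input types. A small sketch (the helper name is hypothetical, for illustration only):

```python
import os
from pathlib import Path


def checkpoint_exists(checkpoint_path):
    # Path() normalizes both str and os.PathLike inputs,
    # so one exists() check handles either documented type.
    return Path(checkpoint_path).exists()
```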

@dreamgonfly
Contributor Author

dreamgonfly commented Nov 18, 2019

I updated & simplified the code.

There is no longer a restore_state_from_checkpoint function. Trainer simply calls restore when the self.resume_from_checkpoint attribute exists.

The type of the resume_from_checkpoint parameter is what torch.load expects as input. It is "a file-like object (has to implement read(), readline(), tell(), and seek()), or a string containing a file name" ( https://pytorch.org/docs/stable/torch.html?highlight=torch%20load#torch.load ).

If both resume_from_checkpoint and the last checkpoint in checkpoint_callback.filepath exist, Trainer restores the checkpoint from resume_from_checkpoint. I chose this policy because resume_from_checkpoint is a more explicit request from the user.
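The precedence policy described above can be sketched as a small helper (hypothetical names, not the PR's actual code): an explicit resume_from_checkpoint always wins over the checkpoint callback's last saved checkpoint.

```python
from pathlib import Path


def choose_restore_source(resume_from_checkpoint, last_checkpoint_path):
    # An explicit user request takes precedence over the
    # checkpoint callback's most recent saved checkpoint.
    if resume_from_checkpoint is not None:
        return resume_from_checkpoint
    if last_checkpoint_path is not None and Path(last_checkpoint_path).exists():
        return last_checkpoint_path
    return None  # nothing to restore from
```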

@mpariente mpariente mentioned this pull request Nov 22, 2019
@williamFalcon
Contributor

@dreamgonfly looks great. Merging this. We need to add docs for it, though.
