
Structured results (train loop only. val loop separate PR) (PR 2/5) #2615

Merged 174 commits into master from st on Jul 20, 2020

Conversation

williamFalcon (Contributor) commented Jul 15, 2020

Adds support for cleaner returns between steps.

Expressive syntax

Before

def training_step(...):
    return {'loss': ..., 'log': {'a': 1, 'b': 1}, 'progress_bar': {'a': 1}}

Now:

def training_step(...):
    result = TrainResult(minimize=some_value)
    result.log('a', 1, prog_bar=True)
    result.log('b', 1)

    return result

No need for multiple reduction steps

Before

def training_step(...):
    return {'a': 1}

def training_epoch_end(self, outputs):
    avg = torch.tensor([x['a'] for x in outputs]).mean()
    return {'val_loss': avg}

Now:

def training_step(...):
    result = TrainResult(early_stop_on=loss)
    result.log('a', 1, on_epoch=True, reduce_fx=torch.mean)

    return result
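
A hedged sketch combining these pieces (early_stop_on and checkpoint_on feed the early-stopping and checkpointing callbacks, as discussed below; the compute_loss helper is illustrative, not part of the PR):

def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper, stands in for the usual forward + loss
    result = TrainResult(minimize=loss, early_stop_on=loss, checkpoint_on=loss)
    result.log('train_loss', loss, prog_bar=True, on_epoch=True, reduce_fx=torch.mean)
    return result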

pep8speaks commented Jul 15, 2020

Hello @williamFalcon! Thanks for updating this PR.

Line 114:9: E121 continuation line under-indented for hanging indent
Line 114:9: E125 continuation line with same indent as next logical line

Line 592:120: E501 line too long (138 > 119 characters)

Comment last updated at 2020-07-20 22:42:50 UTC

mergify bot requested a review from a team on July 15, 2020 11:57
codecov bot commented Jul 15, 2020

Codecov Report

Merging #2615 into master will decrease coverage by 0%.
The diff coverage is 85%.

@@           Coverage Diff           @@
##           master   #2615    +/-   ##
=======================================
- Coverage      91%     91%    -0%     
=======================================
  Files          70      71     +1     
  Lines        5778    5918   +140     
=======================================
+ Hits         5270    5388   +118     
- Misses        508     530    +22     

jeremyjordan (Contributor) commented:

Personally, I would prefer that early_stop_on and checkpoint_on be configured via the callback initializations. That way, the callback fully encapsulates all of the concerns for the behavior.

williamFalcon (Contributor, Author) commented Jul 17, 2020

The advantage of doing it this way is that: 1) you have fine-grained control over what to early stop or checkpoint on (no magic val_loss keyword that confuses everyone); 2) it allows new behavior, like changing the monitored quantity dynamically during training, which has been requested as well; 3) it enables these callbacks to be used in the training OR eval loop instead of just eval.

Whereas with the current callbacks you are stuck with whatever you configured at the start: to change what to early stop or checkpoint on, you have to re-init the callbacks, modify the monitor keyword, and remember to include those magic keys in the return dict.

justusschock (Member) commented Jul 17, 2020

@williamFalcon I see what you mean, but I'd still go with the callback. I think we should just pass the structured result to the callback and then proceed as we currently do.

I think it's really important to allow this option, since this way it is much easier to change behaviour.

williamFalcon (Contributor, Author) commented:

@justusschock that's what I was thinking. The result would feed directly into the callback, so you still maintain control via the callback.

But just to clarify: you like the option of specifying the particular value to condition on in the result?

I guess I may not have explained it correctly. The callbacks are still there and should be used. But the way you specify what a particular callback conditions on is set in the structured result...
So it's as if the callback didn't have a “monitor” arg, and instead you set it through the result. The reason is that otherwise it's a pain to decide what to use for early stopping or checkpointing, and you have no flexibility. Also, everyone is confused by “val_loss” and thinks it's a reserved word or something.

justusschock (Member) commented:

Then I propose the following: let the user use both ways. Default the argument here to None. If the argument here is None, use the one provided by the callback; otherwise use the one here.

I really want to specify what I monitor from my callback, since I want it to be decoupled from my model entirely.
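
A minimal sketch of the fallback being proposed (editorial illustration only; the helper name and the assumption that the result exposes its early_stop_on value are not the merged implementation):

def resolve_monitor_value(result, callback_monitor):
    # Prefer the quantity set on the structured result (e.g. via early_stop_on);
    # if the result did not set one, fall back to the callback's own monitor key.
    result_value = getattr(result, 'early_stop_on', None)
    if result_value is not None:
        return result_value
    return result[callback_monitor]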

williamFalcon (Contributor, Author) commented:

Perfect, I like that a lot. I'll make them None then. In my personal research each model uses different things to checkpoint or early stop on, so I personally prefer to couple it haha. But this approach enables both :)

awaelchli (Contributor) left a review:


just some typos

Review comments (all marked resolved) on:
- pytorch_lightning/core/hooks.py (3 comments)
- pytorch_lightning/core/step_result.py (1 comment)
- pytorch_lightning/overrides/data_parallel.py (1 comment)
- pytorch_lightning/trainer/callback_hook.py (3 comments)
mergify bot requested a review from a team on July 20, 2020 21:00
@williamFalcon changed the title from "Structured results (train loop only. val loop separate PR) (PR 1/4)" to "Structured results (train loop only. val loop separate PR) (PR 2/5)" on Jul 20, 2020
@williamFalcon merged commit 6d10ac2 into master on Jul 20, 2020
@williamFalcon deleted the st branch on July 22, 2020 17:54
sooheon commented Aug 5, 2020

To be clear: checkpoint_on and early_stop_on feed their values directly to the appropriate callbacks, so if they are set the user can completely ignore the monitor arg in those callbacks?

williamFalcon (Contributor, Author) commented:

Exactly!
Actually, mind adding a PR that explains this in the docs?
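
A short hedged sketch of the usage implied above (the Trainer and callback argument names are assumptions based on Lightning around that time, and compute_loss is a hypothetical helper):

def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    # checkpoint_on / early_stop_on feed the respective callbacks directly
    return TrainResult(minimize=loss, checkpoint_on=loss, early_stop_on=loss)

trainer = Trainer(
    checkpoint_callback=ModelCheckpoint(),  # no monitor arg needed
    early_stop_callback=EarlyStopping(),    # no monitor arg needed
)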

sooheon commented Aug 5, 2020

Sure. Can you explain the function of the minimize arg as well before I do so?

Labels: feature (Is an improvement or enhancement)
7 participants