
[wip] Separate Test and Fit #2107

Closed

Conversation

Contributor

@tullie commented Jun 7, 2020

What does this PR do?

Addresses some of the discussion from issue #1195. On the public API, it changes:

trainer = Trainer(args)
trainer.fit(model)
trainer.test()

To:

Trainer(args).fit(model)
Evaluator(args).test(model)

To achieve this, there's a large internal refactor that begins to decouple training and testing code. Notably:

  • Adds a shared LightningConfig class that's used for Trainer and Evaluator config options (see the sketch after this list)
  • Adds shared mixins (SlurmMixin, LoopRunnerMixin and InitializationMixin)
  • Refactors fit and run_pretrain_loop functions to use callables
  • Moves shared functions (num_gpus and data_parallel) to initialization
  • Moves PatchDataLoader to data_loader mixin
  • Refactors CLI code to work with LightningConfig and be slightly clearer
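
To make the shape of this concrete, here's a minimal sketch of the intended structure (the config fields and method bodies below are illustrative only, not the actual code in this PR):

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LightningConfig:
    # shared options consumed by both Trainer and Evaluator (illustrative subset)
    gpus: Optional[Union[int, str]] = None
    num_nodes: int = 1
    precision: int = 32


class Trainer:
    def __init__(self, **kwargs):
        # user-facing kwargs are wrapped into the shared config
        self.config = LightningConfig(**kwargs)

    def fit(self, model):
        ...  # training loop driven by self.config


class Evaluator:
    def __init__(self, **kwargs):
        self.config = LightningConfig(**kwargs)

    def test(self, model):
        ...  # test loop driven by self.config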

Above I say "begins to decouple" because there's still a lot of internal refactoring to do. I did the minimum necessary to achieve this API in anticipation of a 1.0 release. Ideally, we'd be able to remove concepts like an OptimizerMixin from the Evaluator; however, it's all too tangled up at the moment. In the future I'd propose replacing the mixin architecture with something that encourages encapsulation between the components.

Regarding backwards compatibility, I've aimed to fully support the old API until version 0.9.0.

TODO in this PR

  • Add changelog
  • Test on TPU and fix this line torch_xla.core.xla_model.rendezvous("pl.Tester.run_pretrain_routine")
  • Update deprecation since version numbers

@tullie tullie requested a review from williamFalcon June 7, 2020 20:58
@pep8speaks commented Jun 7, 2020

Hello @tullie! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 15:44:31 UTC

@tullie tullie requested review from neggert and removed request for a team June 7, 2020 20:58
@Borda Borda changed the title Separate Test and Fit [wip] Separate Test and Fit Jun 7, 2020
@Borda Borda added feature Is an improvement or enhancement Important labels Jun 7, 2020
Member

@Borda commented Jun 7, 2020

I like the idea of splitting, but I'm not sure about the advantage of LightningConfig.
Does it also mean that .test(...) is removed from the Trainer?

btw, what about Trainer & Examinator? 🐰
Probably it would be cleaner to do it after #2073.



@dataclass
class LightningConfig:
Member

This may have the disadvantage that the arguments won't all be listed on the Trainer docs page, so a user would need to search for them elsewhere; it also drops IDE support for suggesting possible arguments.

Member

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

This would also have the advantage that you could possibly save and resume the whole trainer state at any moment.
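
A minimal sketch of that suggestion, assuming a dataclass config (argument names are illustrative):

from dataclasses import dataclass


@dataclass
class LightningConfig:
    max_epochs: int = 1000
    gpus: int = 0


class Trainer:
    def __init__(self, max_epochs: int = 1000, gpus: int = 0):
        # explicit signature keeps docs and IDE completion,
        # while all state lives in one config object internally
        self._config = LightningConfig(max_epochs=max_epochs, gpus=gpus)

    @property
    def max_epochs(self) -> int:
        return self._config.max_epochs

    @property
    def gpus(self) -> int:
        return self._config.gpus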

Contributor

@williamFalcon commented Jun 8, 2020

I also like the config... much easier to keep track of args and share among classes.
I do prefer allowing the init to be:

Trainer(arg1, arg2, etc...)

Instead of

Trainer(config)

For the reasons @Borda and @justusschock mentioned (especially IDE completion, docs, etc...)

Internally we should wrap it in the config object. Now, in theory people could use the config directly if they wanted to:

Trainer(**config)
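
(A hedged sketch of that direct usage: assuming LightningConfig is a plain dataclass, the unpacking would presumably go through dataclasses.asdict.)

import dataclasses

config = LightningConfig(max_epochs=5, gpus=1)  # illustrative field names

# equivalent to Trainer(max_epochs=5, gpus=1)
trainer = Trainer(**dataclasses.asdict(config))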

Comment on lines 48 to 53
if config.nb_gpu_nodes is not None:
    rank_zero_warn(
        "Argument `nb_gpu_nodes` has renamed to `num_nodes` since v0.5.0"
        " and this method will be removed in v0.8.0",
        DeprecationWarning,
    )
Member

If we now have the LightningConfig, this warning and deprecation should be handled there...
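
A minimal sketch of what that could look like, assuming a dataclass-style LightningConfig (field names mirror the snippet above; the rank_zero_warn import path is an assumption):

from dataclasses import dataclass
from typing import Optional

from pytorch_lightning.utilities import rank_zero_warn  # import path assumed


@dataclass
class LightningConfig:
    num_nodes: int = 1
    nb_gpu_nodes: Optional[int] = None  # deprecated alias of num_nodes

    def __post_init__(self):
        if self.nb_gpu_nodes is not None:
            rank_zero_warn(
                "Argument `nb_gpu_nodes` has been renamed to `num_nodes` since v0.5.0"
                " and will be removed in v0.8.0",
                DeprecationWarning,
            )
            self.num_nodes = self.nb_gpu_nodes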

Contributor Author

Good call, will add this in the next pass

Contributor Author

@tullie commented Jun 7, 2020

Instead of using kwargs, we could have a positional config argument of type LightningConfig (but this wouldn't be backwards compatible), or we could just keep the arg documentation in Trainer. Both of these options would handle the IDE hinting. What do you think?

.test(...) will be removed after the deprecation phase, yes!

Contributor Author

@tullie commented Jun 8, 2020

Anyone know how I can fix the DocTestFailure? How can I reproduce that test locally?

@awaelchli
Contributor

Yes, it means the Trainer args have changed (either the ordering, the defaults, or some got added or removed).

    pytest.param({'logger': False}, {'logger': True}),
    pytest.param({'logger': False}, {'checkpoint_callback': True}),
])
def test_init_from_argparse_args(cli_args, extra_args):
Contributor

Will this test get removed?

Contributor Author

Yeah I moved the init_from_argparse function completely so I don't think there's anything to test.

Contributor

Where did it get moved? Sorry, but I don't get it.
These tests were part of a bugfix and are essential to make sure it works.
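
For context, this is roughly the round-trip those tests exercise (a hedged sketch; the exact parametrization and assertions live in the test file):

from argparse import ArgumentParser

from pytorch_lightning import Trainer

parser = ArgumentParser()
parser = Trainer.add_argparse_args(parser)
cli_args = parser.parse_args(['--max_epochs', '1'])

# extra kwargs (e.g. logger=False, as in the parametrization above)
# are passed alongside the parsed namespace
trainer = Trainer.from_argparse_args(cli_args, logger=False)
assert trainer.max_epochs == 1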

Contributor Author

Ahh sorry, you're totally right, this test shouldn't be removed. I was getting it mixed up with the one I removed above. Thanks for catching this!

@reactivetype

Please consider this related issue #1694
It was closed, but I think the issue is not yet solved.

Member

@justusschock left a comment

Just some questions...

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9 that also includes the tuner and makes the training-loop stuff a bit more modular. IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.




@@ -310,3 +310,25 @@ def determine_data_use_amount(self, train_percent_check: float, val_percent_chec
        self.train_percent_check = overfit_pct
        self.val_percent_check = overfit_pct
        self.test_percent_check = overfit_pct


class PatchDataLoader(object):
Member

Why is this needed now but wasn't before?

Contributor Author

It's just moved from the Trainer file.
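
(Roughly, the idea of that class, sketched rather than quoted from the diff: it wraps a user-provided dataloader in a callable so it can stand in for the model's *_dataloader hooks.)

class PatchDataLoader:
    """Callable wrapper so a dataloader passed to fit/test can replace
    the corresponding *_dataloader hook on the LightningModule."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __call__(self):
        return self.dataloader


# illustrative usage: model.train_dataloader = PatchDataLoader(train_loader)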

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from tests.base import EvalModelTemplate


class TestCallback(Callback):
Member

Will we split callbacks as well or will we have a single class that has to be attached to both, Trainer and Evaluator?

Contributor Author

At the moment it's a single class that has to be attached to both. Having evaluator-specific callbacks in the future might make sense, though.
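
A hypothetical sketch under the proposed API (the hook names and the Evaluator(callbacks=...) argument are assumptions here, just to illustrate one instance being shared):

from pytorch_lightning import Callback


class TimingCallback(Callback):
    # hook names are illustrative; the real ones come from the Callback base class
    def on_fit_start(self, trainer, *args):
        print("fit starting")

    def on_test_start(self, trainer, *args):
        print("test starting")


cb = TimingCallback()
Trainer(callbacks=[cb]).fit(model)     # model: a LightningModule instance
Evaluator(callbacks=[cb]).test(model)  # same callback instance on both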

Member

@Borda commented Jun 8, 2020

Anyone know how I can fix the DocTestFailure? How can I reproduce that test locally?

@tullie For local tests, please check the CircleCI config; it has the exact commands for pytest and for checking/building the docs:

cd docs; make clean; make html --debug --jobs 2 SPHINXOPTS="-W"
cd docs; make doctest; make coverage

Member

@Borda commented Jun 8, 2020

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9

Most likely there won't be a 0.9; on the other hand, it would make sense to do 0.8 this week, then this split together with HyperTuner the week after...

IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.

This is a good point :]

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

You mean something similar to how we wrap the hyperparameters in the model into a single workspace?

@justusschock
Member

You mean something similar to how we wrap the hyperparameters in the model into a single workspace?

yes, exactly :)

mergify bot commented Jun 8, 2020

This pull request is now in conflict... :(

@williamFalcon
Contributor

Instead of using kwargs, we could have a positional config argument of type LightningConfig (but this wouldn't be backwards compatible), or we could just keep the arg documentation in Trainer. Both of these options would handle the IDE hinting. What do you think?

.test(...) will be removed after the deprecation phase, yes!

What problem is the config trying to solve? If it's avoiding duplicated args between Trainer and Evaluator, then we should just use inheritance and have Trainer and Evaluator inherit from a superclass that lists all the config stuff.

But otherwise, I think it's important to have all the type hinting directly and to use args directly:

Trainer(arg1=1...)
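
A minimal sketch of the inheritance option (class and argument names are illustrative):

class LoopBase:
    """Superclass listing the config shared by Trainer and Evaluator."""

    def __init__(self, gpus: int = 0, precision: int = 32, logger: bool = True):
        self.gpus = gpus
        self.precision = precision
        self.logger = logger


class Trainer(LoopBase):
    def __init__(self, max_epochs: int = 1000, gpus: int = 0,
                 precision: int = 32, logger: bool = True):
        super().__init__(gpus=gpus, precision=precision, logger=logger)
        self.max_epochs = max_epochs  # training-only option


class Evaluator(LoopBase):
    def test(self, model):
        ...  # evaluation loop using the shared config attributes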

@Borda Borda added this to the 0.9.0 milestone Jun 8, 2020
Contributor Author

@tullie commented Jun 9, 2020

@reactivetype

Please consider this related issue #1694
It was closed, but I think the issue is not yet solved.

Yeah, we should reopen that issue. It's a good feature request. It should go in a separate PR than this one, though, as it's not specifically related to the class separation.

@justusschock

Just some questions...

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9 that also includes the tuner and makes the training-loop stuff a bit more modular. IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.

Yeah, we can delay this to 0.9. I was planning to incorporate some follow-up refactors in subsequent PRs, so putting them all in 0.9 makes sense.

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

How would this address the duplicate args in Evaluator and Trainer? Would you expect both to just list out their args even though they're very similar (and exactly the same at the moment)?

@williamFalcon

What problem is the config trying to solve? If it's avoiding duplicated args between Trainer and Evaluator, then we should just use inheritance and have Trainer and Evaluator inherit from a superclass that lists all the config stuff.

But otherwise, I think it's important to have all the type hinting directly and to use args directly.

The way this PR is framed, it isn't duplicating args, because the Evaluator is still quite coupled with the Trainer. I'm hoping to keep refactoring and separate them more, though. This would potentially include separating their configs; e.g. the Evaluator doesn't need a gradient_accumulator arg. How does the IDE type hinting work exactly? If the superclass has typed args, will it show them when hovering over the subclass or something?
