
[wip] Separate Test and Fit #2107

Closed

Conversation

Contributor

@tullie commented Jun 7, 2020

What does this PR do?

Addresses some of the discussion from issue #1195. On the public API, it changes:

trainer = Trainer(args)
trainer.fit(model)
trainer.test()

To:

Trainer(args).fit(model)
Evaluator(args).test(model)

To achieve this, there's a large internal refactor that begins to decouple training and testing code. Notably:

  • Adds a shared LightningConfig class that's used for Trainer and Evaluator config options (see the sketch after this list)
  • Adds shared mixins (SlurmMixin, LoopRunnerMixin and InitializationMixin)
  • Refactors fit and run_pretrain_loop functions to use callables
  • Moves shared functions (num_gpus and data_parallel) to initialization
  • Moves PatchDataLoader to data_loader mixin
  • Refactors CLI code to work with LightningConfig and be slightly clearer
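
To make the shape of this concrete, here's a minimal sketch of the intended structure (the config fields and method bodies below are illustrative only, not the actual code in this PR):

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LightningConfig:
    # shared options consumed by both Trainer and Evaluator (illustrative subset)
    gpus: Optional[Union[int, str]] = None
    num_nodes: int = 1
    precision: int = 32


class Trainer:
    def __init__(self, **kwargs):
        # user-facing kwargs are wrapped into the shared config
        self.config = LightningConfig(**kwargs)

    def fit(self, model):
        ...  # training loop driven by self.config


class Evaluator:
    def __init__(self, **kwargs):
        self.config = LightningConfig(**kwargs)

    def test(self, model):
        ...  # test loop driven by self.config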

Above I say "begins to decouple" because there's still a lot of internal refactoring to do. I did the minimum necessary to achieve this API in anticipation of a 1.0 release. Ideally, we'd be able to remove concepts like an OptimizerMixin from the Evaluator; however, it's all too tangled up at the moment. In the future I'd propose replacing the mixin architecture with something that encourages encapsulation between the components.

Regarding backwards compatibility, I've aimed to fully support the old API until version 0.9.0.

TODO in this PR

  • Add changelog
  • Test on TPU and fix this line torch_xla.core.xla_model.rendezvous("pl.Tester.run_pretrain_routine")
  • Update deprecation since version numbers

@tullie tullie requested a review from williamFalcon June 7, 2020 20:58
@pep8speaks commented Jun 7, 2020

Hello @tullie! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 15:44:31 UTC

@tullie tullie requested review from neggert and removed request for a team June 7, 2020 20:58
@Borda Borda changed the title Separate Test and Fit [wip] Separate Test and Fit Jun 7, 2020
@Borda Borda added feature Is an improvement or enhancement Important labels Jun 7, 2020
Member

@Borda commented Jun 7, 2020

I like the idea of splitting, but I'm not sure about the advantage of LightningConfig.
Does it also mean that .test(...) is removed from the Trainer?

btw, what about Trainer & Examinator? 🐰
Probably it would be cleaner to do it after #2073.



@dataclass
class LightningConfig:
Member

This may have the disadvantage that the arguments won't all be listed on the Trainer docs page, so a user would need to search for them elsewhere; it also drops IDE support for suggesting possible arguments.

Member

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

This would also have the advantage that you could possibly save and resume the whole trainer state at any moment.
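
A minimal sketch of that suggestion, assuming a dataclass config (argument names are illustrative):

from dataclasses import dataclass


@dataclass
class LightningConfig:
    max_epochs: int = 1000
    gpus: int = 0


class Trainer:
    def __init__(self, max_epochs: int = 1000, gpus: int = 0):
        # explicit signature keeps docs and IDE completion,
        # while all state lives in one config object internally
        self._config = LightningConfig(max_epochs=max_epochs, gpus=gpus)

    @property
    def max_epochs(self) -> int:
        return self._config.max_epochs

    @property
    def gpus(self) -> int:
        return self._config.gpus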

Contributor

@williamFalcon commented Jun 8, 2020

I also like the config... much easier to keep track of args and share among classes.
I do prefer allowing the init to be:

Trainer(arg1, arg2, etc...)

Instead of

Trainer(config)

For the reasons @Borda and @justusschock mentioned (especially IDE completion, docs, etc...)

Internally we should wrap it in the config object. Now, in theory people could use the config directly if they wanted to:

Trainer(**config)
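
(A hedged sketch of that direct usage: assuming LightningConfig is a plain dataclass, the unpacking would presumably go through dataclasses.asdict.)

import dataclasses

config = LightningConfig(max_epochs=5, gpus=1)  # illustrative field names

# equivalent to Trainer(max_epochs=5, gpus=1)
trainer = Trainer(**dataclasses.asdict(config))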

Comment on lines 48 to 53
if config.nb_gpu_nodes is not None:
    rank_zero_warn(
        "Argument `nb_gpu_nodes` has renamed to `num_nodes` since v0.5.0"
        " and this method will be removed in v0.8.0",
        DeprecationWarning,
    )
Member

If we now have the LightningConfig, this warning and deprecation should be handled there...
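
A minimal sketch of what that could look like, assuming a dataclass-style LightningConfig (field names mirror the snippet above; the rank_zero_warn import path is an assumption):

from dataclasses import dataclass
from typing import Optional

from pytorch_lightning.utilities import rank_zero_warn  # import path assumed


@dataclass
class LightningConfig:
    num_nodes: int = 1
    nb_gpu_nodes: Optional[int] = None  # deprecated alias of num_nodes

    def __post_init__(self):
        if self.nb_gpu_nodes is not None:
            rank_zero_warn(
                "Argument `nb_gpu_nodes` has been renamed to `num_nodes` since v0.5.0"
                " and will be removed in v0.8.0",
                DeprecationWarning,
            )
            self.num_nodes = self.nb_gpu_nodes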

Contributor Author

Good call, will add this in the next pass

Contributor Author

@tullie commented Jun 7, 2020

Instead of using kwargs, we could have a positional config argument of type LightningConfig (but this wouldn't be backwards compatible), or we could just keep the arg documentation in Trainer. Both of these options would handle the IDE hinting. What do you think?

.test(...) will be removed after the deprecation phase, yes!

Contributor Author

@tullie commented Jun 8, 2020

Anyone know how I can fix the DocTestFailure? How can I reproduce that test locally?

@awaelchli
Contributor

Yes, it means the Trainer args have changed (either the ordering, the defaults, or some got added or removed).

    pytest.param({'logger': False}, {'logger': True}),
    pytest.param({'logger': False}, {'checkpoint_callback': True}),
])
def test_init_from_argparse_args(cli_args, extra_args):
Contributor

Will this test get removed?

Contributor Author

Yeah I moved the init_from_argparse function completely so I don't think there's anything to test.

Contributor

Where did it get moved? Sorry, but I don't get it.
These tests were part of a bugfix and are essential to make sure it works.
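
For context, this is roughly the round-trip those tests exercise (a hedged sketch; the exact parametrization and assertions live in the test file):

from argparse import ArgumentParser

from pytorch_lightning import Trainer

parser = ArgumentParser()
parser = Trainer.add_argparse_args(parser)
cli_args = parser.parse_args(['--max_epochs', '1'])

# extra kwargs (e.g. logger=False, as in the parametrization above)
# are passed alongside the parsed namespace
trainer = Trainer.from_argparse_args(cli_args, logger=False)
assert trainer.max_epochs == 1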

Contributor Author

Ahh sorry, you're totally right, this test shouldn't be removed. I was getting it mixed up with the one I removed above. Thanks for catching this!

@reactivetype

Please consider this related issue #1694
It was closed, but I think the issue is not yet solved.

Member

@justusschock left a comment

Just some questions...

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9 that also includes the tuner and makes the training-loop stuff a bit more modular. IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.




@@ -310,3 +310,25 @@ def determine_data_use_amount(self, train_percent_check: float, val_percent_chec
        self.train_percent_check = overfit_pct
        self.val_percent_check = overfit_pct
        self.test_percent_check = overfit_pct


class PatchDataLoader(object):
Member

Why is this needed now but wasn't before?

Contributor Author

It's just moved from the Trainer file.
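
(Roughly, the idea of that class, sketched rather than quoted from the diff: it wraps a user-provided dataloader in a callable so it can stand in for the model's *_dataloader hooks.)

class PatchDataLoader:
    """Callable wrapper so a dataloader passed to fit/test can replace
    the corresponding *_dataloader hook on the LightningModule."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __call__(self):
        return self.dataloader


# illustrative usage: model.train_dataloader = PatchDataLoader(train_loader)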

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from tests.base import EvalModelTemplate


class TestCallback(Callback):
Member

Will we split callbacks as well or will we have a single class that has to be attached to both, Trainer and Evaluator?

Contributor Author

At the moment it's a single class that has to be attached to both. Having evaluator-specific callbacks in the future might make sense, though.
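
A hypothetical sketch under the proposed API (the hook names and the Evaluator(callbacks=...) argument are assumptions here, just to illustrate one instance being shared):

from pytorch_lightning import Callback


class TimingCallback(Callback):
    # hook names are illustrative; the real ones come from the Callback base class
    def on_fit_start(self, trainer, *args):
        print("fit starting")

    def on_test_start(self, trainer, *args):
        print("test starting")


cb = TimingCallback()
Trainer(callbacks=[cb]).fit(model)     # model: a LightningModule instance
Evaluator(callbacks=[cb]).test(model)  # same callback instance on both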

Member

@Borda commented Jun 8, 2020

Anyone know how I can fix the DocTestFailure? How can I reproduce that test locally?

@tullie For local tests, please check the CircleCI config; it has the exact commands for pytest and for checking/building the docs:

cd docs; make clean; make html --debug --jobs 2 SPHINXOPTS="-W"
cd docs; make doctest; make coverage

Member

@Borda commented Jun 8, 2020

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9

Most likely there won't be a 0.9; on the other hand, it would make sense to do 0.8 this week, then this split together with HyperTuner the week after...

IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.

This is a good point :]

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

You mean something similar to how we wrap the hyperparameters in the model into a single workspace?

@justusschock
Member

You mean something similar to how we wrap the hyperparameters in the model into a single workspace?

yes, exactly :)

mergify bot commented Jun 8, 2020

This pull request is now in conflict... :(

@williamFalcon
Contributor

Instead of using kwargs, we could have a positional config argument of type LightningConfig (but this wouldn't be backwards compatible), or we could just keep the arg documentation in Trainer. Both of these options would handle the IDE hinting. What do you think?

.test(...) will be removed after the deprecation phase, yes!

What problem is the config trying to solve? If it's avoiding duplicated args between Trainer and Evaluator, then we should just use inheritance and have Trainer and Evaluator inherit from a superclass that lists all the config stuff.

But otherwise, I think it's important to have all the type hinting directly and to use args directly:

Trainer(arg1=1...)
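
A minimal sketch of the inheritance option (class and argument names are illustrative):

class LoopBase:
    """Superclass listing the config shared by Trainer and Evaluator."""

    def __init__(self, gpus: int = 0, precision: int = 32, logger: bool = True):
        self.gpus = gpus
        self.precision = precision
        self.logger = logger


class Trainer(LoopBase):
    def __init__(self, max_epochs: int = 1000, gpus: int = 0,
                 precision: int = 32, logger: bool = True):
        super().__init__(gpus=gpus, precision=precision, logger=logger)
        self.max_epochs = max_epochs  # training-only option


class Evaluator(LoopBase):
    def test(self, model):
        ...  # evaluation loop using the shared config attributes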

@Borda Borda added this to the 0.9.0 milestone Jun 8, 2020
Contributor Author

@tullie commented Jun 9, 2020

@reactivetype

Please consider this related issue #1694
It was closed, but I think the issue is not yet solved.

Yeah, we should reopen that issue. It's a good feature request. It should go in a separate PR than this one, though, as it's not specifically related to the class separation.

@justusschock

Just some questions...

I'm not sure, but I think I'd delay this until after 0.8 and do a whole refactoring for 0.9 that also includes the tuner and makes the training-loop stuff a bit more modular. IMO we should try to have breaking changes as infrequently as possible and rather have one huge break than multiple small ones.

Yeah, we can delay this to 0.9. I was planning to incorporate some follow-up refactors in subsequent PRs, so putting them all in 0.9 makes sense.

I like the idea of having a lightning config. I think we could change it so the Trainer still has all the arguments, internally wraps them into a config holding all the internal state variables as well as the configuration, and just has properties for the most relevant attributes.

How would this address the duplicate args in Evaluator and Trainer? Would you expect both to just list out their args even though they're very similar (and exactly the same at the moment)?

@williamFalcon

What problem is the config trying to solve? If it's avoiding duplicated args between Trainer and Evaluator, then we should just use inheritance and have Trainer and Evaluator inherit from a superclass that lists all the config stuff.

But otherwise, I think it's important to have all the type hinting directly and to use args directly.

The way this PR is framed, it isn't duplicating args, because the Evaluator is still quite coupled with the Trainer. I'm hoping to keep refactoring and separate them more, though. This would potentially include separating their configs; e.g. the Evaluator doesn't need a gradient_accumulator arg. How does the IDE type hinting work exactly? If the superclass has typed args, will it show them when hovering over the subclass or something?
