
Validation during training: time incoherence #1010

Closed
ghost opened this issue May 4, 2022 · 6 comments · Fixed by #1012

ghost commented May 4, 2022

Hi everyone,

I'm trying to run the training and the validation of a CIL algorithm simultaneously, with eval_every = 1, to get the accuracy and the loss on the test set for each epoch. This is the code I use. Note that I set num_workers = 4 in the train call.

esp_plugin = EarlyStoppingPlugin(patience=2, val_stream_name='test_stream',
                                 metric_name="Top1_Acc_Exp")

cl_strategy = LwF(
    model, Adam(model.parameters(), lr=0.001),
    CrossEntropyLoss(), train_mb_size=256, train_epochs=10, eval_mb_size=256,
    plugins=[esp_plugin], evaluator=eval_plugin, alpha=[0, 1],
    temperature=1, eval_every=1, device=device)
    
for experience in generic_scenario.train_stream:
    n_exp = experience.current_experience
    print("Start of experience: ", n_exp)
    print("Current Classes: ", experience.classes_in_this_experience)
    cl_strategy.train(experience,
                      eval_streams=[generic_scenario.test_stream[0:n_exp + 1]],
                      num_workers=4)
    print('Computed accuracy on the whole test set')

This is the problem I get: while a training iteration only takes about 21 seconds, the evaluation takes almost 3 minutes, even though the evaluation stream is about 5x shorter. I tried both the beta version and the latest version, and the behaviour is the same in both.

[screenshot: training vs. evaluation times]

ghost added the bug label on May 4, 2022
ghost changed the title from "Validation during training time incoherence" to "Validation during training: time incoherence" on May 4, 2022
ggraffieti (Member) commented

The bug is in the PeriodicEval class (avalanche/training/templates/base_sgd.py).
During the periodic evaluation, the eval method is called on the strategy without passing the number of workers, so it falls back to the default (1).

def _peval(self, strategy):
    for el in strategy._eval_streams:
        strategy.eval(el)  # <--- here 

It should be an easy fix; we'll notify you when it is incorporated in the main branch.
Just to be sure, if you edit your code like this:

for experience in generic_scenario.train_stream:
    n_exp = experience.current_experience
    print("Start of experience: ", n_exp)
    print("Current Classes: ", experience.classes_in_this_experience)
    cl_strategy.train(experience, eval_streams=[], num_workers=4)
    cl_strategy.eval(generic_scenario.test_stream[0:n_exp + 1], num_workers=4)
    print('Computed accuracy on the whole test set')

it should work with the same number of workers and comparable speed between train and eval. Obviously this is not a fix for your problem, since the metrics are computed only at the end of the experience and not after every epoch.
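To give an idea of why the missing num_workers matters, here is a rough, self-contained sketch (using a toy dataset whose __getitem__ sleeps to emulate per-sample loading cost, not your actual data): iterating the same data with 0 workers is several times slower than with 4, roughly the kind of gap you are seeing between train and the periodic eval.

import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    # Toy dataset: the sleep stands in for the disk reads and transforms
    # that a real dataset pays for every sample.
    def __len__(self):
        return 512

    def __getitem__(self, idx):
        time.sleep(0.01)
        return torch.zeros(3, 32, 32), 0


if __name__ == "__main__":
    for workers in (0, 4):
        loader = DataLoader(SlowDataset(), batch_size=64, num_workers=workers)
        start = time.perf_counter()
        for _ in loader:
            pass
        print(f"num_workers={workers}: {time.perf_counter() - start:.1f}s")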

ggraffieti self-assigned this on May 4, 2022
ghost commented May 4, 2022

Thank you again, @ggraffieti

Yeah, I had previously tried the code you propose, both with and without passing the num_workers argument to the eval method. With num_workers = 4 the evaluation is very fast, while without the workers it is much slower, comparable to the times shown in the screenshot.

About the bug, is it possible that it also arises in _maybe_peval? I'm not a coding expert, but I passed the **kwargs through to these methods and it started working for me. It may not be the cleanest way to code it, but it already works and is saving me a lot of time.

def _peval(self, strategy, **kwargs):
    for el in strategy._eval_streams:
        strategy.eval(el, **kwargs)

def _maybe_peval(self, strategy, counter, **kwargs):  # <-- also here?
    if self.eval_every > 0 and counter % self.eval_every == 0:
        self._peval(strategy, **kwargs)
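
Just to show where the kwargs come from in my local edit, here is a stripped-down sketch of the call chain (the class and method names below are simplified stand-ins, not the real Avalanche code): the kwargs given to train() reach eval() through the per-epoch hook.

class PeriodicEvalSketch:
    # Simplified stand-in for the periodic-eval logic: forwards **kwargs
    # from the per-epoch hook down to strategy.eval().
    def __init__(self, eval_every=1):
        self.eval_every = eval_every

    def after_training_epoch(self, strategy, epoch, **kwargs):
        self._maybe_peval(strategy, epoch, **kwargs)

    def _maybe_peval(self, strategy, counter, **kwargs):
        if self.eval_every > 0 and counter % self.eval_every == 0:
            self._peval(strategy, **kwargs)

    def _peval(self, strategy, **kwargs):
        for el in strategy._eval_streams:
            strategy.eval(el, **kwargs)  # num_workers arrives here


class StrategySketch:
    # Minimal stand-in for the SGD template: train() calls the plugin hook
    # after every epoch and forwards its own **kwargs.
    def __init__(self, plugin, eval_streams):
        self.plugin = plugin
        self._eval_streams = eval_streams

    def train(self, n_epochs, **kwargs):
        for epoch in range(n_epochs):
            # ... one training epoch would run here ...
            self.plugin.after_training_epoch(self, epoch, **kwargs)

    def eval(self, stream, **kwargs):
        print(f"eval on {stream} with {kwargs}")


strategy = StrategySketch(PeriodicEvalSketch(eval_every=1), ["test_stream"])
strategy.train(n_epochs=2, num_workers=4)
# prints twice: eval on test_stream with {'num_workers': 4}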

ggraffieti (Member) commented

You are right @PabloMese, good catch!
In fact, _maybe_peval is used to evaluate the model after each epoch, not _peval, which is used only at the end of the experience.
If you have already fixed that in your code, you can open a PR to Avalanche and become an official contributor 😃

AntonioCarta added the Training label on May 5, 2022
ghost commented May 5, 2022

I will give it a try, @ggraffieti.

So we can now close this issue. Thanks for the help.

ghost closed this as completed on May 5, 2022
ggraffieti (Member) commented

Perfect @PabloMese!
I'll reopen it just to keep track of the bug; we'll close the issue when the fix is included in the official code 👍

ggraffieti (Member) commented

@PabloMese this is solved by the linked PR.
When it is accepted, you can reinstall the "nightly" version of the library and the bug will be gone 😃
