
Transfer learning phases #2006

Closed
lgvaz opened this issue May 29, 2020 · 12 comments
Labels: design (Includes a design discussion), discussion (In a discussion stage), feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@lgvaz
Contributor

lgvaz commented May 29, 2020

🚀 Feature

When doing transfer learning we need to switch between phases.

Normally, the first phase is to freeze all but the head of the model and train only that.

After a predefined amount of epochs, we unfreeze the rest of our model (or a part of it) and start training again (possibly with the help of differential learning rates, described in #2005). We can repeat this phase as many times as we like.

We should implement a class that handles all of that for us. This includes:

  • Unfreezing part of the model
  • Resetting and changing the lr_scheduler parameters between phases
  • Registering the new lr_scheduler with LearningRateLogger, if it is being used

#2005 will take care of the parameter groups.
This issue will take care of what I call "phase switches".

Proposals

There are some ways of achieving this:

Logic inside on_epoch_start

def on_epoch_start(self):
    if self.current_epoch == 0:
        self.freeze()
        self.trainer.lr_schedulers = ... # Define new scheduler
        
    if self.current_epoch == N_FREEZE_EPOCHS:
        self.unfreeze() # Or partially unfreeze
        self.trainer.lr_schedulers = ... # Define new scheduler

We can keep adding as many milestones as we want this way, but it's important to note that they all have to be defined beforehand.

Multiple calls to Trainer.fit

model.freeze()
trainer.fit_one_cycle(model, n_epochs=2, lr=1e-3, pct_start=0.9)
model.unfreeze()
trainer.fit_one_cycle(model, n_epochs=5, lr=slice(5e-6, 5e-4), pct_start=0.2)

This is exactly the flow in fastai. This way of training models is excellent for iterative training, e.g. in a notebook or a REPL.

fit_one_cycle assumes that we are using the OneCycleLR scheduler, that each call is a continuation of the last, and that we want to reset our schedule.

When we pass a slice to lr we are asking for an interpolation of values across the trainable layer groups.
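
For illustration, here is a minimal sketch of how a slice could be expanded into per-group learning rates. The helper name and the geometric spacing are assumptions, not necessarily fastai's exact rule:

import numpy as np

def lrs_from_slice(lr, n_groups):
    # Hypothetical helper: spread slice(lr_min, lr_max) over n_groups parameter
    # groups, giving the earliest layers the smallest learning rate.
    if not isinstance(lr, slice):
        return [lr] * n_groups
    if lr.start is None:  # e.g. slice(1e-3): only a maximum lr was given
        return [lr.stop] * n_groups
    return list(np.geomspace(lr.start, lr.stop, n_groups))

lrs_from_slice(slice(5e-6, 5e-4), 3)  # ≈ [5e-06, 5e-05, 5e-04]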

Implement a new scheduler (suggested by @williamFalcon)

The scheduler receives a list of dicts; each dict specifies the duration of the phase and its configuration (what layers to freeze, what lrs to use, ...)

scheduler = FineTuneScheduler([
   {'params': [nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2], 'action': 'freeze', 'epoch': 0},
   {'params': [self.c_d2], 'action': 'unfreeze', 'epoch': 2},
])

Then we can just pass the scheduler to the Trainer.
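
A minimal sketch of what applying those dicts could look like, written as a callback (the hook choice and the requires_grad toggling are assumptions about the eventual design, not a finished implementation):

import pytorch_lightning as pl

class FineTuneScheduler(pl.Callback):
    # Hypothetical sketch: apply each phase's freeze/unfreeze action when its epoch starts.
    def __init__(self, phases):
        self.phases = phases

    def on_train_epoch_start(self, trainer, pl_module):
        for phase in self.phases:
            if phase['epoch'] == trainer.current_epoch:
                freeze = phase['action'] == 'freeze'
                for module in phase['params']:
                    for p in module.parameters():
                        p.requires_grad = not freeze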

Notes

In all cases, the flow should be the same for all standard areas (vision, NLP, time series, ...).

The only things we assume are:

  • You want to train a model in multiple phases
  • The phases are a continuation of each other
@lgvaz added the feature and help wanted labels on May 29, 2020
@lgvaz
Contributor Author

lgvaz commented May 30, 2020

I personally prefer the approach of calling Trainer.fit (or some variation) multiple times.

It gives me more control over how to train my model. Transfer learning usually happens on small datasets, so the user can train for a few epochs, see what happens, and only then decide whether it's time to unfreeze some layers or run a few more epochs with the current configuration.

@lgvaz
Contributor Author

lgvaz commented May 30, 2020

Added a new proposal to the OP: the scheduler interface suggested by @williamFalcon.

I think the main benefit of this approach is that it's easily reproducible. Because we are using a list of dicts (configs), we could even store the scheduler as a config file in the future.
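
As a rough illustration of that idea, assuming the config refers to modules by attribute name so it can be serialized (which is not how the example in the OP is written):

import yaml

# Hypothetical serializable form of the phase config: modules are referenced
# by attribute name instead of by nn.Module object.
phases = [
    {'params': ['c_d1', 'c_d1_bn'], 'action': 'freeze', 'epoch': 0},
    {'params': ['c_d2'], 'action': 'unfreeze', 'epoch': 2},
]

with open('finetune_schedule.yaml', 'w') as f:
    yaml.safe_dump(phases, f)

# Later, the scheduler could be rebuilt from the stored file.
with open('finetune_schedule.yaml') as f:
    restored_phases = yaml.safe_load(f)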

@lgvaz
Contributor Author

lgvaz commented May 30, 2020

Another option with the scheduler would be to pass a function to it instead of predefined actions. It would look something like this:

def phase1(trainer, model):
    model.freeze()
    sched = OneCycleLR(...)
    trainer.new_schedule(sched)

def phase2(trainer, model):
    model.unfreeze()
    sched = OneCycleLR(...) # Differential LRs can be introduced here
    trainer.new_schedule(sched)

sched = FineTuneScheduler([
    {'func': phase1, 'epoch': 0},
    {'func': phase2, 'epoch': 5},
])

This gives the user full control over what happens in these phases.

If you think about it, this is not even a specific FineTuneScheduler; it's more like a LambdaScheduler: you can inject any functionality you want with it, which is very powerful.

We can then implement helper functions to make things like defining differential learning rates and resetting schedulers easier, but it would be up to the user to construct what they want =)
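
A minimal sketch of such a LambdaScheduler as a callback, dispatching the phase functions by epoch (the names and the hook are assumptions, just to make the idea concrete):

import pytorch_lightning as pl

class LambdaScheduler(pl.Callback):
    # Hypothetical sketch: run the user-provided phase function when its epoch starts.
    def __init__(self, phases):
        self.phase_by_epoch = {p['epoch']: p['func'] for p in phases}

    def on_train_epoch_start(self, trainer, pl_module):
        func = self.phase_by_epoch.get(trainer.current_epoch)
        if func is not None:
            func(trainer, pl_module)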

@lgvaz
Contributor Author

lgvaz commented May 30, 2020

One thing I don't currently like about it, though, is that when creating a new scheduler I also need to know the duration of the phase. Maybe we can change its signature to:

def phase(trainer, model, n_epochs)
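
With that signature, a phase could size its own schedule from the duration it is given, for example (optimizer and steps_per_epoch are placeholders, and trainer.new_schedule is the hypothetical API from above):

def phase2(trainer, model, n_epochs):
    model.unfreeze()
    # The phase knows its own duration, so the schedule can span exactly n_epochs
    sched = OneCycleLR(optimizer, max_lr=5e-4, epochs=n_epochs, steps_per_epoch=steps_per_epoch)
    trainer.new_schedule(sched)  # hypothetical API from the proposal above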

@lgvaz
Contributor Author

lgvaz commented May 30, 2020

And then, as @williamFalcon also suggested, we can implement a scheduler that is specific to the standard transfer learning case:

class FineTuneScheduler(Scheduler):
    def __init__(self, pretrained, head, head_unfreeze_epoch):
        ...

# unfreeze head after 1 epoch
sched = FineTuneScheduler(nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2, 1)

# unfreeze head after 10 epochs
sched = FineTuneScheduler(nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2, 10)

This can easily be built on top of LambdaScheduler.
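
For example, a rough sketch on top of the LambdaScheduler sketched above (the requires_grad toggling is an assumption, and it follows the freeze-then-unfreeze flow described in the OP):

class FineTuneScheduler(LambdaScheduler):
    # Hypothetical: freeze the pretrained part before epoch 0, unfreeze it later.
    # `head` stays trainable throughout and is kept only to mirror the signature above.
    def __init__(self, pretrained, head, head_unfreeze_epoch):
        def freeze_phase(trainer, pl_module):
            for p in pretrained.parameters():
                p.requires_grad = False

        def unfreeze_phase(trainer, pl_module):
            for p in pretrained.parameters():
                p.requires_grad = True

        super().__init__([
            {'func': freeze_phase, 'epoch': 0},
            {'func': unfreeze_phase, 'epoch': head_unfreeze_epoch},
        ])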

@Borda
Member

Borda commented Jun 3, 2020

I would go the scheduler way with a dict config, as it can be simply stored, and even without loading/running it you can see what you did in the past; it's a kind of history note.

@Borda
Member

Borda commented Jun 3, 2020

@PyTorchLightning/core-contributors any other thoughts?

@reactivetype

reactivetype commented Jun 5, 2020

When restoring a checkpoint for finetuning a model, users still need a way to reset the current_epoch and global_step to 0.

Do we still need a GH issue to handle this, aside from the param_groups and differential learning rate features?

A hack for this was described by @lgvaz:

class MyTrainer(Trainer):
    def restore_weights(self, model: LightningModule):
        res = super().restore_weights(model)
        self.reset_lr_schedulers()
        return res
    def reset_lr_schedulers(self):
        for sched in self.lr_schedulers:
            sched['scheduler'].last_epoch = 0

Is there a better way? If we pass both resume_from_checkpoint and lr_schedulers params to the Trainer, will the new lr_schedulers override the ones restored from the checkpoint's training state, along with the scheduler's last_epoch?
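
One alternative I can think of, assuming only the weights are needed and not the optimizer/scheduler state, is to load the checkpoint into the LightningModule and fit with a fresh Trainer, so current_epoch and global_step start from 0 (MyModule is a placeholder for the user's LightningModule subclass):

model = MyModule.load_from_checkpoint('pretrained.ckpt')  # restores weights and hparams only
trainer = Trainer(max_epochs=5)  # fresh trainer state: current_epoch and global_step start at 0
trainer.fit(model)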

@stale

stale bot commented Aug 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale bot added the won't fix (This will not be worked on) label on Aug 4, 2020
@Borda added the design, discussion, and Important labels and removed the won't fix label on Aug 4, 2020
@edenlightning modified the milestones: 0.9.x, 1.1 on Sep 17, 2020
@edenlightning modified the milestones: 1.1, 1.2 on Oct 19, 2020
@edenlightning modified the milestones: 1.2, 1.3 on Feb 8, 2021
@tchaton
Contributor

tchaton commented Mar 9, 2021

Dear @lgvaz,

This logic can easily be built on top of the BaseFinetuning callback.

def phase1(trainer, model):
    model.freeze()
    sched = OneCycleLR(...)
    trainer.new_schedule(sched)

def phase2(trainer, model):
    model.unfreeze()
    sched = OneCycleLR(...) # Differential LRs can be introduced here
    trainer.new_schedule(sched)


class FinetuneScheduler(BaseFinetuning):

    def __init__(self, phases, train_bn=True):
        super().__init__()
        self.phases = phases
        self.train_bn = train_bn

    @property
    def max_epochs(self):
        # return the total number of epochs to run, derived from the phases
        ...

    def freeze_before_training(self, pl_module: pl.LightningModule):
        self.freeze(modules=pl_module, train_bn=self.train_bn)

    def finetune_function(self, pl_module: pl.LightningModule, epoch: int, optimizer: Optimizer, opt_idx: int):
        # Logic to extract the phase and apply it
        ...


cb = FinetuneScheduler([
    {'func': phase1, 'epoch': 0},
    {'func': phase2, 'epoch': 5},
])

Trainer(callbacks=[cb], max_epochs=cb.max_epochs)

https://pytorch-lightning.readthedocs.io/en/stable/extensions/generated/pytorch_lightning.callbacks.BaseFinetuning.html?highlight=BaseFinetuning
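
For completeness, the finetune_function placeholder above could dispatch the phase functions like this (just a sketch, assuming the phase dicts shown earlier):

    def finetune_function(self, pl_module: pl.LightningModule, epoch: int, optimizer: Optimizer, opt_idx: int):
        # Run the phase whose starting epoch matches the current one.
        for phase in self.phases:
            if phase['epoch'] == epoch:
                phase['func'](pl_module.trainer, pl_module)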

If you do implement a nice Finetuning Callback, please make a PR so the community can try it out :)

Best,
T.C

@tchaton added the waiting on author (Waiting on user action, correction, or update) and priority: 1 (Medium priority task) labels on Mar 9, 2021
@lgvaz
Contributor Author

lgvaz commented Mar 27, 2021

Hi @tchaton thanks for the update! Unfortunately I don't have the time to try this out right now =/

Should we leave this issue open or should we close it?

@edenlightning modified the milestones: v1.3, v1.4 on Apr 27, 2021
@edenlightning removed the Important, priority: 1, and waiting on author labels on May 9, 2021
@edenlightning modified the milestones: v1.4, v1.5 on Jun 30, 2021
@awaelchli modified the milestones: v1.5, v1.6 on Nov 4, 2021
@carmocca
Contributor

carmocca commented Feb 1, 2022

We are not looking to add any more callbacks to core that are too opinionated, research-y, or just not applicable to most users. We suggest developing this callback in your own repository.
