Investigate if lr_scheduler from segmentation can use PyTorch's schedulers #4438

Closed
Tracked by #5414
fmassa opened this issue Sep 17, 2021 · 10 comments · Fixed by #6405

Comments

@fmassa
Member

fmassa commented Sep 17, 2021

Back when it was initially implemented in 2019, the LR schedule used by the segmentation reference scripts couldn't be expressed with native PyTorch schedulers, so we had to resort to LambdaLR:

lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda x: (1 - x / (len(data_loader) * args.epochs)) ** 0.9
)

It might be that this is now available natively in PyTorch, in which case this can be simplified.

cc @datumbox

@avijit9
Contributor

avijit9 commented Sep 29, 2021

Is there any polynomial learning rate scheduler implemented yet? I can see this issue is still open: pytorch/pytorch#2854

cc: @fmassa @datumbox

@datumbox
Contributor

I also had a look when Francisco raised the ticket, but couldn't see anything compatible TBH.

@fmassa
Member Author

fmassa commented Sep 29, 2021

It might not be implemented yet. I think we should check whether this type of scheduler has been used in more papers since then; that could justify adding it to PyTorch.

@datumbox
Contributor

datumbox commented Nov 2, 2021

I'm removing the "good first issue" tag because I don't think such a scheduler exists in Core, and a more thorough investigation would be needed to resolve this. Perhaps coordinating with Core to add it is worth it, but that's not a great Bootcamp task.

@federicopozzi33
Contributor

federicopozzi33 commented Jul 29, 2022

Hi guys,

I'm working on this issue as reported here. However, I think I need some extra information about the expected behavior of the scheduler.

So far, I have considered the following resources:

  1. The current implementation in torchvision's train script.
  2. https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/PolynomialDecay
  3. https://github.com/cmpark0126/pytorch-polynomial-lr-decay

Let's go through them one by one.

I'm going to fix some parameters to make the comparison fair:

lr = 1e-3
end_learning_rate = 1e-4
max_decay_steps = 4
power = 1.0
data_loader = range(0, 5)

1. torchvision's train script (LambdaLR):

v = torch.zeros(10)
optimizer = torch.optim.SGD([v], lr=lr)
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: (1 - step / len(data_loader)) ** power
)

>>> for i in data_loader:
>>>    lr_scheduler.step(i)
>>>    print(i, optimizer.param_groups[0]['lr'])

0 0.001
1 0.0008
2 0.0006
3 0.0004
4 0.00019999999999999996

2. TensorFlow's PolynomialDecay:

poly = tf.keras.optimizers.schedules.PolynomialDecay(
    lr,
    max_decay_steps,
    end_learning_rate=end_learning_rate,
    power=power,
    cycle=False,
    name=None,
)

>>> for i in range(0, 5):
>>>    print(i, poly(i))

0 tf.Tensor(0.001, shape=(), dtype=float32)
1 tf.Tensor(0.00077499996, shape=(), dtype=float32)
2 tf.Tensor(0.00055, shape=(), dtype=float32)
3 tf.Tensor(0.000325, shape=(), dtype=float32)
4 tf.Tensor(1e-04, shape=(), dtype=float32)

3. PolynomialLRDecay from pytorch-polynomial-lr-decay:

optim = torch.optim.SGD([torch.zeros(10)], lr=lr)
scheduler = PolynomialLRDecay(
    optim,
    max_decay_steps=max_decay_steps,
    end_learning_rate=end_learning_rate,
    power=power,
)

>>> for i in range(0, 5):
>>>    scheduler.step()
>>>    print(i, optim.param_groups[0]['lr'])

0 0.00055
1 0.000325
2 0.0001
3 0.0001
4 0.0001

>>> for i in range(0, 5):
>>>    scheduler.step(i)
>>>    print(i, optim.param_groups[0]['lr'])

0 0.0007750000000000001
1 0.0007750000000000001
2 0.00055
3 0.000325
4 0.0001

Open issues:

  • I have noticed that scheduler.step(epoch) is deprecated (or about to be). Right now it's handled by the _get_closed_form_lr method, if available. Should we continue to support it? Moreover, the behavior of scheduler.step(epoch) and scheduler.step() should be the same, right? (See the sketch after this list.)
  • Looking at the implementation of _LRScheduler, it seems that a step is performed just by instantiating the scheduler. This means that we effectively "skip" one learning rate decay value. Is this what we want?
  • Considering the above example, what are the expected/correct LR values?
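
As a minimal sketch of the first two points (the built-in StepLR is used purely as an example, with arbitrary values):

import torch

# A throwaway optimizer/scheduler, just to observe the behaviour of _LRScheduler.
optimizer = torch.optim.SGD([torch.zeros(1)], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

# 1) Instantiating the scheduler already ran step() once internally:
print(scheduler.last_epoch, scheduler.get_last_lr())  # 0 [0.1]

# 2) Passing an explicit epoch emits a deprecation warning and, for schedulers that
#    define _get_closed_form_lr, is resolved through the closed-form formula:
scheduler.step(3)
print(scheduler.last_epoch, scheduler.get_last_lr())  # 3 [0.0125]  (= 0.1 * 0.5 ** 3)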

@datumbox

@datumbox
Contributor

datumbox commented Aug 2, 2022

@federicopozzi33 These are all very good questions. Unfortunately, I wasn't too familiar with the API of the schedulers, so in order to answer them I had to implement it and experiment.

Here is the proposed implementation:

import warnings

import torch
from torch.optim.lr_scheduler import _LRScheduler


class PolynomialLR(_LRScheduler):
    def __init__(self, optimizer, total_iters=5, min_lr=0.0, power=1.0, last_epoch=-1, verbose=False):
        self.total_iters = total_iters

        if isinstance(min_lr, list) or isinstance(min_lr, tuple):
            if len(min_lr) != len(optimizer.param_groups):
                raise ValueError("expected {} min_lrs, got {}".format(len(optimizer.param_groups), len(min_lr)))
            self.min_lrs = list(min_lr)
        else:
            self.min_lrs = [min_lr] * len(optimizer.param_groups)

        self.power = power
        super().__init__(optimizer, last_epoch, verbose)

    def get_lr(self):
        if not self._get_lr_called_within_step:
            warnings.warn(
                "To get the last learning rate computed by the scheduler, " "please use `get_last_lr()`.", UserWarning
            )

        if self.last_epoch == 0:
            return [group["lr"] for group in self.optimizer.param_groups]

        if self.last_epoch > self.total_iters:
            return [self.min_lrs[i] for i in range(len(self.optimizer.param_groups))]

        return [
            self.min_lrs[i]
            + ((1.0 - self.last_epoch / self.total_iters) / (1.0 - (self.last_epoch - 1) / self.total_iters))
            ** self.power
            * (group["lr"] - self.min_lrs[i])
            for i, group in enumerate(self.optimizer.param_groups)
        ]

    def _get_closed_form_lr(self):
        return [
            (
                self.min_lrs[i]
                + (1.0 - min(self.total_iters, self.last_epoch) / self.total_iters) ** self.power
                * (base_lr - self.min_lrs[i])
            )
            for i, base_lr in enumerate(self.base_lrs)
        ]


# Test it
lr = 0.001
total_iters = 5
power = 1.0

scheduler = PolynomialLR(
    torch.optim.SGD([torch.zeros(1)], lr=lr),
    total_iters=total_iters,
    min_lr=0.0,  # Using 0 because the Lambda doesn't support this option
    power=power,
)
scheduler2 = torch.optim.lr_scheduler.LambdaLR(
    torch.optim.SGD([torch.zeros(1)], lr=lr), lambda step: (1 - step / total_iters) ** power
)


for i in range(0, total_iters):
    print(i, scheduler.optimizer.param_groups[0]["lr"], scheduler2.optimizer.param_groups[0]["lr"])
    scheduler.step()
    scheduler2.step()

Here are some answers to your questions:

  1. Yes indeed. The safest approach is to inherit step from _LRScheduler and get around the problem altogether.
  2. It seemed that the first iteration was skipped because you were printing AFTER the step. The step() call happens at the very end and is what updates the LR value. The problem was previously masked when you passed the epoch value explicitly.
  3. The expected LR values are the ones from LambdaLR. The above implementation produces the expected output:
0 0.001 0.001
1 0.0008 0.0008
2 0.0006 0.0006
3 0.0004 0.0004
4 0.00019999999999999996 0.00019999999999999996

Though I think we can use the above implementation as-is, to contribute it to PyTorch core we need tests, docs, and a few more bells and whistles. I believe PR pytorch/pytorch#60836 is a good example of what needs to be done. If you are up for it, you can start a PR and I can help you get it merged. Alternatively, I can finish it off and find you a different primitive to work on. Let me know what you prefer.

@federicopozzi33
Contributor

federicopozzi33 commented Aug 3, 2022

Hi @datumbox,

thank you for your help.

I have some doubts about the meaning of min_lr. If I've understood correctly, it has two meanings (quick check below):

  1. It is the lower bound on the learning rate, i.e. LR will never be lower than min_lr.
  2. LR is set to min_lr if last_epoch > total_iters.
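
To check my understanding, here is a quick sketch (arbitrary values) running the PolynomialLR proposed above with a non-zero min_lr; both readings seem to hold:

# PolynomialLR is the class from the previous comment.
optimizer = torch.optim.SGD([torch.zeros(1)], lr=0.001)
scheduler = PolynomialLR(optimizer, total_iters=5, min_lr=1e-4, power=1.0)

for i in range(8):
    # The LR decays linearly from 1e-3 to 1e-4 over total_iters steps and then
    # stays pinned at min_lr once last_epoch exceeds total_iters.
    print(i, optimizer.param_groups[0]["lr"])
    scheduler.step()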

I didn't find any references for some parts of the formula you used for the decayed LR. Although the values seem correct to me, I have some doubts about this part:

((1.0 - self.last_epoch / self.total_iters) / (1.0 - (self.last_epoch - 1) / self.total_iters))

Could you explain it to me in more detail?


It seemed that the first iteration was skipped because you were printing AFTER the step. The step() is called at the very end and is the one that updates the LR value. The problem was previously masked when you passed explicitly the epoch value.

Ok, I get what you mean, but I was referring to this.


Though I think we can use the above implementation as-is, to be able to contribute it to PyTorch core we need tests, docs and a few more bells and whistles. I believe the PR pytorch/pytorch#60836 is a good example of what needs to be done. If you are up for it, you can start a PR and I can help you get it merged. Alternatively, I can finish it off and find you a different primitive. Let me know what you prefer.

Yeah, I'm putting the pieces together. I will open a PR soon.

@datumbox
Contributor

datumbox commented Aug 3, 2022

Correct, the min_lr is the minimum permitted value for the LR. I'm not 100% sure we have to support this, TBH. Let's see what the Core team says and if there are any weird interactions we should keep in mind.

Although it seems correct to me, I have some doubts about the part:

The API of the schedulers is a bit weird. The changes in get_lr() happen in place, so you need to undo the update from the previous epoch and apply the new one.
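
A quick sketch (arbitrary values) of why that ratio works: at step t the decayed part of the previous LR is multiplied by ((1 - t/T) / (1 - (t-1)/T)) ** power, and these factors telescope to the closed form (1 - t/T) ** power:

base_lr, min_lr, total_iters, power = 0.001, 0.0, 5, 1.0

lr = base_lr
for t in range(1, total_iters + 1):
    ratio = ((1.0 - t / total_iters) / (1.0 - (t - 1) / total_iters)) ** power
    lr = min_lr + ratio * (lr - min_lr)  # recursive, in-place form used by get_lr()
    closed = min_lr + (1.0 - t / total_iters) ** power * (base_lr - min_lr)  # _get_closed_form_lr
    assert abs(lr - closed) < 1e-12, (t, lr, closed)
    print(t, lr)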

Yeah, I'm putting the pieces together. I will open a PR soon.

Sounds good, make sure you tag me on the PR.

@federicopozzi33
Contributor

federicopozzi33 commented Aug 11, 2022

The scheduler has been implemented (see pytorch/pytorch#82769).

All that remains is to update the segmentation training script to use the newly implemented scheduler as soon as a new version of PyTorch is released.
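
For reference, a sketch of what that change could look like in the reference script, assuming the scheduler lands in core with the signature from the implementation above (variable names from the current train script are assumed):

# Hypothetical replacement for the LambdaLR workaround, once
# torch.optim.lr_scheduler.PolynomialLR is available in the installed build.
iters_per_epoch = len(data_loader)
lr_scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer, total_iters=iters_per_epoch * args.epochs, power=0.9
)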

@datumbox
Contributor

That's correct. In fact, once the scheduler makes it to the nightly, we can make the change. Not sure if it made it into today's build or if it will appear tomorrow, but you can start a PR and I'll review/test/merge soon. Would that work for you?
