Thanks for taking the time to file the issue @JohannesK14!
I've reproduced the issue, made the appropriate change in a02b864 and added the relevant test to avoid future reversions in this behavior.
The aforementioned commit should be included either in FTS 0.2.4 (if PL releases another patch version of 1.7) or in the first release of FTS 0.3 (0.3.0).
Your clear description and attention to detail in the reproduction really accelerated resolution. A model bug report! 🚀 🎉
Thanks again, and glad you've found FTS of at least some utility. Feel free to reach out anytime if you have other issues or want to share your use case. Best of luck with your work!
🐛 Bug
Hi @speediedan, thanks for creating this fine-tuning callback!
I want to fine-tune the torchvision.models.segmentation.deeplabv3_resnet50 model, which contains batch-normalization parameters. As a first step, I would like to fine-tune in a simple two-step approach: starting with the classifier and continuing with the backbone. When I run the attached script, finetuning-scheduler issues the following warning:
UserWarning: FinetuningScheduler configured the provided model to have 23 trainable parameters in phase 0 (the initial training phase) but the optimizer has subsequently been initialized with 129 trainable parameters. If you have manually added additional trainable parameters you may want to ensure the manually added new trainable parameters do not collide with the 182 parameters FinetuningScheduler has been scheduled to thaw in the provided schedule.
I think this is caused by freeze_before_training calling freeze, which has a train_bn parameter that defaults to True and therefore sets requires_grad to True for every batch-normalization parameter.
Downstream, the code crashes because parameters appear in more than one parameter group (not in the attached sample, though, since the data does not fit the model, but the sample still reproduces the initial bug).
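For illustration, here is a small snippet (not from the original report) of the behavior described above: with the default train_bn=True, BaseFinetuning.freeze skips BatchNorm modules and leaves their weights and biases trainable.

```python
import torch
from pytorch_lightning.callbacks.finetuning import BaseFinetuning
from torchvision.models.segmentation import deeplabv3_resnet50

# no pretrained weights, keeps the snippet download-free
model = deeplabv3_resnet50(weights=None, weights_backbone=None)

# freeze() defaults to train_bn=True, so BatchNorm modules are made trainable
# while every other module in the backbone is frozen
BaseFinetuning.freeze(model.backbone, train_bn=True)

trainable = [n for n, p in model.backbone.named_parameters() if p.requires_grad]
print(len(trainable))   # > 0: all batch-norm weights/biases are still trainable
print(trainable[:3])    # e.g. ['bn1.weight', 'bn1.bias', 'layer1.0.bn1.weight']
```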
To Reproduce
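The attached reproduction script did not survive extraction here (only its first line, `import torch`, remains). The following is a minimal sketch of the kind of script that triggers the warning with finetuning-scheduler 0.2.3; the `DeepLabModule` and `RandomSegData` names, the random data, and the use of FTS's default (implicitly generated) schedule are illustrative assumptions rather than the reporter's original code, which used an explicit two-phase schedule.

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset
from torchvision.models.segmentation import deeplabv3_resnet50
from finetuning_scheduler import FinetuningScheduler


class RandomSegData(Dataset):
    """Tiny random segmentation dataset, just enough to start a fit."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.rand(3, 224, 224), torch.randint(0, 21, (224, 224))


class DeepLabModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # no pretrained weights to keep the sketch download-free
        self.model = deeplabv3_resnet50(weights=None, weights_backbone=None)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.model(x)["out"], y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", torch.nn.functional.cross_entropy(self.model(x)["out"], y))

    def configure_optimizers(self):
        # only the parameters FTS left trainable for phase 0 -- with train_bn=True
        # this unexpectedly includes every batch-norm weight and bias
        return torch.optim.SGD(
            (p for p in self.model.parameters() if p.requires_grad), lr=1e-3
        )


if __name__ == "__main__":
    # ft_schedule could instead point to an explicit two-phase yaml
    # (classifier first, then backbone); the default schedule shows the same warning
    trainer = pl.Trainer(
        max_epochs=1,
        limit_train_batches=2,
        limit_val_batches=1,
        callbacks=[FinetuningScheduler()],
    )
    data = RandomSegData()
    trainer.fit(
        DeepLabModule(),
        train_dataloaders=DataLoader(data, batch_size=2),
        val_dataloaders=DataLoader(data, batch_size=2),
    )
```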
Expected behavior
Call freeze with train_bn=False to avoid training the batch-normalization parameters by default.
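A short sketch of the expected result (again not from the original report): with train_bn=False, freezing the backbone should leave no parameter trainable, batch norm included.

```python
import torch
from pytorch_lightning.callbacks.finetuning import BaseFinetuning
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None, weights_backbone=None)

# with train_bn=False, batch-norm parameters are frozen along with everything else
BaseFinetuning.freeze(model.backbone, train_bn=False)
assert not any(p.requires_grad for p in model.backbone.parameters())
```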
Environment
- CUDA:
  - GPU:
    - NVIDIA GeForce RTX 3090
  - available: True
  - version: 11.3
- Packages:
  - finetuning-scheduler: 0.2.3
  - numpy: 1.23.3
  - pyTorch_debug: False
  - pyTorch_version: 1.12.1
  - pytorch-lightning: 1.7.7
  - tqdm: 4.64.1
- System:
  - OS: Linux
  - architecture:
    - 64bit
    - ELF
  - processor: x86_64
  - python: 3.10.6
  - version: #54~20.04.1-Ubuntu SMP Thu Sep 1 16:17:26 UTC 2022
Additional context