The inconsistency in the behaviour of different versions of an AI module may bring significant instability to the overall system. While an improved module usually reduces the average number of errors, it almost always introduces new ones when compared to its predecessor. This phenomenon is known as regression.
This repository holds the codebase for exploring the regression problem in classification tasks. It provides training scripts for mainstream network architectures and analysis tools for evaluating the extent of regression among them.
Regression in model update: When updating an old classifier (red) to a new one (dashed blue line), we correct
mistakes (top-right, white), but we also introduce errors that the old classifier did not make (negative flips, bottom-left, red). While on average the errors decrease (from 57% to 42% in this toy example), regression can wreak havoc with downstream processing, nullifying the benefit of the update.
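To make the notion of a negative flip concrete, here is a minimal sketch that computes accuracy and the negative flip rate (NFR) from plain prediction lists. It assumes NFR is the fraction of samples that the old model classifies correctly and the new model misclassifies; the function names and toy data are illustrative and are not part of this repository.

```python
def accuracy(preds, labels):
    """Fraction of samples whose prediction matches the label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of samples the old model got right and the new model gets wrong."""
    flips = sum(
        o == y and n != y
        for o, n, y in zip(old_preds, new_preds, labels)
    )
    return flips / len(labels)


# Toy data (made up): the new model is more accurate overall,
# yet it breaks one sample that the old model handled correctly.
labels = [0, 1, 2, 1, 0, 2, 1, 0]
old_preds = [0, 1, 0, 0, 0, 1, 1, 2]
new_preds = [0, 0, 2, 1, 0, 2, 1, 2]

print("old accuracy:", accuracy(old_preds, labels))
print("new accuracy:", accuracy(new_preds, labels))
print("NFR:", negative_flip_rate(old_preds, new_preds, labels))
```

A model update can therefore raise accuracy while still having a non-zero NFR, which is the situation the figure illustrates.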
- Install Anaconda (with Python 3.7)
- Install the dependencies with
pip install -r ConstrainedUpgrade/requirements.txt
- Download the ImageNet dataset from
- Then move the validation images to labeled subfolders using this shell script.
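For readers who prefer Python, the last step can be sketched roughly as follows. This is not the shell script referred to above: it assumes you already have a mapping from each validation file name to its class folder name, and the helper name, file names, and class names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def sort_val_images(val_dir, filename_to_class):
    """Move each listed image in val_dir into val_dir/<class>/ and return the count."""
    val_dir = Path(val_dir)
    moved = 0
    for filename, class_name in filename_to_class.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # skip mapping entries with no matching file
        class_dir = val_dir / class_name
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved


# Tiny demo on a temporary folder with made-up file and class names.
demo_dir = Path(tempfile.mkdtemp())
for name in ("img_1.JPEG", "img_2.JPEG", "img_3.JPEG"):
    (demo_dir / name).touch()

demo_mapping = {
    "img_1.JPEG": "class_a",
    "img_2.JPEG": "class_b",
    "img_3.JPEG": "class_a",
}
num_moved = sort_val_images(demo_dir, demo_mapping)
print(num_moved, sorted(p.name for p in demo_dir.iterdir()))
```

After this step the validation folder has one subfolder per class, which is the layout the training commands expect for the train and val folders.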
To train a model, run ConstrainedUpgrade/
with the desired model architecture and the path to the ImageNet dataset:
python ConstrainedUpgrade/ -a resnet18 [imagenet-folder with train and val folders]
The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs. This is appropriate for ResNet and models with batch normalization, but too high for AlexNet and VGG. Use 0.01 as the initial learning rate for AlexNet or VGG:
python ConstrainedUpgrade/ -a alexnet --lr 0.01 [imagenet-folder with train and val folders]
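The step schedule described above can be written out in a few lines. This is a sketch of the stated rule only (start at the initial rate and divide by 10 every 30 epochs); the function name and defaults are illustrative, and the training script's own implementation may differ in details.

```python
def step_lr(epoch, base_lr=0.1, step=30, gamma=0.1):
    """Learning rate at a given epoch: base_lr scaled by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


# Compare the default schedule with the lower starting rate suggested for AlexNet/VGG.
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: default={step_lr(epoch):.4g}  "
          f"alexnet/vgg={step_lr(epoch, base_lr=0.01):.4g}")
```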
You should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.
python ConstrainedUpgrade/ -a resnet50 --dist-url 'tcp://' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]
Node 0:
python ConstrainedUpgrade/ -a resnet50 --dist-url 'tcp://IP_OF_NODE0:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0 [imagenet-folder with train and val folders]
Node 1:
python ConstrainedUpgrade/ -a resnet50 --dist-url 'tcp://IP_OF_NODE0:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1 [imagenet-folder with train and val folders]
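The flags above resemble those of the official PyTorch ImageNet example. Assuming this script follows the same convention (an assumption, not something this README states), --world-size counts nodes and --rank is the node index, and with --multiprocessing-distributed one process is spawned per GPU. The sketch below shows how per-process ranks would then be derived; the helper names and the GPU count are hypothetical.

```python
def total_processes(num_nodes, gpus_per_node):
    """Total number of training processes when one process is spawned per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank, gpus_per_node, local_gpu):
    """Global rank of the process driving `local_gpu` on node `node_rank`."""
    return node_rank * gpus_per_node + local_gpu


# Example: 2 nodes (--world-size 2), assuming 8 GPUs on each node.
gpus_per_node = 8
print("processes:", total_processes(2, gpus_per_node))
print("node 0 ranks:", [global_rank(0, gpus_per_node, g) for g in range(gpus_per_node)])
print("node 1 ranks:", [global_rank(1, gpus_per_node, g) for g in range(gpus_per_node)])
```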
The best model from training is evaluated automatically. Its prediction outputs on the validation set are saved in the working folder given by --work_dir.
You can also evaluate a model after training by running:
python ConstrainedUpgrade/ --evaluate --resume $MODEL_PATH [other options]
The results will be stored as evaluate.result in the working folder by default.
We also provide analysis tools to calculate statistics such as accuracy and negative flip rate (NFR). Please refer to ConstrainedUpgrade/analysis. Here is an example code snippet:
from analysis.utils import ModelAnalyzer
old_model = ModelAnalyzer('{}/model_best.result'.format(work_dir_1))
new_model = ModelAnalyzer('{}/model_best.result'.format(work_dir_2))
ensemble_model = old_model + new_model
print('Accuracy: {}, NFR: {}'.format(new_model.Acc(), new_model.NFR(old_model)))
print('Ensemble Accuracy: {}'.format(ensemble_model.Acc()))
Use focal distillation in training with PyTorch DDP. It automatically uses all GPUs available on a node. The KD loss temperature is set to 100, with alpha=1 and beta=5 (--filter-base 1 and --filter-scale 5 in the command below).
python $BASEDIR/ConstrainedUpgrade/ \
--dist-url 'tcp://' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 4331 \
--auto-scale --workers 2 --batch-size 160 --lr 0.1 --lr_step 30 --epochs 90 \
-a resnet18 \
--kd_model_arch resnet18 \
--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \
--kd_loss_weight 1 --kd_alpha 0.9 --kd_loss_mode kl --kd_temperature 100 --kd_filter old_correct --filter-base 1 --filter-scale 5 \
2>&1 | tee -a $SCRIPTDIR/log.txt
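As a rough illustration of what the old_correct filter with a base and scale does, the sketch below weights a temperature-softened KL distillation term by base + scale for samples the old model classified correctly, and by base otherwise. It is a plain-Python sketch under that assumption, not the repository's implementation; the function names are hypothetical, and details such as temperature-squared scaling or the mixing with cross-entropy are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def focal_distillation_loss(new_logits, old_logits, labels,
                            temperature=100.0, filter_base=1.0, filter_scale=5.0):
    """Mean KL(old || new), weighted more heavily where the old model was correct."""
    total = 0.0
    for z_new, z_old, y in zip(new_logits, old_logits, labels):
        old_pred = max(range(len(z_old)), key=z_old.__getitem__)
        weight = filter_base + filter_scale * float(old_pred == y)
        p_old = softmax(z_old, temperature)
        p_new = softmax(z_new, temperature)
        total += weight * kl_divergence(p_old, p_new)
    return total / len(labels)


# Same logits, different labels: the old model is right in the first case only.
old = [[2.0, 0.0]]
new = [[0.0, 2.0]]
loss_old_correct = focal_distillation_loss(new, old, labels=[0])
loss_old_wrong = focal_distillation_loss(new, old, labels=[1])
print(loss_old_correct, loss_old_wrong)
```

With a scale of 0, every sample receives the same weight and the loss reduces to ordinary distillation.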
Use CNA in training with PyTorch DDP. The CNA loss temperature is set to 0.01. By default, it uses the outside log-sum formulation. Beta is set to 0 for focal distillation (--filter-scale 0).
python $BASEDIR/ConstrainedUpgrade/ \
--dist-url 'tcp://' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 4331 \
--auto-scale --workers 2 --batch-size 160 --lr 0.1 --lr_step 30 --epochs 90 \
-a resnet18 \
--kd_model_arch resnet18 \
--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \
--kd_loss_weight 1 --kd_alpha 0.9 --kd_loss_mode cna --cna_temperature 0.01 --kd_temperature 100 --kd_filter old_correct --filter-base 1 --filter-scale 0 \
2>&1 | tee -a $SCRIPTDIR/log.txt
Use LDI in training with PyTorch DDP. By default, we use an LDI margin of 0.5 and p of 2. You can additionally set "--li_compute_topk 10" to compute the loss only on the classes whose new logits are ranked highest.
python $BASEDIR/ConstrainedUpgrade/ \
--dist-url 'tcp://' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 1 \
--auto-scale --workers 4 --batch-size 128 --lr 0.1 --lr_step 30 --epochs 90 \
-a resnet50 \
--kd_model_arch resnet18 \
--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \
--kd_loss_weight 1 --kd_alpha 0.5 --kd_loss_mode li --kd_filter all_pass \
--li_p 2 --li_margin 0.5 \
2>&1 | tee -a $SCRIPTDIR/log.txt
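The roles of the margin, p, and top-k options can be illustrated with a small sketch. It assumes the LDI loss penalizes, for each class, the amount by which the absolute difference between new and old logits exceeds the margin, raised to the power p, and that top-k restricts the sum to the classes with the highest new logits. The function name, the use of a sum rather than a mean, and the toy logits are assumptions for illustration.

```python
def ldi_loss(new_logits, old_logits, margin=0.5, p=2, topk=None):
    """Per-sample penalty on logit differences that exceed the margin."""
    classes = range(len(new_logits))
    if topk is not None:
        # Keep only the classes whose new logits rank highest.
        classes = sorted(classes, key=lambda k: new_logits[k], reverse=True)[:topk]
    return sum(
        max(0.0, abs(new_logits[k] - old_logits[k]) - margin) ** p
        for k in classes
    )


# Toy logits (made up) for a 4-class problem.
new_logits = [3.0, 1.0, -1.0, 0.2]
old_logits = [2.0, 1.2, 1.0, 0.0]

print("all classes:", ldi_loss(new_logits, old_logits))
print("top-2 classes:", ldi_loss(new_logits, old_logits, topk=2))
print("zero margin:", ldi_loss(new_logits, old_logits, margin=0.0))
```

Differences smaller than the margin contribute nothing, so the new model is free to deviate slightly from the old one on every class.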
Use Ensemble Distillation with LDI in training with PyTorch DDP. Compared with single model + LDI, we set a smaller margin (--li_margin 0.2 or even 0) and a larger KD loss weight (--kd_alpha 0.8).
python $BASEDIR/ConstrainedUpgrade/ \
--dist-url 'tcp://' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 1 \
--auto-scale --workers 4 --batch-size 80 --lr 0.1 --lr_step 30 --epochs 90 \
-a resnet50 --kd_model_arch resnet50 \
--kd_model_path \
FOLDER_OF_RESNET18@SEED=1/model_best.pth.tar FOLDER_OF_RESNET18@SEED=2/model_best.pth.tar \
FOLDER_OF_RESNET18@SEED=3/model_best.pth.tar FOLDER_OF_RESNET18@SEED=4/model_best.pth.tar \
FOLDER_OF_RESNET18@SEED=5/model_best.pth.tar FOLDER_OF_RESNET18@SEED=6/model_best.pth.tar \
FOLDER_OF_RESNET18@SEED=7/model_best.pth.tar FOLDER_OF_RESNET18@SEED=8/model_best.pth.tar \
--kd_loss_weight 1 --kd_alpha 0.8 --kd_loss_mode li --kd_filter all_pass \
--li_p 2 --li_margin 0.0 \
--li_exclude_gt \
--save-init-checkpoint \
--epochs_per_save 1 \
2>&1 | tee -a $SCRIPTDIR/log.txt
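When several reference checkpoints are passed, one natural way to form a single distillation target is to average their logits class by class. The sketch below shows that averaging step only; whether the training script combines the reference models exactly this way is an assumption, and the function name and numbers are made up.

```python
def average_logits(ensemble_logits):
    """Class-wise mean of the logits produced by each ensemble member for one sample."""
    num_models = len(ensemble_logits)
    return [sum(values) / num_models for values in zip(*ensemble_logits)]


# Logits from three hypothetical reference models for the same input.
member_logits = [
    [2.0, 0.0, -1.0],
    [1.0, 1.0, -2.0],
    [3.0, -1.0, 0.0],
]
reference = average_logits(member_logits)
print(reference)
```

Averaging tends to smooth out the idiosyncrasies of any single reference model, which fits the smaller margin suggested above.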
If this code helps your research or project, please cite
title={Positive-congruent training: Towards regression-free model updates},
author={Yan, Sijie and Xiong, Yuanjun and Kundu, Kaustav and Yang, Shuo and Deng, Siqi and Wang, Meng and Xia, Wei and Soatto, Stefano},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
title={ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training},
author={Zhao, Yue and Shen, Yantao and Xiong, Yuanjun and Yang, Shuo and Xia, Wei and Tu, Zhuowen and Schiele, Bernt and Soatto, Stefano},
journal={arXiv preprint arXiv:2205.06265},
title={Contrastive Neighborhood Alignment},
author={Zhu, Pengkai and Cai, Zhaowei and Xiong, Yuanjun and Tu, Zhuowen and Goncalves, Luis and Mahadevan, Vijay and Soatto, Stefano},
journal={arXiv preprint arXiv:2201.01922},
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.
