This work formed my MSc dissertation at The University of Edinburgh. A copy of the dissertation can be found here. The work investigates how to construct a teacher network for distillation when given only a small, non-standard student network, typically one produced by Neural Architecture Search. My method uses Fisher information to identify which blocks of the student network to scale up, growing a teacher network directly from the student. The student can then be trained by this new teacher via attention transfer or knowledge distillation.
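To make the block-selection step concrete, the sketch below shows one way to compute a Fisher-style saliency per block from activations and their gradients, in the spirit of Fisher pruning. The function name, hook wiring and aggregation here are illustrative assumptions, not the exact implementation in fisher_expand.py.

```python
import torch
import torch.nn.functional as F

def fisher_block_scores(model, blocks, data_loader, device="cuda", n_batches=1):
    """Estimate a Fisher-style saliency for each block of `model`.

    For each block we record its output activation and the gradient of the
    loss w.r.t. that activation; the score is the squared product of the two,
    summed over feature dimensions and averaged over the batch (the usual
    Fisher-pruning approximation). Higher-scoring blocks are candidates for
    scaling up when growing the teacher.
    """
    acts = {}
    scores = {i: 0.0 for i in range(len(blocks))}

    def make_hook(idx):
        def hook(module, inp, out):
            out.retain_grad()      # keep the gradient of this activation
            acts[idx] = out
        return hook

    handles = [blk.register_forward_hook(make_hook(i)) for i, blk in enumerate(blocks)]

    model.train()
    for b, (x, y) in enumerate(data_loader):
        if b >= n_batches:
            break
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        for i, a in acts.items():
            # (activation * gradient)^2, summed over channel/spatial dims,
            # averaged over the batch
            scores[i] += (a * a.grad).pow(2).sum(dim=list(range(1, a.dim()))).mean().item()

    for h in handles:
        h.remove()
    return scores
```

The highest-scoring blocks would then be the ones widened (for example, by increasing their channel counts) to grow the teacher from the student.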
activations.py --- A helper script for producing visualisations of a model's activations
fisher_expand.py --- Code for expanding a given model using our Fisher expansion algorithm
funcs.py --- Functions used throughout the codebase
main.py --- Main script for performing distillation
model.py --- DARTS model code
operations.py --- DARTS operations code
utils.py --- Extra utility functions
python fisher_expand.py cifar10 --data_loc <cifar location> --base_model <model file>
python main.py cifar10 -t <teacher checkpoint> --teach_arch <darts|densenet|wrn>
python main.py cifar10 -s <student checkpoint> --student_arch <darts|densenet|wrn> --teacher_arch <darts|densenet|wrn>
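For reference, here is a minimal sketch of the two losses mentioned above, following Zagoruyko & Komodakis for attention transfer and Hinton et al. for knowledge distillation. The layer pairing, temperature T and weighting alpha are illustrative defaults, not necessarily those used by main.py.

```python
import torch
import torch.nn.functional as F

def attention_map(fmap):
    """Spatial attention map: squared activations averaged over channels,
    flattened and L2-normalised."""
    am = fmap.pow(2).mean(dim=1).flatten(start_dim=1)
    return F.normalize(am, dim=1)

def at_loss(student_fmaps, teacher_fmaps):
    """Attention-transfer loss: mean squared distance between normalised
    student and teacher attention maps at matching depths."""
    return sum((attention_map(s) - attention_map(t)).pow(2).mean()
               for s, t in zip(student_fmaps, teacher_fmaps))

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: softened KL term plus hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```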
The following repositories provided the basis and inspiration for this work:
https://github.com/BayesWatch/xdistill
https://github.com/quark0/darts
https://github.com/BayesWatch/pytorch-blockswap
https://github.com/szagoruyko/attention-transfer
https://github.com/kuangliu/pytorch-cifar
https://github.com/xternalz/WideResNet-pytorch
https://github.com/ShichenLiu/CondenseNet