PyTorch implementations of algorithms for knowledge distillation.
$ docker build -t kd -f Dockerfile .
$ docker run -v local_data_path:/data -v project_path:/app -p 0.0.0.0:8084:8084 -it kd
- Task-specific distillation from BERT into a BiLSTM student. Data: SST-2 binary sentiment classification. A minimal sketch of the distillation objective follows the reference list below.
- Cristian Bucila, Rich Caruana, Alexandru Niculescu-Mizil "Model Compression" (2006).
- Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (2019) https://arxiv.org/abs/1910.01108.
- Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks" (2019) https://arxiv.org/abs/1903.12136.
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (2019) https://arxiv.org/abs/1909.11942.
- Rafael Müller, Simon Kornblith, Geoffrey Hinton "Subclass Distillation" (2020) https://arxiv.org/abs/2002.03936.
- Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models" (2019) https://arxiv.org/abs/1908.08962.
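Below is a minimal sketch of the task-specific distillation objective in the spirit of Tang et al. (2019): the BiLSTM student is trained to regress the BERT teacher's logits (MSE term) while also fitting the gold SST-2 labels (cross-entropy term). The function and parameter names (`distillation_loss`, `alpha`) are illustrative assumptions, not this repository's actual API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Task-specific distillation loss (sketch, after Tang et al. 2019).

    Combines the usual cross-entropy on gold labels with an MSE term
    that pushes the student's logits toward the teacher's logits.
    `alpha` balances the hard-label and soft-label terms.
    """
    # Hard-label term: standard classification loss (SST-2 has 2 classes).
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: regress student logits onto the frozen teacher's logits.
    mse = F.mse_loss(student_logits, teacher_logits)
    return alpha * ce + (1.0 - alpha) * mse

# Example with dummy tensors (batch of 8, 2 classes):
student_logits = torch.randn(8, 2, requires_grad=True)   # from the BiLSTM student
teacher_logits = torch.randn(8, 2)                        # from the frozen BERT teacher
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```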