Knowledge Distillation

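Knowledge distillation (a.k.a. the Teacher-Student model) trains a small model (the Student) to learn the knowledge held in a large model (the Teacher). The goal is for the small model to retain as much of the large model's performance as possible while reducing the parameter count at deployment, speeding up inference, and lowering compute usage.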

Directory structure

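1. A reproduction of Distilling the Knowledge in a Neural Network (Hinton et al., 2015) on CIFAR-10, intended to give a basic understanding of Knowledge Distillation. For details, see: Knowledge_Distillation_From_Scratch.ipynb
2. Uses BERT-12 as the Teacher and BERT-3 as the Student, learning from both the ground truth and the softened labels. Performance is on par with the Teacher or even better. For details, see: knowledge_distillation_bert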
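
As a rough illustration of learning from both the ground truth and the softened labels, the sketch below combines a hard-label cross-entropy with a temperature-softened cross-entropy against the Teacher, following the formulation in Hinton et al. (2015). It is written in plain Python rather than Keras, and the names `softmax`, `distillation_loss`, `temperature`, and `alpha`, along with their default values, are assumptions made for illustration; they are not taken from knowledge_distillation_bert.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a hard-label loss and a soft-label loss (illustrative)."""
    # Hard loss: cross-entropy against the ground-truth class at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft loss: cross-entropy between the softened Teacher and Student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    # The T^2 factor keeps the soft-loss gradients on a comparable scale
    # when the temperature changes (Hinton et al., 2015).
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

Setting `alpha=1.0` reduces this to ordinary supervised training on the ground truth, while `alpha=0.0` trains on the Teacher's softened outputs only.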

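Main reference papers:
3. Performs Knowledge Distillation through module replacement. For details, see: knowledge_distillation_bert_of_theseus

Paper:
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Blog:
- BERT-of-Theseus: A Model Compression Method Based on Module Replacement (BERT-of-Theseus：基于模块替换的模型压缩方法)
- Model Compression in Practice: bert-of-theseus, a Very Approachable BERT Compression Method (模型压缩实践系列之——bert-of-theseus，一个非常亲民的bert压缩方法)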
repo:
- https://github.com/JetRunner/BERT-of-Theseus
- https://github.com/bojone/bert-of-theseus
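
The module-replacement idea can be sketched as follows: the Teacher (the "predecessor" in the paper) and the Student (the "successor") are split into the same number of modules, and during training each predecessor module is swapped for its successor counterpart with some probability. This is a toy illustration in plain Python, not the code in knowledge_distillation_bert_of_theseus; `theseus_forward`, `make_module`, and `replace_prob` are hypothetical names, and the modules here only record which model they belong to.

```python
import random

def theseus_forward(x, predecessor_modules, successor_modules,
                    replace_prob=0.5, rng=random):
    """Run one training-time forward pass with random module replacement."""
    if len(predecessor_modules) != len(successor_modules):
        raise ValueError("both models must be split into the same number of modules")
    for pred, succ in zip(predecessor_modules, successor_modules):
        # Bernoulli draw: use the successor module with probability replace_prob.
        module = succ if rng.random() < replace_prob else pred
        x = module(x)
    return x

# Toy stand-ins: each "module" appends a tag naming the model it came from.
def make_module(tag):
    return lambda trace: trace + [tag]

predecessor = [make_module("P%d" % i) for i in range(3)]
successor = [make_module("S%d" % i) for i in range(3)]

# A seeded generator makes the random mix of modules reproducible.
mixed = theseus_forward([], predecessor, successor,
                        replace_prob=0.5, rng=random.Random(0))
```

With `replace_prob=1.0` only successor modules run, which corresponds to using the compressed model on its own once the replacement phase is over.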
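4. Exploits the fact that samples differ in how hard they are to predict, and dynamically selects which branch classifier of the model to use. However, because TensorFlow 1.x uses static graphs, the current demo does not actually terminate computation early. For details, see: knowledge_distillation_fastbert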

Paper:
- FastBERT: a Self-distilling BERT with Adaptive Inference Time
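
To make the early-exit idea concrete, here is a minimal sketch of the adaptive inference loop described in the FastBERT paper: after each layer, a branch classifier produces class probabilities, and inference stops once their normalized entropy falls below a threshold. It runs eagerly in plain Python, so it shows the intended behavior rather than the static-graph demo in knowledge_distillation_fastbert. The names `normalized_entropy`, `adaptive_inference`, and `speed`, and the toy layers and confidence values, are assumptions made for illustration.

```python
import math

def normalized_entropy(probs):
    """Entropy of a distribution divided by its maximum value log(N), in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))

def adaptive_inference(hidden, layers, branch_classifiers, speed=0.5):
    """Return (probs, depth), exiting at the first sufficiently confident branch."""
    if not layers:
        raise ValueError("at least one layer is required")
    for depth, (layer, classifier) in enumerate(zip(layers, branch_classifiers), start=1):
        hidden = layer(hidden)
        probs = classifier(hidden)
        # Low uncertainty means an easy sample: stop without running deeper layers.
        if normalized_entropy(probs) < speed:
            return probs, depth
    # No branch was confident enough, so fall back to the final classifier's output.
    return probs, depth

# Toy model: the hidden state is a layer counter, and confidence grows with depth.
confidence_by_depth = {1: 0.55, 2: 0.70, 3: 0.99}
layers = [lambda h: h + 1] * 3
classifiers = [lambda h: [confidence_by_depth[h], 1.0 - confidence_by_depth[h]]] * 3

probs, exit_depth = adaptive_inference(0, layers, classifiers, speed=0.5)
```

A larger `speed` threshold lets more samples exit at shallow layers, trading accuracy for inference time; `speed=0.0` disables early exit entirely.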
