
What does the "unsupervised data" in the distillation stage refer to? #8

Open
hitnq opened this issue Jul 3, 2020 · 3 comments

hitnq commented Jul 3, 2020

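The README says:
"Using unsupervised data in the distillation stage can improve the model's robustness."

What does "unsupervised data" refer to here? Is there a concrete example? As far as I can tell, script_train_stage1.sh still uses labeled data.

I'd appreciate any clarification. Thanks!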

@BitVoyage
Owner

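Distillation is essentially the student learning from the teacher's outputs, so any data can be fed into the teacher model to produce the student's learning targets. For efficient distillation, I recommend using task-related data (for example, data that is still waiting to be labeled) so that the classes stay balanced. The default script_train_stage1.sh simply reuses the training data.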
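
To make the point above concrete, here is a minimal sketch of a distillation loss for a single unlabeled example. It is not taken from this repository: the function names, the logits, and the choice of KL(student || teacher) as the divergence direction are all assumptions made for illustration. What it shows is that the target comes from the teacher's output, so no gold label is needed.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits):
    """Loss for one unlabeled example; no gold label appears anywhere."""
    student_probs = softmax(student_logits)
    teacher_probs = softmax(teacher_logits)  # the teacher's output is the target
    # Direction KL(student || teacher) is an assumption of this sketch.
    return kl_divergence(student_probs, teacher_probs)

# Hypothetical logits for one unlabeled sentence over three classes.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]

loss_before = distillation_loss(student_logits, teacher_logits)
loss_matched = distillation_loss(teacher_logits, teacher_logits)
print(f"loss with a mismatched student: {loss_before:.4f}")
print(f"loss when the student matches the teacher: {loss_matched:.4f}")
```

The loss is positive while the student disagrees with the teacher and drops to zero once the two distributions match, which is why any input the teacher can score, labeled or not, can serve as distillation data.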

@caijie12138

Haha, makes sense.
So FastBert can still be viewed as a semi-supervised classifier. Learned something new.

@feiyuxiaoThu

A question: if I add new unlabeled data for the distillation stage, is it enough to
keep --train_data unchanged and point --eval_data at the unlabeled data?


4 participants