
Pretraining data

zhezhaoa edited this page Aug 25, 2023 · 2 revisions

CLUECorpusSmall

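CLUECorpusSmall contains news, community interaction, Wikipedia, and comment corpora. The original data and a detailed description are available here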

Corpus Link
CLUECorpusSmall https://share.weiyun.com/sC6PMhxx
CLUECorpusSmall (BERT format) https://share.weiyun.com/9SPPGUOK
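This page does not describe what "BERT format" means for the second download. A minimal sketch for reading such a file, assuming the common convention of one sentence per line with a blank line between documents (an assumption, not confirmed here), might look like this. The function name `iter_documents` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each document as a list of sentences.

    Assumes one sentence per line and a blank line between documents;
    this layout is an assumption about the "BERT format" file.
    """
    document: List[str] = []
    for raw in lines:
        sentence = raw.strip()
        if sentence:
            document.append(sentence)
        elif document:
            # A blank line closes the current document.
            yield document
            document = []
    if document:
        # The last document may not be followed by a blank line.
        yield document


# Hypothetical sample standing in for lines read from the corpus file.
sample = ["First sentence.\n", "Second sentence.\n", "\n", "Another document.\n"]
docs = list(iter_documents(sample))
```

To use it on a downloaded file, pass an open file handle (opened with `encoding="utf-8"`) in place of `sample`.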

News Commentary v13 (ZH-EN)

News Commentary v13 contains parallel corpora. The original data and a detailed description are available here

Corpus Link
news-Commentary-v13-en-zh https://share.weiyun.com/PLMxw6ae
news-Commentary-v13-zh-en https://share.weiyun.com/5rMwRhDi
news-Commentary-v13-en-zh_sampled https://share.weiyun.com/1KTxq3Dc

CIFAR100_nolabel

CIFAR100_nolabel contains 50,000 unlabeled images that can be used for unsupervised pretraining. The original data is available here

Corpus Link
CIFAR100_nolabel https://share.weiyun.com/M2tA9P8p