v1.2.0

@shibing624 shibing624 released this 16 Jun 05:42
· 108 commits to master since this release

v1.2.0 release

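  • Released the Chinese matching model shibing624/text2vec-base-chinese-nli, a CoSENT text matching model built on ERNIE-3.0-base and trained on the full corpus of the Chinese NLI dataset shibing624/nli_zh. It shows clear gains on every evaluation set.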

  • Released two Chinese NLI datasets: shibing624/snli-zh and shibing624/nli-zh-all.

  • Chinese matching evaluation results for the models released by this project:

| Arch | BaseModel | Model | ATEC | BQ | LCQMC | PAWSX | STS-B | Avg | QPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Word2Vec | word2vec | w2v-light-tencent-chinese | 20.00 | 31.49 | 59.46 | 2.57 | 55.78 | 33.86 | 23769 |
| SBERT | xlm-roberta-base | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 18.42 | 38.52 | 63.96 | 10.14 | 78.90 | 41.99 | 3138 |
| CoSENT | hfl/chinese-macbert-base | shibing624/text2vec-base-chinese | 31.93 | 42.67 | 70.16 | 17.21 | 79.30 | 48.25 | 3008 |
| CoSENT | hfl/chinese-lert-large | GanymedeNil/text2vec-large-chinese | 32.61 | 44.59 | 69.30 | 14.51 | 79.44 | 48.08 | 2092 |
| CoSENT | nghuyong/ernie-3.0-base-zh | shibing624/text2vec-base-chinese-nli | 51.26 | 68.72 | 79.13 | 34.28 | 80.70 | 62.81 | 3066 |

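  • Datasets released by this project:

| Dataset | Description | Download link |
| --- | --- | --- |
| shibing624/nli-zh-all | A collection of Chinese semantic matching data that integrates 8.2 million high-quality examples from text inference, similarity, summarization, question answering, and instruction fine-tuning tasks, converted into a matching-format dataset | https://huggingface.co/datasets/shibing624/nli-zh-all |
| shibing624/snli-zh | Chinese SNLI and MultiNLI datasets, translated from the English SNLI and MultiNLI | https://huggingface.co/datasets/shibing624/snli-zh |
| shibing624/nli_zh | Chinese semantic matching dataset that integrates five tasks: ATEC, BQ, LCQMC, PAWSX, and STS-B | https://huggingface.co/datasets/shibing624/nli_zh, or Baidu Netdisk (extraction code: qkt6), or GitHub |
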
  • A CoSENT matching model trained on the larger dataset shibing624/nli-zh-all is currently in training.
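
To try the released matching model, something like the following minimal sketch may help. It assumes the `text2vec` package is installed (`pip install text2vec`) and that `shibing624/text2vec-base-chinese-nli` loads through its `SentenceModel` class in the same way as the project's other models. The helper names `cosine_similarity`, `embed`, and `match_score` are hypothetical, introduced only for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (pure Python)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def embed(sentences: List[str],
          model_name: str = "shibing624/text2vec-base-chinese-nli"):
    """Encode sentences into vectors.

    Assumption: the text2vec package is installed and can load this model.
    The import is lazy, so nothing is downloaded until this function is called.
    """
    from text2vec import SentenceModel

    model = SentenceModel(model_name)
    return model.encode(sentences)


def match_score(sentence_a: str, sentence_b: str) -> float:
    """Score how closely two sentences match, as the cosine of their embeddings."""
    emb_a, emb_b = embed([sentence_a, sentence_b])
    return cosine_similarity([float(x) for x in emb_a],
                             [float(x) for x in emb_b])
```

Calling `match_score("如何更换花呗绑定银行卡", "花呗更改绑定银行卡")` downloads the model weights on first use and returns a similarity score; higher values indicate a closer match. Loading the model once and reusing it would be preferable when scoring many pairs.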

Full Changelog: 1.1.8...1.2.0