- Hugging Face, datasets
- Awesome-Chinese-NLP, Chinese
- CLUEDatasetSearch, Chinese
- funNLP, Chinese
- ChineseNLPCorpus1, Chinese
- ChineseNLPCorpus2, Chinese
- CLUE, Chinese
- Chinese NLP data by ShannonAI, Chinese
- nlp-datasets, Multilingual
- awesome-nlp, Multilingual
- Various NER datasets
- CoNLL-2003, Official, CoNLL-2003, other link
- WNUT-2016, Twitter
- OntoNotes-5.0, broadcast news, broadcast conversation, weblogs, magazine genre
- Wikigold
- Kaggle
- MUC6
- MUC7
- WMT 2020
- AI Challenger (English-Chinese translation: the largest English-Chinese bilingual parallel dataset in the spoken-language domain)
- UM-Corpus: A Large English-Chinese Parallel Corpus
- OpenSubtitles2016
- MultiUN