COS960

COS960 is a Chinese word similarity dataset of 960 word pairs. Each pair is annotated by 15 native speakers with a score reflecting how similar the two words are. The 960 pairs are divided into three groups by part-of-speech tag: 480 noun pairs, 240 verb pairs, and 240 adjective pairs.

Usage

To evaluate your word embeddings on COS960, run:

python correlation_calcu.py {VECTOR_FILE}
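This README does not describe what `correlation_calcu.py` computes internally. Word similarity datasets are commonly evaluated by the Spearman rank correlation between the cosine similarity of each pair's vectors and the averaged human score, so the sketch below illustrates that general approach under that assumption. The function names and the toy words and vectors are hypothetical and are not taken from the script or the dataset.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs: Sequence[Tuple[str, str, float]],
             embeddings: Dict[str, Sequence[float]]) -> float:
    """Correlate model similarities with human scores, skipping unknown words."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(score)
    return spearman(model, human)


# Toy data (hypothetical, not from COS960).
toy_embeddings = {
    "w1": [1.0, 0.0],
    "w2": [1.0, 0.1],
    "w3": [0.0, 1.0],
    "w4": [-1.0, 0.0],
}
toy_pairs = [("w1", "w2", 4.0), ("w1", "w3", 2.0), ("w1", "w4", 0.5)]
toy_rho = evaluate(toy_pairs, toy_embeddings)
```

Here the model ranks the three toy pairs in the same order as the human scores, so `toy_rho` is close to 1. Pairs with out-of-vocabulary words are skipped in this sketch; whether the actual script does the same is not stated here.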

Dataset

Each line in the data files has the following format:

[Word1] [Word2] [Average] [Annotator1] ... [Annotator15]

小心谨慎  谨慎小心     4.0         4      ...       4
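As a minimal sketch of reading this format, the snippet below parses one line into its fields. It assumes the fields are separated by whitespace and that every line carries exactly 15 annotator scores; `PairRecord`, `parse_line`, and the sample line are hypothetical and the sample is not a real entry from the dataset.

```python
from typing import List, NamedTuple


class PairRecord(NamedTuple):
    word1: str
    word2: str
    average: float
    scores: List[float]


def parse_line(line: str, n_annotators: int = 15) -> PairRecord:
    """Split a whitespace-separated line into words, average, and scores."""
    fields = line.split()
    expected = 3 + n_annotators
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    return PairRecord(
        word1=fields[0],
        word2=fields[1],
        average=float(fields[2]),
        scores=[float(x) for x in fields[3:]],
    )


# Hypothetical sample line: two placeholder words, an average, 15 scores.
sample = "词A 词B 3.0 " + " ".join(["3"] * 15)
record = parse_line(sample)
```

A quick consistency check on real data would be to compare `record.average` with `sum(record.scores) / len(record.scores)`, allowing for rounding.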

Cite

If you use the dataset, please cite the following paper:

@article{huang2019COS960,
  author = {Junjie Huang and Fanchao Qi and Chenghao Yang and Zhiyuan Liu and Maosong Sun},
  title = {{COS960: A Chinese Word Similarity Dataset of 960 Word Pairs}},
  journal = {arXiv preprint arXiv:1906.00247},
  year = {2019}
}
