This repository contains the scripts and datasets used in the study named above.
- data/: contains the SimLex-999 dataset and the cleaned data collected from human participants.
- embeddings/: contains word embeddings trained on a subset of the Corpus of Contemporary American English (COCA) and an SgE corpus collated by Lin et al. (2022).
- scripts/: contains Python scripts used in data cleaning and analysis.
  - clean/: scripts used in data cleaning.
  - compare/: scripts used in data analysis.
  - train/: scripts used in the training of word embeddings.