Semantic similarity in Singapore English

This repository contains the scripts and datasets used in the study of the same name.

data/: contains the SimLex-999 dataset and the cleaned data collected from human participants.
embeddings/: contains word embeddings trained on a subset of the Corpus of Contemporary American English (COCA) and on a Singapore English (SgE) corpus collated by Lin et al. (2022).
scripts/: contains Python scripts used in data cleaning and analysis.
- clean/: scripts used in data cleaning.
- compare/: scripts used in data analysis.
- train/: scripts used in the training of word embeddings.
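The analysis scripts in compare/ are not documented here. As a hypothetical sketch only, a common way to score word embeddings against a SimLex-999-style dataset is to take the cosine similarity of each word pair's vectors and report the Spearman rank correlation between those scores and the human ratings, skipping pairs with out-of-vocabulary words. The function names, toy vectors, and ratings below are invented for illustration; they are not taken from this repository or from SimLex-999. Spearman's rho is computed in plain Python (Pearson correlation of tie-averaged ranks) to keep the sketch dependency-free; `scipy.stats.spearmanr` is a common alternative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, rated_pairs):
    """Return (rho, number of pairs used), skipping out-of-vocabulary pairs."""
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(model_scores)


# Invented toy data for illustration only.
toy_embeddings = {
    "old": [1.0, 0.1],
    "new": [-1.0, 0.2],
    "smart": [0.2, 1.0],
    "intelligent": [0.25, 0.95],
    "hard": [0.7, 0.7],
}
toy_pairs = [
    ("smart", "intelligent", 9.2),
    ("old", "new", 1.6),
    ("hard", "smart", 5.0),
    ("old", "unseen", 3.0),  # skipped: "unseen" has no vector
]

rho, n = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {n} pairs")  # Spearman rho = 1.000 over 3 pairs
```

Reporting the number of pairs actually scored alongside rho matters when comparing embeddings trained on different corpora, since each vocabulary may cover a different subset of the rated pairs.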
