Word Embedding Testing

There are three files here. One (download.sh) downloads and sets up the tests that you need. Another (test_embeddings.py) runs the tests. The third is a log file containing the scores for the GloVe embeddings. To test your own embeddings, you must provide a pickled Python dictionary. The keys should be strings (the words), and the values should be arrays of floats or ints (the word vectors).
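As a minimal sketch of the expected input, the snippet below builds a small dictionary and pickles it. The words, vector values, and the file name my_embeddings.pkl are made up for illustration. Plain lists are used so the example needs only the standard library; NumPy arrays can be pickled the same way.

```python
import pickle

# Toy embeddings: keys are words (strings), values are vectors of floats.
# These words and numbers are illustrative only, not real embeddings.
embeddings = {
    "cat": [0.12, -0.40, 0.33],
    "dog": [0.10, -0.35, 0.30],
    "car": [-0.50, 0.22, 0.05],
}

# "my_embeddings.pkl" is a hypothetical file name; use any path you like.
with open("my_embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)

# Read the file back to confirm that it round-trips.
with open("my_embeddings.pkl", "rb") as f:
    loaded = pickle.load(f)
```

The resulting path is what you would pass as the <path to dict> argument.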

Recorded scores are the squared error between the similarity given in the tests and the similarity of the same word pairs in the embeddings you provide. In theory, a lower score indicates 'better' embeddings. The GloVe scores are provided as a reference, since scores are more meaningful in a relative sense than in an absolute sense.
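The sketch below illustrates one way such a score could be computed. It is not taken from test_embeddings.py: it assumes cosine similarity as the embedding similarity, human ratings already rescaled to the same range as the cosine values, averaging over pairs, and skipping pairs with a word missing from the dictionary. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def squared_error_score(embeddings, pairs):
    """Mean squared error between gold and embedding similarities.

    pairs: iterable of (word1, word2, gold), where gold is assumed to be
    rescaled to the same range as the cosine similarity.
    """
    total = 0.0
    used = 0
    for w1, w2, gold in pairs:
        # Assumption: pairs with an out-of-vocabulary word are skipped.
        if w1 not in embeddings or w2 not in embeddings:
            continue
        predicted = cosine_similarity(embeddings[w1], embeddings[w2])
        total += (predicted - gold) ** 2
        used += 1
    return total / used if used else float("nan")

# Toy data: "a" and "b" are identical, "a" and "c" are orthogonal.
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
perfect = squared_error_score(toy_embeddings, [("a", "b", 1.0), ("a", "c", 0.0)])
imperfect = squared_error_score(toy_embeddings, [("a", "c", 0.5)])
```

Under these assumptions, embeddings that agree exactly with the gold ratings score 0, and larger disagreement gives a larger score, which is why lower is read as better.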

Installation

Run download.sh

Usage

Run test_embeddings.py <path to dict> <log file name>

Tests

Here are the links to the tests that are used.

MEN: http://clic.cimec.unitn.it/~elia.bruni/MEN.html

MTurk: http://www2.mta.ac.il/~gideon/mturk771.html

WS-353: http://alfonseca.org/eng/research/wordsim353.html

SimLex: http://www.cl.cam.ac.uk/~fh295/simlex.html#

GloVe

Here are the GloVe embeddings used as a baseline.

https://nlp.stanford.edu/projects/glove/
