jsigee87/word-embedding-testing

Word Embedding Testing

This repository contains three files. One downloads and sets up the tests that you need, and another runs them. You must provide your embeddings as a pickled Python dictionary whose keys are strings and whose values are arrays of floats or ints. The third file is a log of the scores obtained with the GloVe embeddings.
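A minimal sketch of how such a pickled dictionary could be produced. The helper names `save_embeddings` and `load_embeddings`, the toy vectors, and the file name are hypothetical and are not part of this repository. Plain Python lists are used for the values here; whether the test script expects NumPy arrays instead is not stated above, so treat the value type as an assumption.

```python
import pickle


def save_embeddings(embeddings, path):
    """Pickle a {word: vector} dictionary after checking that keys are strings.

    Hypothetical helper for illustration; not part of this repository.
    """
    for word in embeddings:
        if not isinstance(word, str):
            raise TypeError(f"key {word!r} is not a string")
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)


def load_embeddings(path):
    """Read the pickled dictionary back, e.g. to verify it before testing."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Toy vectors; real embeddings would come from your own model.
    toy = {"cat": [0.1, 0.3, -0.2], "dog": [0.2, 0.25, -0.1]}
    save_embeddings(toy, "my_embeddings.pkl")  # placeholder file name
    print(sorted(load_embeddings("my_embeddings.pkl")))
```

The resulting file path is what would be passed as the `<path to dict>` argument described under Usage.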

Each recorded score is the squared error between the similarity given in a test and the similarity of the same word pairs in the embeddings you provide. In theory, a lower score indicates 'better' embeddings. The GloVe scores are provided as a reference, since scores are more meaningful in a relative sense than in an absolute sense.
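To make the scoring idea concrete, here is a rough sketch of a squared-error score. It is not the code in test_embeddings.py. Several details are assumptions made for illustration: cosine similarity is used as the embedding similarity, test ratings are rescaled to [0, 1] by dividing by a `max_rating` parameter, pairs with a missing word are skipped, and the errors are summed rather than averaged. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def squared_error_score(embeddings, pairs, max_rating):
    """Sum of squared differences between rescaled ratings and cosine similarities.

    pairs is an iterable of (word1, word2, rating) tuples.
    Returns (total_squared_error, number_of_pairs_used).
    Assumption: pairs containing a word missing from embeddings are skipped.
    """
    total = 0.0
    used = 0
    for w1, w2, rating in pairs:
        if w1 not in embeddings or w2 not in embeddings:
            continue
        predicted = cosine_similarity(embeddings[w1], embeddings[w2])
        target = rating / max_rating  # assumed rescaling to [0, 1]
        total += (predicted - target) ** 2
        used += 1
    return total, used
```

Because the rating scales differ between test sets, a score computed this way is only comparable across embeddings evaluated on the same test, which is why a reference score is useful.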


Installation

Run download.sh


Usage

Run test_embeddings.py <path to dict> <log file name>


Tests

Here are the links to the tests that are used.

MEN: http://clic.cimec.unitn.it/~elia.bruni/MEN.html

MTurk: http://www2.mta.ac.il/~gideon/mturk771.html

WS-353: http://alfonseca.org/eng/research/wordsim353.html

SimLex: http://www.cl.cam.ac.uk/~fh295/simlex.html#


GloVe

Here are the GloVe embeddings used as a baseline.

https://nlp.stanford.edu/projects/glove/
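The pre-trained GloVe vectors are distributed as plain text, one word per line followed by its space-separated vector components. A sketch of turning such lines into the dictionary format described above follows. The function name `parse_glove_lines` is hypothetical, and this is not necessarily how the baseline in this repository was prepared.

```python
def parse_glove_lines(lines):
    """Build a {word: [float, ...]} dictionary from GloVe-style text lines.

    Each line is expected to look like "word 0.1 -0.2 ...".
    Assumption: the word itself contains no spaces; blank lines are skipped.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


if __name__ == "__main__":
    sample = ["the 0.1 -0.2 0.3\n", "cat 1 2 3\n"]
    print(parse_glove_lines(sample))
```

With a downloaded vector file, the same function can be fed an open file handle (opened with `encoding="utf-8"`), and the result can then be pickled for use with the test script.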
