ngram-word2vec

This is a skip-gram Word2Vec model that trains on n-gram data. Unlike the original implementation, which takes a corpus as input, this implementation takes an n-gram file instead. Minimal changes were made to the original Google implementation, whose model is described in:

Mikolov et al., Efficient Estimation of Word Representations in Vector Space, ICLR 2013.
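To illustrate the idea (a hypothetical sketch, not this repository's actual training code), skip-gram pairs can be generated from a counted n-gram by pairing each word with its neighbors and carrying the n-gram's count along as a weight. The helper name and `window` parameter below are assumptions:

```python
def skipgram_pairs(tokens, count, window=2):
    """Yield (center, context, weight) triples from one counted n-gram.

    Hypothetical helper: `window` and the weighting scheme are assumptions,
    not the repository's exact logic.
    """
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j], count
```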

n-grams are contiguous sequences of n words from a corpus. For example, given the sentence "the quick brown fox jumps over the lazy dog", the 5-grams it contains are:

| 5-gram | count |
| --- | --- |
| the quick brown fox jumps | 1 |
| quick brown fox jumps over | 1 |
| brown fox jumps over the | 1 |
| fox jumps over the lazy | 1 |
| jumps over the lazy dog | 1 |
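For reference, the counts above can be reproduced with a few lines of Python (a standalone illustration, not part of this repository):

```python
from collections import Counter

def count_ngrams(sentence, n=5):
    """Count every contiguous n-word window in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

for ngram, count in count_ngrams("the quick brown fox jumps over the lazy dog").items():
    print(f"{ngram}\t{count}")
```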

The n-gram file is tab-separated: the first field is the n-gram itself, and the second is its count. The input file may contain n-grams of mixed order, so 3-grams and 5-grams can be put together in one input file (see the parsing sketch below). We also include an example of how to use this on the Google Books Ngram Dataset.
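A minimal parsing sketch, assuming each line has the form `<ngram>\t<count>` as described above (the reader below is an illustration, not the repository's loader):

```python
def read_ngrams(path):
    """Yield (tokens, count) pairs from a tab-separated n-gram file.

    Mixed orders are fine: a 3-gram line and a 5-gram line can sit
    side by side, since each line is parsed independently.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            ngram, count = line.rstrip("\n").split("\t")
            yield ngram.split(), int(count)
```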

Google Ngram

We have a pre-trained set of vectors for the years 1800-2008, which can be found here. We threshold the vocabulary at min_count = 100, so words that occur fewer than 100 times are dropped (a minimal sketch of this filtering appears after the table below). Here is a short overview of what is in this directory.

| Directory | What's in it? |
| --- | --- |
| scripts | Scripts for downloading and processing the Google Ngram data. |
| distributed_train | Training Word2Vec models on multiple machines (if available). |
| word2vec | Source code of the n-gram word2vec, modified from the TensorFlow source code. |
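The min_count = 100 threshold mentioned above works roughly like the following sketch (a hypothetical illustration of vocabulary filtering, not the repository's exact code):

```python
from collections import Counter

def build_vocab(ngram_stream, min_count=100):
    """Accumulate word frequencies weighted by n-gram counts, then
    drop words seen fewer than `min_count` times (assumed behavior)."""
    word_counts = Counter()
    for tokens, count in ngram_stream:
        for token in tokens:
            word_counts[token] += count
    return {w: c for w, c in word_counts.items() if c >= min_count}
```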
