Overview

This project compares Word2Vec results obtained with the Gensim implementation and searches for the window size and dimensionality that give the best similarity results. Similarity was calculated using the cosine, Euclidean, and Manhattan metrics. The results were then compared with the WordSim GoldStandard true words. Information retrieval metrics, namely MAP and NDCG scores, were used to measure the similarity between the true words and the words from our Simpsons Dataset.
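As a rough illustration of the three metrics named above, here is a minimal pure-Python sketch. The function names are illustrative and are not taken from this repository; the project itself may compute these values through Gensim, NumPy, or SciPy instead.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Cosine is undefined for a zero vector; return 0.0 by convention here.
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

Note that cosine is a similarity (higher means closer), while Euclidean and Manhattan are distances (lower means closer), so rankings built from them must be sorted in opposite directions.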

Prerequisites

Gensim Library - !pip install gensim
Pytrec Eval - !pip install pytrec_eval
NLTK Brown Corpus - from nltk.corpus import brown
spaCy - !pip install spacy
Simpsons Dataset from Kaggle
WordSim GoldStandard Dataset - Download it from http://alfonseca.org/pubs/ws353simrel.tar.gz and unzip wordsim_similarity_goldstandard into the evaluation folder.
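Once the gold-standard file is in place, it has to be read into memory before it can be compared against the model. The sketch below assumes each line holds two words and a similarity score separated by whitespace; that layout and the helper name `parse_goldstandard` are assumptions, so check them against the downloaded file.

```python
def parse_goldstandard(lines):
    """Build {word1: {word2: score}} from 'word1 word2 score' lines.

    The three-column layout is an assumption about the file format;
    lines that do not match it are skipped.
    """
    gold = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        word1, word2, score = parts
        gold.setdefault(word1, {})[word2] = float(score)
    return gold


# Illustrative rows only; real values come from the downloaded file.
sample = ["tiger\tcat\t7.35", "tiger\ttiger\t10.00", "", "malformed line here extra"]
gold = parse_goldstandard(sample)
```

With a real file, the same function can be fed an open file handle, since iterating over a handle yields lines.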

Running the Code

python main.py
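The internals of main.py are not shown here. As a hedged sketch of how a sweep over window size and dimensionality could be organised, the helper below tries every combination and keeps the best-scoring one. The names `grid_search` and `train_and_score` are hypothetical; the scoring callback stands in for whatever training and evaluation the project performs.

```python
from itertools import product


def grid_search(windows, dimensions, train_and_score):
    """Score every (window, dimension) pair and return the best pair plus all scores.

    train_and_score is a caller-supplied function (hypothetical here) that
    trains a model with the given settings and returns a number where
    higher is better, for example a MAP or NDCG score.
    """
    results = {}
    for window, dim in product(windows, dimensions):
        results[(window, dim)] = train_and_score(window, dim)
    best = max(results, key=results.get)
    return best, results
```

With Gensim 4.x, the callback would typically train `Word2Vec(sentences, window=window, vector_size=dim)`; in Gensim 3.x the dimensionality parameter is named `size` instead.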

Output

This project finds the optimal hyperparameter set for Word2Vec and then compares the results with the WordSim GoldStandard true words. After a successful execution, graphs are created for the similarity metrics (cosine, Euclidean, and Manhattan) and for the information retrieval metrics (MAP and NDCG).
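For readers unfamiliar with the two information retrieval metrics, the following is a simplified pure-Python sketch of how they are commonly defined. The project uses pytrec_eval for the actual scores, and this sketch may differ from that library in details; all function names here are illustrative.

```python
import math


def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: mean of average precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranked, gains):
    """Normalised discounted cumulative gain with linear gain and a log2 discount."""
    dcg = sum(
        gains.get(item, 0.0) / math.log2(rank + 1)
        for rank, item in enumerate(ranked, start=1)
    )
    ideal = sorted(gains.values(), reverse=True)
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

In this setting, each query would be a target word, `ranked` the model's most similar words in order, and `relevant` or `gains` the gold-standard words and their scores.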

Citation

Pierre Megret - https://www.kaggle.com/pierremegret/gensim-word2vec-tutorial

karan96/Word2Vec-Parameter-Tuning