Mapping semantically similar words to nearby locations in the embedding space.
Using Negative Sampling (drawing random noise words to form incorrect target pairs), the model tries to minimize the following Loss Function:
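In the standard skip-gram negative-sampling formulation (Mikolov et al., 2013), for a center word $c$ with input embedding $v_c$, an observed context word $o$ with output embedding $u_o$, and $K$ noise words $w_1, \dots, w_K$ drawn from a noise distribution $P_n(w)$, the loss for one (center, context) pair is

$$
\mathcal{L} = -\log \sigma\left(u_o^{\top} v_c\right) - \sum_{k=1}^{K} \log \sigma\left(-u_{w_k}^{\top} v_c\right),
$$

where $\sigma$ is the sigmoid function. The notation here is chosen for this README and may differ from the symbols used in the code.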
This repository contains:
- SkipGram_NegativeSampling.py : Contains the complete source code for pre-processing and batching data, building the model, training the model, and visualizing the resulting word embeddings
- util.py : Contains utility functions for text pre-processing
- data/text8.txt : Contains the training text
- SkipGram_NegativeSampling.ipynb : Step-by-step Colab Notebook for pre-processing and batching data, building the model, training the model, and visualizing the resulting word embeddings
Hyperparameters:
- Number of Center Words in a Batch = 512
- The actual Batch Size will vary, since each center word has a varying number of context words (in the range [1, single_window_size]); see the batching sketch after this list
- Threshold for Subsampling = 1e-5 (see the subsampling sketch after this list)
- Single-side Window Size for Context = 5
- So the whole window contains 5*2+1 = 11 words, including the center word
- Embedding Dimension = 300
- Number of Negative (Noise) Samples Per Center Word = 5
- Learning Rate = 0.003
- Number of Training Epochs = 5
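The varying batch size comes from how (center, context) pairs are generated. The sketch below illustrates the common dynamic-window approach (as in the Udacity skip-gram exercise referenced at the end of this README): for each center word a per-side window is drawn at random, and every word inside that window becomes one (center, context) pair, so 512 center words yield a variable number of training pairs. Function and variable names here are illustrative and may not match the repository's code.

```python
import random

def get_context(words, idx, max_window=5):
    """Return the context words for the center word at position idx,
    using a per-side window size drawn uniformly from [1, max_window]."""
    r = random.randint(1, max_window)
    start = max(0, idx - r)
    return words[start:idx] + words[idx + 1: idx + r + 1]

def get_batches(words, n_centers=512, max_window=5):
    """Yield (centers, contexts) lists; each center word is repeated once per
    context word, so the actual number of pairs per batch varies."""
    n_batches = len(words) // n_centers
    words = words[: n_batches * n_centers]
    for i in range(0, len(words), n_centers):
        chunk = words[i: i + n_centers]
        centers, contexts = [], []
        for j, center in enumerate(chunk):
            ctx = get_context(chunk, j, max_window)
            centers.extend([center] * len(ctx))
            contexts.extend(ctx)
        yield centers, contexts
```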
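The subsampling threshold and the noise samples are typically applied as in the original word2vec paper: very frequent words are randomly dropped before batching, with a drop probability derived from the 1e-5 threshold, and the 5 noise words per center word are drawn from the unigram distribution raised to the 3/4 power. A minimal sketch under those assumptions (again, names are illustrative, not the repository's actual API):

```python
import numpy as np
from collections import Counter

def subsample(int_words, threshold=1e-5, seed=0):
    """Randomly drop frequent words: P(drop w) = 1 - sqrt(threshold / f(w)),
    where f(w) is the relative frequency of w in the corpus."""
    rng = np.random.default_rng(seed)
    counts = Counter(int_words)
    total = len(int_words)
    p_drop = {w: 1.0 - np.sqrt(threshold * total / c) for w, c in counts.items()}
    return [w for w in int_words if rng.random() > p_drop[w]]

def noise_distribution(int_words, power=0.75):
    """Unigram distribution raised to the 3/4 power; the negative (noise)
    samples are drawn from this distribution (e.g. with torch.multinomial
    in a PyTorch implementation)."""
    counts = Counter(int_words)
    freqs = np.zeros(max(counts) + 1)
    for w, c in counts.items():
        freqs[w] = c
    unigram = freqs / freqs.sum()
    noise = unigram ** power
    return noise / noise.sum()
```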
I referenced Udacity for building and debugging the final model: