Visualize word embeddings in a terminal window.
The ┼ symbol indicates the position of the central word, and the words scattered around it are the top n most similar words by cosine similarity, positioned according to their reduced 2D projections relative to the central word.
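The layout logic lives in the TUI itself, but as a rough illustration, mapping each word's reduced 2D offset onto the character grid could look something like the sketch below (the function and its scaling scheme are hypothetical, not the package's actual code):

# Hypothetical sketch: scale each word's 2D offset (relative to the central
# word at the origin) into an x-by-y character grid centred on the ┼ symbol.
import numpy as np

def to_grid(offsets_2d: np.ndarray, x_window: int, y_window: int):
    max_abs = np.abs(offsets_2d).max(axis=0)
    max_abs[max_abs == 0] = 1.0            # avoid division by zero
    scaled = offsets_2d / max_abs          # both axes now in [-1, 1]
    cols = np.round(scaled[:, 0] * (x_window // 2)).astype(int)
    rows = np.round(scaled[:, 1] * (y_window // 2)).astype(int)
    return list(zip(cols, rows))           # (dx, dy) cells from the central word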
This package reduces word vectors (loaded from Gensim) to two dimensions by computing the singular value decomposition of the embeddings matrix and approximating the matrix with its first two left singular vectors and their corresponding singular values. If the first two singular values are not much larger than the rest, the 2D approximation is less reliable. Because of this, the words mapped in the TUI are ranked by a custom score that rewards high cosine similarity with the chosen word before dimensionality reduction, and penalizes words whose performance (defined as closeness to the central word) degrades substantially after reduction.
Experimentally, performance before reduction has been the most important factor for producing good neighbors after reduction; the effect of performance drift is tunable and is recommended to be kept small.
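Concretely, the reduction-and-scoring idea above can be sketched as follows. The exact formula wordspace uses is not spelled out here, so the combination below (pre-reduction cosine similarity minus an alpha-weighted drift penalty) should be read as an illustrative assumption rather than the package's implementation:

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_words(embeddings: np.ndarray, center_idx: int, alpha: float = 0.05):
    # Rank-2 approximation: keep the first two left singular vectors and
    # their singular values.
    U, S, _ = np.linalg.svd(embeddings, full_matrices=False)
    reduced = U[:, :2] * S[:2]                # 2D coordinates for every word
    center_full, center_2d = embeddings[center_idx], reduced[center_idx]

    scores = []
    for i in range(embeddings.shape[0]):
        if i == center_idx:
            continue
        sim_before = cosine(embeddings[i], center_full)   # similarity pre-reduction
        sim_after = cosine(reduced[i], center_2d)         # similarity post-reduction
        drift = abs(sim_before - sim_after)               # "performance drift"
        scores.append((i, sim_before - alpha * drift))
    return sorted(scores, key=lambda t: t[1], reverse=True)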
To run, make sure the Python and Go runtimes are installed on your machine, then install the package from source:
git clone https://github.com/sangstar/wordspace.git
cd wordspace
pip install -r requirements.txt
python -m wordspace -n 20 -x 10 -y 10 -a 0.05
After entering the first word to visualize embeddings for, you can regenerate the visualization for any new word by typing it into the terminal.
The available CLI args are:
optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         The name of the Gensim model to load
  --output_dim OUTPUT_DIM
                        The output dimension for SVD compression
  -n N, --n N           The number of most similar words to display
  -x X_WINDOW_SIZE, --x-window-size X_WINDOW_SIZE
                        The x window size for the TUI
  -y Y_WINDOW_SIZE, --y-window-size Y_WINDOW_SIZE
                        The y window size for the TUI
  -w NUM_WORKERS, --num-workers NUM_WORKERS
                        The number of workers to use
  -a ALPHA, --alpha ALPHA
                        A scalar to influence the impact of words that do not
                        change much in terms of cosine similarity with the
                        central word before and after singular value
                        decomposition. Usually best to keep the value very low
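For example, an invocation that sets every flag explicitly might look like this (glove-wiki-gigaword-100 is just one of the models available through Gensim's downloader; substitute any model you prefer):

python -m wordspace --model glove-wiki-gigaword-100 --output_dim 2 -n 20 -x 12 -y 12 -w 4 -a 0.05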