Commit

add dataset info
pascalkeilbach committed Dec 12, 2023
1 parent 987a8d2 commit dd5c4fe
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions notebooks/vector_space_models.ipynb
@@ -20,6 +20,19 @@
"from htwgnlp.embeddings import WordEmbeddings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The embeddings we use for this lab are from the [Google News Word2Vec model](https://code.google.com/archive/p/word2vec/). This model was trained on part of the Google News dataset (about 100 billion words). \n",
"\n",
"The model contains 300-dimensional vectors for 3 million words and phrases and is about 3.5 GB in size.\n",
"\n",
"For this notebook, we use a small subset of 243 words, which were selected beforehand and are stored in the pickle file `data/embeddings.pkl`.\n",
"\n",
"Apart from a few sample words, the subset consists mostly of capital cities and countries. We will use these embeddings to find analogies between words."
]
},
{
"cell_type": "code",
"execution_count": 2,

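Note on the added cell: it says the embeddings will be used to find analogies between words. A minimal sketch of what that could look like is given here, assuming the common vector-offset approach (b - a + c, then nearest neighbour by cosine similarity). The helper names `cosine_similarity` and `find_analogy` and the three-dimensional toy vectors are hypothetical and only for illustration. They are not the `htwgnlp` API and not the contents of `data/embeddings.pkl`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def find_analogy(a, b, c, embeddings):
    """Return the word d for which 'a is to b as c is to d' fits best.

    Hypothetical helper: builds the target vector b - a + c and picks the
    most similar remaining word, excluding the three query words.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))


# Toy 3-dimensional vectors standing in for the 300-dimensional embeddings.
toy_embeddings = {
    "France": [1.0, 0.0, 0.0],
    "Paris": [1.0, 1.0, 0.0],
    "Italy": [0.0, 0.0, 1.0],
    "Rome": [0.0, 1.0, 1.0],
    "Berlin": [-1.0, 1.0, 0.0],
}

result = find_analogy("France", "Paris", "Italy", toy_embeddings)
print(result)
```

The three query words are excluded from the candidates because the target vector usually lies closest to one of them, which would otherwise hide the intended answer.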