Update Warm-start Embedding Layer Matrix tutorial
8bitmp3 authored Jul 14, 2023
1 parent fde5650 commit f3ba9bd
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions site/en/tutorials/text/warmstart_embedding_matrix.ipynb
@@ -91,7 +91,7 @@
"source": [
"### Vocabulary\n",
"\n",
"The set of unique words is referred to as the vocabulary. To build a text model you need to choose a fixed vocabulary. Typically you you build the vocabulary from the most common words in a dataset. The vocabulary allows us to represent each piece of text by a sequence of ID's that you can lookup in the embedding matrix. Vocabulary allows us to represent each piece of text by the specific words that appear in it."
"The set of unique words is referred to as the vocabulary. To build a text model you need to choose a fixed vocabulary. Typically you build the vocabulary from the most common words in a dataset. The vocabulary allows us to represent each piece of text by a sequence of ID's that you can lookup in the embedding matrix. Vocabulary allows us to represent each piece of text by the specific words that appear in it."
]
},
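For context, a minimal sketch of the vocabulary idea this cell describes, using Keras' `TextVectorization` layer; the toy corpus and the `max_tokens` value below are illustrative, not taken from the notebook:

```python
import tensorflow as tf

# Build a fixed vocabulary from the most common words in a toy corpus.
corpus = ["the movie was great", "the movie was bad"]
vectorize_layer = tf.keras.layers.TextVectorization(max_tokens=10)
vectorize_layer.adapt(corpus)

# Each piece of text becomes a sequence of IDs that can be looked up
# as rows of an embedding matrix.
print(vectorize_layer.get_vocabulary())          # e.g. ['', '[UNK]', 'the', 'was', 'movie', ...]
print(vectorize_layer(["the movie was great"]))  # tensor of token IDs, e.g. [[2, 4, 3, 5]]
```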
{
@@ -104,7 +104,7 @@
"\n",
"A model is trained with a set of embeddings that represents a given vocabulary. If the model needs to be updated or improved you can train to convergence significantly faster by reusing weights from a previous run. Using the embedding matrix from a previous run is more difficult. The problem is that any change to the vocabulary invalidates the word to id mapping.\n",
"\n",
"The `tf.keras.utils.warmstart_embedding_matrix` solves this problem by creating an embedding matrix for a new vocabulary from an embedding martix from a base vocabulary. Where a word exists in both vocabularies the base embedding vector is copied into the correct location in the new embedding matrix. This allows you to warm-start training after any change in the size or order of the vocabulary."
"The `tf.keras.utils.warmstart_embedding_matrix` solves this problem by creating an embedding matrix for a new vocabulary from an embedding matrix from a base vocabulary. Where a word exists in both vocabularies the base embedding vector is copied into the correct location in the new embedding matrix. This allows you to warm-start training after any change in the size or order of the vocabulary."
]
},
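The utility named in this cell is the heart of the tutorial. A hedged sketch of how it can be called, with made-up vocabularies and a toy 8-dimensional matrix (the keyword names follow the `tf.keras.utils.warmstart_embedding_matrix` documentation; everything else here is illustrative):

```python
import tensorflow as tf

base_vocabulary = ["the", "movie", "was", "great"]
new_vocabulary = ["great", "the", "film", "was"]  # reordered, plus one new word

# Stand-in for the trained weights of the base model's Embedding layer.
base_embeddings = tf.random.normal([len(base_vocabulary), 8])

# Rows for words present in both vocabularies are copied into their new
# positions; rows for new words ("film") come from the initializer.
new_matrix = tf.keras.utils.warmstart_embedding_matrix(
    base_vocabulary=base_vocabulary,
    new_vocabulary=new_vocabulary,
    base_embeddings=base_embeddings,
    new_embeddings_initializer="uniform",
)
print(new_matrix.shape)  # (4, 8)
```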
{
@@ -155,7 +155,7 @@
},
"source": [
"### Load the dataset\n",
"The tutorial uses the [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/). You will train a sentiment classifier model on this dataset and in the process learn embeddings from scratch. Refer to [Loading text tutorial](https://www.tensorflow.org/tutorials/load_data/text) to learn more. \n",
"The tutorial uses the [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/). You will train a sentiment classifier model on this dataset and in the process learn embeddings from scratch. Refer to the [Loading text tutorial](https://www.tensorflow.org/tutorials/load_data/text) to learn more. \n",
"\n",
"Download the dataset using Keras file utility and review the directories."
]
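A hedged sketch of the download step this cell refers to; the URL is the dataset's published location, and the `get_file` flags are illustrative rather than copied from the notebook:

```python
import os
import tensorflow as tf

url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
archive = tf.keras.utils.get_file(
    "aclImdb_v1.tar.gz", url, untar=True, cache_dir=".", cache_subdir=""
)

# Review the extracted directories.
dataset_dir = os.path.join(os.path.dirname(archive), "aclImdb")
print(os.listdir(dataset_dir))  # e.g. ['train', 'test', ...]
```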
@@ -184,7 +184,7 @@
"id": "eY6yROZNKvbd"
},
"source": [
"The `train/` directory has `pos` and `neg` folders with movie reviews labelled as positive and negative respectively. You will use reviews from `pos` and `neg` folders to train a binary classification model."
"The `train/` directory has `pos` and `neg` folders with movie reviews labeled as positive and negative respectively. You will use reviews from `pos` and `neg` folders to train a binary classification model."
]
},
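A minimal sketch of turning those folders into a labeled dataset; the path assumes the archive unpacked to `aclImdb/`, and the split parameters are illustrative (the full notebook also tidies the `train/` directory before this step):

```python
import tensorflow as tf

# Subfolder names become labels in alphabetical order: neg -> 0, pos -> 1.
train_ds = tf.keras.utils.text_dataset_from_directory(
    "aclImdb/train",
    batch_size=32,
    validation_split=0.2,
    subset="training",
    seed=42,
)
```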
{
@@ -715,7 +715,7 @@
"source": [
"You have successfully updated the model to accept a new vocabulary. The embedding layer is updated to map old vocabulary words to old embeddings and initialize embeddings for new vocabulary words to be learnt. The learned weights of the rest of the model will remain the same. The model is warm-started to continue to train from where it left off previously.\n",
"\n",
"You can now verify that the remapping worked. Get index of the vocabulary word \"the\" that is present both in base and new vocabulary and compare the embedding values. They should be equal."
"You can now verify that the remapping worked. Get the index of the vocabulary word \"the\" that is present both in base and new vocabulary and compare the embedding values. They should be equal."
]
},
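A sketch of that verification, reusing toy objects like the ones above; all variable names here are hypothetical, not the notebook's:

```python
import numpy as np
import tensorflow as tf

base_vocabulary = ["the", "movie", "was", "great"]
new_vocabulary = ["great", "the", "film", "was"]
base_matrix = tf.random.normal([len(base_vocabulary), 8])
new_matrix = tf.keras.utils.warmstart_embedding_matrix(
    base_vocabulary, new_vocabulary, base_matrix
)

# "the" exists in both vocabularies, so its embedding row is copied verbatim.
i = base_vocabulary.index("the")
j = new_vocabulary.index("the")
np.testing.assert_array_equal(base_matrix.numpy()[i], np.asarray(new_matrix)[j])
```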
{
@@ -745,7 +745,7 @@
"source": [
"## Continue with warm-started training\n",
"\n",
"Notice how the training is warm-started. The accuracy of first epoch is around 85%. Close to the accuracy where the previous traning ended."
"Notice how the training is warm-started. The accuracy of first epoch is around 85%. This is close to the accuracy where the previous training ended."
]
},
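A one-line sketch of the continuation step; `model`, `train_ds`, and `val_ds` are hypothetical stand-ins for objects built in earlier cells of the notebook:

```python
# Warm-started: the first epoch should begin near the accuracy the
# previous run ended at (around 85% per the text above).
history = model.fit(train_ds, validation_data=val_ds, epochs=5)
print(history.history["accuracy"][0])
```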
{
