DOC Extending the vocab and storing embeddings (#1335)
Resolves #1300

Sourab added the feature to store the embedding layers alongside the
adapter in #1147. This PR adds an entry to the documentation to explain
the new feature.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
BenjaminBossan and stevhliu authored Jan 12, 2024
1 parent dc28a61 commit f9b673e
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions docs/source/developer_guides/troubleshooting.md
@@ -113,3 +113,27 @@ You should probably TRAIN this model on a down-stream task to be able to use it
```

The mentioned layers should be added to `modules_to_save` in the config to avoid the described problem.

### Extending the vocabulary

For many language fine-tuning tasks, extending the model's vocabulary is necessary because new tokens are introduced. This requires extending the embedding layer to account for the new tokens, and it also requires storing the embedding layer in addition to the adapter weights when saving the adapter.

Save the embedding layer by adding it to the `target_modules` of the config. The embedding layer name must follow the standard naming scheme from Transformers. For example, the Mistral config could look like this:

```python
config = LoraConfig(..., target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"])
```
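
To see this in context, here is a minimal sketch of the full flow, assuming a Mistral causal LM; the checkpoint name and the added tokens are illustrative placeholders, not part of the original example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# add the new tokens and resize the embedding layer to the new vocabulary size
tokenizer.add_tokens(["<special-1>", "<special-2>"])
base_model.resize_token_embeddings(len(tokenizer))

# include the embedding layers in target_modules so they are trained and stored
config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"],
)
model = get_peft_model(base_model, config)
```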

Once added to `target_modules`, PEFT automatically stores the embedding layer when saving the adapter, provided the model has the [`~transformers.PreTrainedModel.get_input_embeddings`] and [`~transformers.PreTrainedModel.get_output_embeddings`] methods. This is generally the case for Transformers models.
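
As a quick sanity check (a sketch reusing `base_model` from the snippet above, not code from this PR), you can confirm that these accessors are available and return modules:

```python
# Both calls should return a module (not None) for automatic embedding saving to work.
print(base_model.get_input_embeddings())   # typically an nn.Embedding
print(base_model.get_output_embeddings())  # typically the lm_head nn.Linear
```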

If the model's embedding layer doesn't follow the Transformers naming scheme, you can still save it by manually passing `save_embedding_layers=True` when saving the adapter:

```python
model = get_peft_model(...)
# train the model
model.save_pretrained("my_adapter", save_embedding_layers=True)
```

For inference, load the base model first and resize it the same way you did before you trained the model. After you've resized the base model, you can load the PEFT checkpoint.
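
The following is a hedged sketch of that loading sequence, mirroring the placeholders from the training sketch above (the checkpoint name, added tokens, and adapter path are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# repeat the same vocabulary extension that was applied before training
tokenizer.add_tokens(["<special-1>", "<special-2>"])
base_model.resize_token_embeddings(len(tokenizer))

# only after resizing, load the PEFT checkpoint on top of the base model
model = PeftModel.from_pretrained(base_model, "my_adapter")
```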

For a complete example, please check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_with_additional_tokens.ipynb).
