Update PEFT doc #8664

Merged (2 commits) on Mar 16, 2024
20 changes: 11 additions & 9 deletions docs/source/nlp/nemo_megatron/peft/landing_page.rst

points, PEFT achieves comparable performance to full finetuning at a
fraction of the computational and storage costs.

NeMo supports four PEFT methods which can be used with various
transformer-based models. `Here <https://github.com/NVIDIA/NeMo/tree/main/scripts/nlp_language_modeling>`__
is a collection of conversion scripts that convert
popular models from Hugging Face (HF) format to NeMo format.

==================== ===== ======== ========= ====== ========= ===== ==
\                    GPT 3 Nemotron LLaMa 1/2 Falcon Starcoder Gemma T5
==================== ===== ======== ========= ====== ========= ===== ==
LoRA                 ✅    ✅       ✅        ✅     ✅        ✅    ✅
P-Tuning             ✅    ✅       ✅        ✅     ✅        ✅    ✅
Adapters (Canonical) ✅    ✅       ✅        ✅     ✅        ✅
IA3                  ✅    ✅       ✅        ✅     ✅        ✅
==================== ===== ======== ========= ====== ========= ===== ==

Learn more about PEFT in NeMo with the :ref:`peftquickstart`, which provides an overview of how PEFT works
in NeMo. Read about the supported PEFT methods
24 changes: 16 additions & 8 deletions docs/source/nlp/nemo_megatron/peft/supported_methods.rst

NeMo supports the following PEFT tuning methods:
each case, the output linear layer is initialized to 0 to ensure
that an untrained adapter does not affect the normal forward pass
of the transformer layer.
- In NeMo, you can customize the adapter bottleneck dimension, the
adapter dropout amount, and the type and position of the
normalization layer (see the sketch below).
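
A minimal PyTorch sketch of the canonical adapter described above: a
down-projection, a nonlinearity, and a zero-initialized up-projection added
residually to the frozen layer's output. It illustrates the idea only and is
not NeMo's implementation; the class name, default bottleneck size, and
LayerNorm placement are illustrative assumptions.

.. code-block:: python

    import torch
    import torch.nn as nn

    class BottleneckAdapter(nn.Module):
        """Canonical adapter: LayerNorm -> down-projection -> ReLU -> up-projection.

        The up-projection starts at zero, so an untrained adapter leaves the
        frozen transformer layer's output unchanged.
        """

        def __init__(self, hidden_dim: int, bottleneck_dim: int = 32, dropout: float = 0.0):
            super().__init__()
            self.norm = nn.LayerNorm(hidden_dim)   # type/position of the norm is configurable in NeMo
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.up = nn.Linear(bottleneck_dim, hidden_dim)
            self.drop = nn.Dropout(dropout)
            nn.init.zeros_(self.up.weight)         # output linear layer initialized to 0
            nn.init.zeros_(self.up.bias)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Residual bottleneck: x + Up(Dropout(ReLU(Down(Norm(x)))))
            return hidden_states + self.up(self.drop(torch.relu(self.down(self.norm(hidden_states)))))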

2. **LoRA**: `LoRA: Low-Rank Adaptation of Large Language
Models <http://arxiv.org/abs/2106.09685>`__

- LoRA makes fine-tuning efficient by representing weight updates
with two low rank decomposition matrices. The original model
weights remain frozen, while the low rank decomposition matrices
are updated to adapt to the new data, so the number of trainable
parameters is kept low. In contrast with adapters, the original
model weights and adapted weights can be combined during
inference, avoiding any architectural change or additional latency
in the model at inference time.
- In NeMo, you can customize the adapter bottleneck dimension and
the target modules to which LoRA is applied. LoRA can be applied to
any linear layer; in a transformer model, this includes 1) the Q, K, V
attention projections, 2) the attention output layer, and 3) either or
both of the two transformer MLP layers. Because NeMo's attention
implementation fuses QKV into a single projection, our LoRA
implementation learns a single low-rank projection for the combined
QKV (see the sketch below).
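
To make the factorization concrete, here is a PyTorch sketch that wraps a
single frozen linear layer with a trainable low-rank update ``B @ A`` and shows
how the update can be folded back into the base weights for zero-overhead
inference. It is a generic illustration using common LoRA conventions (rank,
alpha scaling, zero-initialized ``B``), not NeMo's fused-QKV implementation;
all names and defaults are assumptions.

.. code-block:: python

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update B @ A."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)        # original weights stay frozen
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no effect at start
            self.scale = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

        @torch.no_grad()
        def merge(self) -> nn.Linear:
            # Fold the low-rank update into the frozen weights: no extra latency at inference.
            self.base.weight.add_(self.scale * (self.lora_b @ self.lora_a))
            return self.base

Applying one such wrapper to the fused QKV projection is, conceptually, what
learning a single low-rank projection for the combined QKV means.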

3. **IA3**: `Few-Shot Parameter-Efficient Fine-Tuning is Better and
Cheaper than In-Context Learning <http://arxiv.org/abs/2205.05638>`__
learning rescaling vectors can also be merged with the base
weights, leading to no architectural change and no additional
latency at inference time.
- There is no hyperparameter to tune for the IA3 adapter.
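
As an illustration of the rescaling idea, the sketch below defines the learned
vectors for one transformer layer following the formulation in the IA3 paper
linked above: elementwise scaling of the attention keys, values, and the
intermediate MLP activations. This is a toy example, not NeMo's code, and the
class and attribute names are assumptions. Initializing the vectors to ones
keeps the untrained adapter a no-op, and after training they can be folded into
the adjacent frozen weight matrices.

.. code-block:: python

    import torch
    import torch.nn as nn

    class IA3Scalers(nn.Module):
        """IA3 learned rescaling vectors for a single transformer layer.

        These vectors are the only trainable parameters; the base model
        weights remain frozen.
        """

        def __init__(self, kv_dim: int, ffn_dim: int):
            super().__init__()
            self.l_k = nn.Parameter(torch.ones(kv_dim))    # rescales attention keys
            self.l_v = nn.Parameter(torch.ones(kv_dim))    # rescales attention values
            self.l_ff = nn.Parameter(torch.ones(ffn_dim))  # rescales intermediate MLP activations

        def scale_keys(self, keys: torch.Tensor) -> torch.Tensor:
            return keys * self.l_k

        def scale_values(self, values: torch.Tensor) -> torch.Tensor:
            return values * self.l_v

        def scale_ffn(self, hidden: torch.Tensor) -> torch.Tensor:
            return hidden * self.l_ff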

4. **P-Tuning**: `GPT Understands,
Too <https://arxiv.org/abs/2103.10385>`__
vocabulary. They are simply 1D vectors that match the
dimensionality of real tokens which make up the model's
vocabulary.
- In p-tuning, an intermediate MLP model is used to generate
virtual token embeddings. We refer to this intermediate model as
our ``prompt_encoder``. The prompt encoder parameters are randomly
initialized at the start of p-tuning. All base model parameters
are frozen, and only the prompt encoder weights are updated at
each training step.
- In NeMo, you can customize the number of virtual tokens, as well
as the embedding and MLP bottleneck dimensions (see the sketch below).
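
The toy sketch below shows what such a prompt encoder can look like: a small,
randomly initialized MLP maps learnable seed embeddings to virtual token
embeddings, which are prepended to the frozen base model's input embeddings.
The class name, argument names, and default bottleneck size are illustrative
assumptions, not NeMo's exact configuration.

.. code-block:: python

    import torch
    import torch.nn as nn

    class PromptEncoder(nn.Module):
        """MLP prompt encoder that produces virtual token embeddings for p-tuning."""

        def __init__(self, num_virtual_tokens: int, hidden_dim: int, bottleneck_dim: int = 2048):
            super().__init__()
            # Randomly initialized seeds; only these and the MLP below are trained.
            self.seed = nn.Parameter(torch.randn(num_virtual_tokens, hidden_dim) * 0.02)
            self.mlp = nn.Sequential(
                nn.Linear(hidden_dim, bottleneck_dim),
                nn.ReLU(),
                nn.Linear(bottleneck_dim, hidden_dim),
            )

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, hidden_dim) from the frozen model's embedding table.
            # Returns (batch, num_virtual_tokens + seq_len, hidden_dim).
            batch_size = input_embeds.size(0)
            virtual = self.mlp(self.seed).unsqueeze(0).expand(batch_size, -1, -1)
            return torch.cat([virtual, input_embeds], dim=1)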