[Low-level-API] Add docs about LLAPI #836

Merged
merged 2 commits into from
Aug 18, 2023
36 changes: 36 additions & 0 deletions README.md
@@ -355,6 +355,42 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume

2. When using ZeRO3 with `zero3_init_flag=True`, if you find that GPU memory increases with training steps, you might need to update DeepSpeed to a version that includes [deepspeed commit 42858a9891422abc](https://github.com/microsoft/DeepSpeed/commit/42858a9891422abcecaa12c1bd432d28d33eb0d4). The related issue is [[BUG] Peft Training with Zero.Init() and Zero3 will increase GPU memory every forward step ](https://github.com/microsoft/DeepSpeed/issues/3002)

## 🤗 PEFT as a utility library

Inject trainable adapters into any `torch` model using the `inject_adapter_in_model` method:
Member

Not sure exactly about the wording, but I think it's worth highlighting here that calling this function will only inject the adapters but make no further changes to the model. Otherwise, users may be confused why they should use this and not get_peft_model.


```python
import torch
from peft import inject_adapter_in_model, LoraConfig

class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        x = self.lm_head(x)
        return x

lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    target_modules=["linear"],
)

model = DummyModel()
model = inject_adapter_in_model(lora_config, model, "default")
Member

Hmm, I wonder now why we did not have adapter_name="default" as a default argument? I think it would help here. If we don't want it, at least I would pass it as keyword argument in this example, not positional, to make it clear what the meaning of "default" is.

If we change the function to make the argument a default argument (ugh, so confusing, default vs "default"), the docs below also need to change a little bit ("that takes 3 arguments ...").

Contributor Author

Sounds great, will address this change


dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```

## Backlog:
- [x] Add tests
- [x] Multi Adapter training and inference support
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -32,6 +32,8 @@
  sections:
  - local: developer_guides/custom_models
    title: Working with custom models
  - local: developer_guides/low_level_api
    title: PEFT low level API

- title: 🤗 Accelerate integrations
  sections:
103 changes: 103 additions & 0 deletions docs/source/developer_guides/low_level_api.mdx
@@ -0,0 +1,103 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# PEFT as a utility library

This section covers how you can use PEFT's low-level API to inject trainable adapters into any `torch` module.
The API was motivated by the need of power users to apply adapter methods such as LoRA, IA3 and AdaLoRA without relying on the modeling classes exposed by the PEFT library.

## Supported tuner types

Currently, the supported adapter types are the 'injectable' adapters, meaning adapters for which an in-place modification of the model is sufficient to perform the fine-tuning correctly. As such, only [LoRA](./conceptual_guides/lora), AdaLoRA and [IA3](./conceptual_guides/ia3) are currently supported in this API.
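
To make the idea of an in-place modification more concrete, here is a minimal sketch in plain `torch`. It is not PEFT's implementation: `ToyLoRALinear` and `inject_toy_lora` are hypothetical names introduced only for illustration, and the `alpha / r` scaling and zero-initialised `lora_B` are assumptions borrowed from the usual LoRA recipe.

```python
import torch
from torch import nn


class ToyLoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen base Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # the low-rank update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scaling


def inject_toy_lora(model: nn.Module, target: str) -> nn.Module:
    """Replace the direct child named `target` in place and return the same object."""
    setattr(model, target, ToyLoRALinear(getattr(model, target)))
    return model


toy = nn.Sequential()
toy.add_module("linear", nn.Linear(10, 10))
toy.add_module("head", nn.Linear(10, 10))

x = torch.randn(2, 10)
before = toy(x)
injected = inject_toy_lora(toy, "linear")
after = injected(x)
```

In this sketch `injected` is the very same object as `toy`; only the `linear` child was swapped. Because `lora_B` starts at zero, `after` equals `before` until the adapter weights are trained.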

## `inject_adapter_in_model` method

To inject the adapters, use the `inject_adapter_in_model` method, which takes three arguments: the PEFT config, the model itself and the adapter name.

Below is a basic example of how to inject LoRA adapters into the submodule `linear` of the module `DummyModel`.
```python
import torch
from peft import inject_adapter_in_model, LoraConfig


class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        x = self.lm_head(x)
        return x


lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    target_modules=["linear"],
)

model = DummyModel()
model = inject_adapter_in_model(lora_config, model, "default")

dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```

If you print the model, you will see that the adapters have been correctly injected:

```bash
DummyModel(
  (embedding): Embedding(10, 10)
  (linear): Linear(
    in_features=10, out_features=10, bias=True
    (lora_dropout): ModuleDict(
      (default): Dropout(p=0.1, inplace=False)
    )
    (lora_A): ModuleDict(
      (default): Linear(in_features=10, out_features=64, bias=False)
    )
    (lora_B): ModuleDict(
      (default): Linear(in_features=64, out_features=10, bias=False)
    )
    (lora_embedding_A): ParameterDict()
    (lora_embedding_B): ParameterDict()
  )
  (lm_head): Linear(in_features=10, out_features=10, bias=True)
)
```
Note that it is up to you to take care of saving the adapters if you want to store the adapters only, because `model.state_dict()` returns the full state dict of the model.
To extract only the adapter state dict, you can use the `get_peft_model_state_dict` method:

```python
from peft import get_peft_model_state_dict

peft_state_dict = get_peft_model_state_dict(model)
print(peft_state_dict)
```
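
As a rough illustration of storing adapter weights with the standard `torch.save` and `torch.load`, the sketch below filters a full state dict by key name. It makes two assumptions: adapter parameters can be recognised by the `lora_` substring seen in the printed model above, and `TinyAdapted` and `adapter_state_dict` are hypothetical stand-ins so the snippet runs without PEFT. Keys returned by `get_peft_model_state_dict` may be named differently from raw `state_dict()` keys, so check them before mixing the two approaches.

```python
import io

import torch
from torch import nn


class TinyAdapted(nn.Module):
    """Hypothetical stand-in for a model that already carries adapter weights."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)
        self.lora_A = nn.Linear(10, 4, bias=False)
        self.lora_B = nn.Linear(4, 10, bias=False)

    def forward(self, x):
        return self.linear(x) + self.lora_B(self.lora_A(x))


def adapter_state_dict(model: nn.Module) -> dict:
    # Keep only entries whose key mentions "lora_" (assumed naming convention).
    return {k: v for k, v in model.state_dict().items() if "lora_" in k}


source = TinyAdapted()
adapters_only = adapter_state_dict(source)

# Save to an in-memory buffer; a file path works the same way.
buffer = io.BytesIO()
torch.save(adapters_only, buffer)
buffer.seek(0)

# Load into a fresh model; strict=False tolerates the missing base weights.
target = TinyAdapted()
result = target.load_state_dict(torch.load(buffer), strict=False)
```

With `strict=False`, `load_state_dict` reports the base weights as missing keys instead of raising, which is the expected outcome when only adapter weights were saved.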

## Pros and cons

When should you use this API, and when should you avoid it? This section discusses the pros and cons.

Pros:
- The model gets modified in-place, meaning the model will preserve all its original attributes and methods
- Works for any torch module, and any modality (vision, text, multi-modal)

Cons:
- You need to manually writing saving and loading utility methods
Member

Suggested change
- You need to manually writing saving and loading utility methods
- You need to manually write saving and loading utility methods

I think this could be confusing. I guess what you mean is stuff like from_pretrained etc. But people can still use the normal torch.save and torch.load that they already know. Saying they have to "manually" write the methods probably sounds worse than it actually is.

Contributor Author

Hmm correct, let me rephrase this a bit then

- You cannot use any of the utility methods provided by `PeftModel`, such as disabling adapters, merging adapters, etc.
Member

Maybe add a link to this section: https://huggingface.co/docs/peft/conceptual_guides/lora#utils-for-lora

Also, I took a look at some of the methods that are currently used for merging, unloading etc. and I think that with only a few changes, we can make them standalone functions (like inject_adapter_in_model) that don't require a PeftModel / LoraModel etc. At least for LoRA that should work.

Contributor Author

Sounds great, we can do that in a follow up PR!

65 changes: 65 additions & 0 deletions tests/test_low_level_api.py
@@ -0,0 +1,65 @@
#!/usr/bin/env python3

# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest

import torch

from peft import LoraConfig, get_peft_model_state_dict, inject_adapter_in_model


class DummyModel(torch.nn.Module):
Member

Great that you added tests.

I think the tests could also be added to test_custom_models.py. The disadvantage of that change would be that as is, the tests are very clear and straightforward. The advantage would be that the tests in test_custom_models.py can be easily parametrized with different custom modules and configs, so the test coverage is better.

I would be okay if you want to keep it as is, it's just a suggestion.

Contributor Author

@younesbelkada Aug 18, 2023

I see ok thanks for explaining! I would say maybe let's keep it as it is since the test is aimed to only test the tiny snippet of the README and the docs so I want to keep it very simple and minimal for now

    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        x = self.lm_head(x)
        return x


class TestPeft(unittest.TestCase):
    def setUp(self):
        self.model = DummyModel()

        lora_config = LoraConfig(
            lora_alpha=16,
            lora_dropout=0.1,
            r=64,
            bias="none",
            target_modules=["linear"],
        )

        self.model = inject_adapter_in_model(lora_config, self.model, "default")

    def test_inject_adapter_in_model(self):
        dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
        _ = self.model(dummy_inputs)

        for name, module in self.model.named_modules():
            if name == "linear":
                self.assertTrue(hasattr(module, "lora_A"))
                self.assertTrue(hasattr(module, "lora_B"))

    def test_get_peft_model_state_dict(self):
        peft_state_dict = get_peft_model_state_dict(self.model)

        for key in peft_state_dict.keys():
            self.assertTrue("lora" in key)