OOM with Phi-3-mini (3.8B) on 83.5GB RAM due to LoftQ #1708

@adamamer20

Description

System Info

System: Linux-6.1.58+-x86_64-with-glibc2.35 / Google Colab
peft: 0.10.0
transformers: 4.40.1
accelerate: 0.30.0
Python 3.10.12
RAM: 83.5 GB
GPU: A100 40GB
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

import gc

import torch as th
from peft import LoraConfig, LoftQConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

checkpoint_path = "microsoft/Phi-3-mini-4k-instruct"
# checkpoint_path = "microsoft/Phi-3-mini-128k-instruct"
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=th.bfloat16,
    device_map="auto",
)
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
    modules_to_save=None,
    loftq_config=LoftQConfig(loftq_bits=8),
    init_lora_weights="loftq",
    use_rslora=True,
)

model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)
model = prepare_model_for_kbit_training(model).to("cpu")
th.cuda.empty_cache()
gc.collect()

Up to this point everything is fine: the model is about 7 GB and sits in RAM.

However, when I try to get the PEFT model:

model = get_peft_model(model, peft_config)

this call crashes the entire runtime, even though there is plenty of free RAM.

Note that when I remove LoftQ, the problem does not occur:

peft_config = LoraConfig(
    r = 8,
    lora_alpha = 32,
    lora_dropout = 0.05,
    bias = "none",
    task_type =  "CAUSAL_LM",
    target_modules = "all-linear",
    modules_to_save = None,
    use_rslora = True, 
)

Expected behavior

The model should comfortably fit in RAM.
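A quick back-of-the-envelope check (3.8 B parameters; the figures are approximations and ignore activations and LoftQ temporaries):

```python
params = 3.8e9  # approximate Phi-3-mini parameter count

bf16_gb = params * 2 / 1e9  # resident model in bfloat16
fp32_gb = params * 4 / 1e9  # same weights if upcast to float32

print(bf16_gb)  # 7.6  -- matches the ~7 GB observed in RAM
print(fp32_gb)  # 15.2 -- even a full fp32 copy fits easily in 83.5 GB
```

So even several full-precision copies of the model should fit, which supports the report that the crash is not a simple model-size problem.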
