Description
System Info
Package Version
accelerate 0.29.0.dev0
aiohttp 3.9.3
aiosignal 1.3.1
annotated-types 0.6.0
appdirs 1.4.4
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
datasets 2.18.0
deepspeed 0.14.0+ce78a632
dill 0.3.8
docker-pycreds 0.4.0
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
filelock 3.13.3
flash-attn 2.5.6
frozenlist 1.4.1
fsspec 2024.2.0
gitdb 4.0.11
GitPython 3.1.42
hjson 3.1.0
huggingface-hub 0.22.1
idna 3.6
iniconfig 2.0.0
Jinja2 3.1.3
markdown-it-py 3.0.0
MarkupSafe 2.1.5
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
ninja 1.11.1.1
numpy 1.24.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
packaging 24.0
pandas 2.0.3
peft 0.10.1.dev0
pillow 10.2.0
pip 24.0
pluggy 1.4.0
protobuf 3.20.1
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pydantic 2.6.4
pydantic_core 2.16.3
Pygments 2.17.2
pynvml 11.5.0
pytest 8.1.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
rich 13.7.1
safetensors 0.4.2
scipy 1.10.1
sentencepiece 0.2.0
sentry-sdk 1.43.0
setproctitle 1.3.3
setuptools 69.2.0
shtab 1.7.1
six 1.16.0
smmap 5.0.1
sympy 1.12
text-generation 0.7.0
tokenizers 0.15.2
tomli 2.0.1
torch 2.2.1
torchaudio 2.2.1
torchvision 0.17.1
tqdm 4.66.2
transformers 4.40.0.dev0
triton 2.2.0
trl 0.8.1
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
urllib3 2.2.1
wandb 0.16.5
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4
python 3.11
I have tested this on both a dual-A100 and a dual-3090 system, using the same Docker image in both cases.
Who can help?
@pacman100 @younesbelkada @sayakpaul
When calling the get_peft_model method with a config that has use_dora=True, it takes a very long time (several minutes) to get the model back. Meanwhile, if I use a regular LoRA config, I get the model almost immediately. Oddly enough, I also do not have this issue when using QDoRA.
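To make the comparison concrete, here is a minimal, self-contained timing sketch along the lines of what I am describing (the small gpt2 model and r=16 are placeholders so the snippet runs quickly anywhere; my actual setup uses Mistral-7B, as shown under Reproduction below):

```python
import time

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

for use_dora in (False, True):
    # Load a fresh base model each time so the two wrap timings are comparable.
    base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder small model
    cfg = LoraConfig(task_type=TaskType.CAUSAL_LM, r=16, use_dora=use_dora)
    start = time.perf_counter()
    get_peft_model(base, cfg)  # the step that is slow for me when use_dora=True
    print(f"use_dora={use_dora}: {time.perf_counter() - start:.2f}s")
```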
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# access_token, args, target_modules, and modules_to_save are defined earlier in the full script linked below
model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, token=access_token, use_flash_attention_2=True)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=args.lora_rank,
    lora_alpha=args.lora_alpha,
    lora_dropout=args.lora_dropout,
    target_modules=target_modules,
    modules_to_save=modules_to_save,
    use_dora=args.dora,
)
model = get_peft_model(model, peft_config)  # this is the call that takes several minutes when use_dora=True

I removed some things to keep it simple. If you want to see a more complete example of how I am running this, please see the code here.
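One extra data point that might help triage, reusing the names from the snippet above: a variant that moves the base model to the GPU before wrapping. This is only a guess on my part that the extra DoRA initialization work is CPU-bound; I have not confirmed it.

```python
# Variant of the repro above (same model_name, access_token, peft_config):
# put the weights on the GPU first, then wrap, to check whether the DoRA
# setup is faster when it does not run on CPU tensors.
model = AutoModelForCausalLM.from_pretrained(
    model_name, token=access_token, use_flash_attention_2=True
).to("cuda")
model = get_peft_model(model, peft_config)
```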
Expected behavior
I would expect DoRA to load as quickly as LoRA, or at least not several orders of magnitude slower.