optimize get_module_leaves speed #2756

BBuf · 2024-05-09T07:13:30Z

Background

When I try to run inference on DeepSeek-V2 with transformers:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

I found that the program appears to get stuck...

Solve

In fact, the program stalls here.

For DeepSeek-V2, module_sizes has 68185 entries, and the code here has O(N^2) complexity (where N = len(module_sizes)), so it takes a very long time to execute and gives the illusion of being stuck. This PR reduces the complexity to O(N), so execution quickly reaches the stage of loading the large model. The inference results on an 8xA800 machine are also normal after this optimization.
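To illustrate the difference in complexity, here is a minimal sketch of the two approaches. It is not the actual code in src/accelerate/utils/modeling.py: the function names `leaves_quadratic` and `leaves_linear` and the toy `sizes` dict are hypothetical, and the sketch assumes that every intermediate prefix (e.g. "model.layers") is itself a key of module_sizes. With N = 68185, the quadratic version performs roughly 4.6 billion prefix checks, while the linear version makes two passes over the keys.

```python
from collections import Counter


def leaves_quadratic(module_sizes):
    """O(N^2) sketch: for every name, rescan all names looking for a child."""
    return [
        name
        for name in module_sizes
        if name != ""
        and not any(other.startswith(name + ".") for other in module_sizes)
    ]


def leaves_linear(module_sizes):
    """O(N) sketch: count direct children per parent in one pass, then filter.

    Assumes every intermediate prefix is present as a key.
    """
    child_counts = Counter()
    for name in module_sizes:
        if "." in name:
            # "model.layers.0" is a direct child of "model.layers"
            child_counts[name.rsplit(".", 1)[0]] += 1
    # A leaf is any non-root name that never appeared as a parent
    return [name for name in module_sizes if name != "" and child_counts[name] == 0]


# Toy stand-in for module_sizes (hypothetical values)
sizes = {
    "": 10,
    "model": 8,
    "model.layers": 6,
    "model.layers.0": 3,
    "model.layers.1": 3,
    "lm_head": 2,
}
print(leaves_linear(sizes))
```

Both functions return the same leaves on this input. They can differ if an intermediate prefix is missing from the dict, so the linear version relies on the stated assumption.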

HuggingFaceDocBuilderDev · 2024-05-09T07:53:05Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SunMarc

Thanks for this great PR @BBuf! Just a small nit.

src/accelerate/utils/modeling.py

muellerzr

Thanks for making this much more efficient and fixing 🤗

* optimize get_module_leaves
* fix format
* Update modeling.py

optimize get_module_leaves

1e2c9ad

muellerzr requested a review from SunMarc May 9, 2024 12:22

fix format

118f513

SunMarc approved these changes May 13, 2024

src/accelerate/utils/modeling.py (outdated, resolved)

Update modeling.py

d62b12e

muellerzr approved these changes May 13, 2024

muellerzr merged commit 6cf1cc0 into huggingface:main May 13, 2024
23 checks passed

SunMarc mentioned this pull request May 14, 2024

Fix small edge case in get_module_leaves #2774

Merged

yhna940 pushed a commit to yhna940/accelerate that referenced this pull request May 16, 2024

optimize get_module_leaves speed (huggingface#2756)

ca5ea47

* optimize get_module_leaves
* fix format
* Update modeling.py
