
Why does running DeepSeek-V2-Lite-Chat (SFT) on an A800 consume as much as 60 GB of GPU memory?! #74

Open
juhengzhe opened this issue Jul 19, 2024 · 3 comments


@juhengzhe

The weight files total about 32 GB.
Why does the model occupy nearly 60 GB of GPU memory once it is actually loaded?

@juhengzhe
Author

Specifying the data type as float16 when loading the model, to avoid full precision, brings memory usage down to under 40 GB.
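
For reference, a minimal sketch of such a half-precision load, assuming the model is loaded through Hugging Face transformers (the model id here is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Without an explicit torch_dtype, from_pretrained materializes weights in
# float32: ~16B parameters x 4 bytes ≈ 64 GB, close to the ~60 GB observed.
# float16 keeps them at 2 bytes per parameter, roughly the 32 GB on disk.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Lite-Chat",  # illustrative model id
    torch_dtype=torch.float16,            # load weights in half precision
    trust_remote_code=True,               # DeepSeek ships custom modeling code
)
```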

@liangfang

I noticed this message:
The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space.
Consider setting --max-model-len to a smaller value.

But I'd also like to ask about this: why does a long context length consume so much GPU memory?
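
If the full 163840-token context is not needed, the warning's suggestion can be followed by capping the context length. A sketch assuming the vLLM Python API (the 4096 cap is an arbitrary example):

```python
from vllm import LLM

# A smaller max_model_len shrinks the per-sequence KV cache that vLLM must
# be able to hold, avoiding OOM during its initial memory profiling phase.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite-Chat",  # illustrative model id
    max_model_len=4096,          # cap the context instead of the full 163840
    trust_remote_code=True,
)
```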

@beep-bebop

> The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
>
> But I'd also like to ask about this: why does a long context length consume so much GPU memory?

My guess is that GPU memory is pre-allocated for the long context, so memory usage won't change much during subsequent inference.
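
That is indeed how vLLM behaves: at startup it profiles memory and pre-allocates a large pool of KV cache blocks, so usage is mostly fixed up front. As a rough, illustrative back-of-envelope (the layer/head numbers below are for a generic dense-attention model, not DeepSeek-V2-Lite's actual MLA configuration, which compresses the KV cache):

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Approximate KV cache for one sequence: a K and a V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative fp16 numbers for a generic mid-sized dense model:
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=163840)
print(f"{size / 2**30:.0f} GiB")  # 80 GiB for a single full-length sequence
```

At 163840 tokens, even a single sequence's cache can dwarf the weights themselves, which is why the engine reserves so much memory ahead of time.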
