
Why does running DeepSeek-V2-Lite-Chat (SFT) on an A800 consume as much as 60 GB of GPU memory?! #74

Open
juhengzhe opened this issue Jul 19, 2024 · 3 comments


@juhengzhe

The weight files total about 32 GB.
Why does the model occupy nearly 60 GB of GPU memory once it is actually loaded?

@juhengzhe
Author

Specifying the data type as float16 when loading the model, to avoid full precision, brings memory usage down to under 40 GB.
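
For reference, a minimal sketch of such a half-precision load, assuming the model is loaded through Hugging Face transformers (the model id here is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Without an explicit torch_dtype, from_pretrained materializes weights in
# float32: ~16B parameters x 4 bytes ≈ 64 GB, close to the ~60 GB observed.
# float16 keeps them at 2 bytes per parameter, roughly the 32 GB on disk.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Lite-Chat",  # illustrative model id
    torch_dtype=torch.float16,            # load weights in half precision
    trust_remote_code=True,               # DeepSeek ships custom modeling code
)
```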

@liangfang

I noticed this message:
The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space.
Consider setting --max-model-len to a smaller value.

But I'd also like to ask about this: why does a long context length consume so much GPU memory?
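
If the full 163840-token context is not needed, the warning's suggestion can be followed by capping the context length. A sketch assuming the vLLM Python API (the 4096 cap is an arbitrary example):

```python
from vllm import LLM

# A smaller max_model_len shrinks the per-sequence KV cache that vLLM must
# be able to hold, avoiding OOM during its initial memory profiling phase.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite-Chat",  # illustrative model id
    max_model_len=4096,          # cap the context instead of the full 163840
    trust_remote_code=True,
)
```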

@beep-bebop

> The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
>
> But I'd also like to ask about this: why does a long context length consume so much GPU memory?

My guess is that GPU memory is pre-allocated for the long context, so memory usage won't change much during subsequent inference.
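
That is indeed how vLLM behaves: at startup it profiles memory and pre-allocates a large pool of KV cache blocks, so usage is mostly fixed up front. As a rough, illustrative back-of-envelope (the layer/head numbers below are for a generic dense-attention model, not DeepSeek-V2-Lite's actual MLA configuration, which compresses the KV cache):

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Approximate KV cache for one sequence: a K and a V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative fp16 numbers for a generic mid-sized dense model:
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=163840)
print(f"{size / 2**30:.0f} GiB")  # 80 GiB for a single full-length sequence
```

At 163840 tokens, even a single sequence's cache can dwarf the weights themselves, which is why the engine reserves so much memory ahead of time.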
