
Support for DeepSeekV2-Chat with only 16GB VRAM #55

sayap asked this question in Q&A. Answered by Azure-Tang.

Sorry for the inconvenience. It seems we could not put a whole layer on the CPU. I have fixed this bug in #62.

And if you have only 16 GB of VRAM, the good news is that we have compressed DeepSeek-V2's required VRAM from 21 GB to 11 GB. Please check our latest release.

By the way, to offload a whole layer to the CPU, you have to modify your YAML slightly:

- match:
    name: "^model\\.layers\\.([45][0-9])\\.(?!self_attn).*$"  # regular expression: layers 40-59, excluding self_attn submodules
    class: torch.nn.Linear  # only match modules matching name and class simultaneously
  replace:
    class: ktransformers.operators.linear.KTransformersLinear  # optimized kernel on quantized data types
    kwargs:
      generate_device: "cpu"
      prefill_device: "cpu"
      genera…
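
The kwargs block above is cut off in the original answer. For reference, a complete rule in the same match/replace style might look like the sketch below; the generate_op and prefill_op values (KLinearCPUInfer, KLinearTorch) are assumptions based on ktransformers' optimize-rule format and may differ across versions.

- match:
    name: "^model\\.layers\\.([45][0-9])\\.(?!self_attn).*$"  # layers 40-59, everything except self_attn
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cpu"          # run the decode pass on the CPU
      prefill_device: "cpu"           # run the prefill pass on the CPU
      generate_op: "KLinearCPUInfer"  # assumed name of the CPU linear backend
      prefill_op: "KLinearTorch"      # assumed plain-torch backend for prefill

Rule files like this are passed to ktransformers when loading the model (for example via the optimize-rule path argument of local_chat.py; the exact option name may vary by release).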

Answer selected by sayap