
'centroids must be a CUDA tensor' error when running Qwen2.5-72B-Instruct 2-bit in RTX4090 #59

@yueqianh


I am hitting the following error when trying to run Qwen2.5-72B-Instruct 2-bit on an RTX 4090, with Windows 11 23H2, CUDA 12.6, and a conda environment.

I can run llama3.1-70B-Instruct 2-bit just fine, following the example in the README.

From the log below, I suspect the issue starts with this line:
Some parameters are on the meta device because they were offloaded to the cpu.
which eventually leads to
RuntimeError: centroids must be a CUDA tensor.

However, since Qwen2.5-72B-Instruct 2-bit is similar in size to llama3.1-70B-Instruct 2-bit, I am puzzled why this discrepancy occurs. Does the team have any insight into this?
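
One way to check the offloading suspicion is to look at where each module was placed. The following is a minimal sketch, not part of VPTQ: `offloaded_modules` is a hypothetical helper, and it assumes the loaded model exposes a `hf_device_map` dict (as transformers/accelerate do when a model is dispatched with a `device_map`), whose values are GPU indices or strings such as "cuda:0", "cpu", or "disk". The example map is made up for illustration and is not taken from my run.

```python
def offloaded_modules(device_map):
    """Return {module_name: device} for entries that are not placed on a GPU.

    Assumption: `device_map` looks like `model.hf_device_map`, i.e. values are
    GPU indices (int) or strings/devices such as "cuda:0", "cpu", or "disk".
    """
    def on_gpu(device):
        # A plain integer is a GPU index (bool is excluded on purpose).
        if isinstance(device, int) and not isinstance(device, bool):
            return True
        # Covers "cuda", "cuda:0", and torch.device objects via str().
        return str(device).startswith("cuda")

    return {name: dev for name, dev in device_map.items() if not on_gpu(dev)}


# Illustrative placement only (hypothetical, not from the log in this issue).
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.79": "cpu",
    "lm_head": "disk",
}
print(offloaded_modules(example_map))
```

With the real model this would be called as `offloaded_modules(m.hf_device_map)`, if that attribute is present. A non-empty result would be consistent with the "offloaded to the cpu" warning, and would suggest that some quantized layers never reached the GPU before `ops.dequant` was called.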

Replacing linear layers...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1127/1127 [00:00<00:00, 5179.38it/s]
Fetching 16 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 15947.92it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:623: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
  File "C:\Users\i9-4090\Documents\test\code\test\test_cuda.py", line 17, in <module>
    outputs = m.generate(tokenized_chat, max_new_tokens=128)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\generation\utils.py", line 2048, in generate
    result = self._sample(
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\generation\utils.py", line 3008, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 1167, in forward
    outputs = self.model(
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 976, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 702, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 580, in forward
    query_states = self.q_proj(hidden_states)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\vptq\layers\vqlinear.py", line 649, in forward
    qweight = self.fast_dequant()
  File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\vptq\layers\vqlinear.py", line 485, in fast_dequant
    output = ops.dequant(
RuntimeError: centroids must be a CUDA tensor
Exception raised from dequant at csrc/ops.cc:51 (most recent call first):
00007FFA8BDB462900007FFA8BDB4580 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFA8BDB416A00007FFA8BDB4110 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FF985F9168900007FF985F85D00 ops.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [<unknown file> @ <unknown line number>]
00007FF985F9AA8400007FF985F921A0 ops.cp310-win_amd64.pyd!PyInit_ops [<unknown file> @ <unknown line number>]
00007FF985F9AAF400007FF985F921A0 ops.cp310-win_amd64.pyd!PyInit_ops [<unknown file> @ <unknown line number>]
00007FF985F8DD3B00007FF985F85D00 ops.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [<unknown file> @ <unknown line number>]
00007FFA4ED782F600007FFA4ED77530 python310.dll!PyCFunction_GetFlags [<unknown file> @ <unknown line number>]
00007FFA4ED3554C00007FFA4ED35410 python310.dll!PyObject_MakeTpCall [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE261A900007FFA4EE26060 python310.dll!PyOS_URandomNonblock [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3766900007FFA4ED37270 python310.dll!PyCell_Set [<unknown file> @ <unknown line number>]
00007FFA4ED378F100007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4EC49C99 <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EC4A2DB <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3766900007FFA4ED37270 python310.dll!PyCell_Set [<unknown file> @ <unknown line number>]
00007FFA4ED378F100007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED353B400007FFA4ED352E0 python310.dll!PyObject_FastCallDictTstate [<unknown file> @ <unknown line number>]
00007FFA4ED35AD200007FFA4ED35A30 python310.dll!PyObject_Call_Prepend [<unknown file> @ <unknown line number>]
00007FFA4ED9FC0C00007FFA4ED9C470 python310.dll!PyType_Ready [<unknown file> @ <unknown line number>]
00007FFA4ED3554C00007FFA4ED35410 python310.dll!PyObject_MakeTpCall [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4EC49C99 <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EC4A24B <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3539100007FFA4ED352E0 python310.dll!PyObject_FastCallDictTstate [<unknown file> @ <unknown line number>]
