Description
I am encountering the following issue when trying to run Qwen2.5-72B-Instruct 2-bit on an RTX 4090 with Windows 11 23H2, CUDA 12.6, and a conda environment.
I can run llama3.1-70B-Instruct 2-bit just fine, as the example in the README shows.
From the log below, I suspect the issue starts with the line `Some parameters are on the meta device because they were offloaded to the cpu.`, which eventually leads to `RuntimeError: centroids must be a CUDA tensor`.
However, given that Qwen2.5-72B-Instruct 2-bit is similar in size to llama3.1-70B-Instruct 2-bit, I am puzzled why this discrepancy occurs. Does the team have any insight into this?
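One way to check the offloading suspicion is to list which modules were placed off the GPU. The sketch below is only an illustration: `find_offloaded` and `example_map` are hypothetical names, and it assumes the model was loaded with a `device_map` so that transformers exposes the placement as `hf_device_map`.

```python
def find_offloaded(device_map):
    """Return the modules in a device map that are not placed on a CUDA device."""
    offloaded = {}
    for name, device in device_map.items():
        # Device maps use integer GPU indices or strings such as "cuda:0", "cpu", "disk".
        on_gpu = isinstance(device, int) or str(device).startswith("cuda")
        if not on_gpu:
            offloaded[name] = str(device)
    return offloaded


# Hypothetical device map for illustration only; it is not taken from the failing run.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.79": "cpu",
    "lm_head": "disk",
}

print(find_offloaded(example_map))
```

On the real model this would be called as `find_offloaded(m.hf_device_map)`, where `m` is the model object from the traceback. A non-empty result would be consistent with the "offloaded to the cpu" warning, though it would not by itself explain why the two models are placed differently.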
Replacing linear layers...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1127/1127 [00:00<00:00, 5179.38it/s]
Fetching 16 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 15947.92it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:623: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
File "C:\Users\i9-4090\Documents\test\code\test\test_cuda.py", line 17, in <module>
outputs = m.generate(tokenized_chat, max_new_tokens=128)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\generation\utils.py", line 2048, in generate
result = self._sample(
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\generation\utils.py", line 3008, in _sample
outputs = self(**model_inputs, return_dict=True)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 1167, in forward
outputs = self.model(
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 976, in forward
layer_outputs = decoder_layer(
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 702, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 580, in forward
query_states = self.q_proj(hidden_states)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\vptq\layers\vqlinear.py", line 649, in forward
qweight = self.fast_dequant()
File "C:\Users\i9-4090\miniconda3\envs\test\lib\site-packages\vptq\layers\vqlinear.py", line 485, in fast_dequant
output = ops.dequant(
RuntimeError: centroids must be a CUDA tensor
Exception raised from dequant at csrc/ops.cc:51 (most recent call first):
00007FFA8BDB462900007FFA8BDB4580 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFA8BDB416A00007FFA8BDB4110 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FF985F9168900007FF985F85D00 ops.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [<unknown file> @ <unknown line number>]
00007FF985F9AA8400007FF985F921A0 ops.cp310-win_amd64.pyd!PyInit_ops [<unknown file> @ <unknown line number>]
00007FF985F9AAF400007FF985F921A0 ops.cp310-win_amd64.pyd!PyInit_ops [<unknown file> @ <unknown line number>]
00007FF985F8DD3B00007FF985F85D00 ops.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [<unknown file> @ <unknown line number>]
00007FFA4ED782F600007FFA4ED77530 python310.dll!PyCFunction_GetFlags [<unknown file> @ <unknown line number>]
00007FFA4ED3554C00007FFA4ED35410 python310.dll!PyObject_MakeTpCall [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE261A900007FFA4EE26060 python310.dll!PyOS_URandomNonblock [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3766900007FFA4ED37270 python310.dll!PyCell_Set [<unknown file> @ <unknown line number>]
00007FFA4ED378F100007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4EC49C99 <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EC4A2DB <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3766900007FFA4ED37270 python310.dll!PyCell_Set [<unknown file> @ <unknown line number>]
00007FFA4ED378F100007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED353B400007FFA4ED352E0 python310.dll!PyObject_FastCallDictTstate [<unknown file> @ <unknown line number>]
00007FFA4ED35AD200007FFA4ED35A30 python310.dll!PyObject_Call_Prepend [<unknown file> @ <unknown line number>]
00007FFA4ED9FC0C00007FFA4ED9C470 python310.dll!PyType_Ready [<unknown file> @ <unknown line number>]
00007FFA4ED3554C00007FFA4ED35410 python310.dll!PyObject_MakeTpCall [<unknown file> @ <unknown line number>]
00007FFA4EE2E6F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE2AE9F00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4EC49C99 <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4EC4A24B <unknown symbol address> python310.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED377D200007FFA4ED37700 python310.dll!PyMethod_Self [<unknown file> @ <unknown line number>]
00007FFA4ED3566D00007FFA4ED355B0 python310.dll!PyVectorcall_Call [<unknown file> @ <unknown line number>]
00007FFA4EE2E8F200007FFA4EE2E300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>]
00007FFA4EE29D1900007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4EE2CE7B00007FFA4EE27A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFA4ED3585E00007FFA4ED35820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FFA4ED3539100007FFA4ED352E0 python310.dll!PyObject_FastCallDictTstate [<unknown file> @ <unknown line number>]