[LNL][Cogagent] RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0 #12646

Open
juan-OY opened this issue Jan 3, 2025 · 3 comments
juan-OY commented Jan 3, 2025

Model: https://huggingface.co/THUDM/cogagent-9b-20241220
The CogAgent-9B-20241220 model is based on GLM-4V-9B, but I fail to run it.

Setup guide follows: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/glm-4v
env:
ipex-llm 2.2.0b20250102
transformers tried both 4.42.4 & 4.47.1

Failure as below:
```
Traceback (most recent call last):
  File "D:\cogagent\generate.py", line 75, in <module>
    inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
  File "C:\Users\test\.cache\huggingface\modules\transformers_modules\cogagent-9b-20241220\tokenization_chatglm.py", line 232, in apply_chat_template
    result = handle_single_conversation(conversation)
  File "C:\Users\test\.cache\huggingface\modules\transformers_modules\cogagent-9b-20241220\tokenization_chatglm.py", line 200, in handle_single_conversation
    input_image = transform(item["image"])
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torchvision\transforms\transforms.py", line 95, in __call__
    img = t(img)
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torchvision\transforms\transforms.py", line 277, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torchvision\transforms\functional.py", line 350, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "C:\Users\test\miniforge3\envs\cogagent\Lib\site-packages\torchvision\transforms\_functional_tensor.py", line 926, in normalize
    return tensor.sub_(mean).div_(std)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
```

@qiuxin2012
Contributor

CogAgent's prompt concatenation has strict requirements, and our glm-4v-9b example does not meet them. Have you changed generate.py to follow CogAgent's requirements?
You can also refer to their example: https://github.com/THUDM/CogAgent/blob/main/inference/cli_demo.py

@juan-OY
Author

juan-OY commented Jan 6, 2025

It is not a problem with the prompt format; it also fails when running https://github.com/THUDM/CogAgent/blob/main/inference/cli_demo.py or web_demo.py. It fails in the vision part.

@juan-OY
Author

juan-OY commented Jan 6, 2025

It reports this error (web_demo.py):

```
  File "C:\Users\test\.cache\huggingface\modules\transformers_modules\cogagent-9b-20241220\visual.py", line 193, in forward
    x = x.view(b, grid_size, grid_size, h).permute(0, 3, 1, 2)
RuntimeError: shape '[1, 80, 80, 1792]' is invalid for input of size 11470592
```
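A back-of-the-envelope check of the numbers in this message (an assumption, not taken from the repo code) suggests the flattened vision sequence carries exactly one token more than the expected 80×80 grid, which would be consistent with a leading special token (e.g. a CLS token) not being stripped before the reshape:

```python
numel = 11470592   # total element count from the error message
h = 1792           # last dimension of the attempted view [1, 80, 80, 1792]
grid = 80          # expected grid_size

tokens = numel // h
print(tokens, grid * grid)   # 6401 6400 -> one extra token in the sequence
```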
