Google Gemma running error with half dtype #157

hnyls2002 · 2024-03-06T02:09:44Z

Using flashinfer in sglang with google/gemma-7b-it, I get the following error:

  File "/home/ubuntu/sglang-venv/lib/python3.11/site-packages/flashinfer/prefill.py", line 462, in forward
    return self._wrapper.forward(
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchPrefillWithPagedKVCache failed to dispatch with dtype Half

I don't know whether this is caused by Gemma's bfloat16 dtype or by incorrect usage on my part.


yzh119 · 2024-03-06T02:24:43Z

Sorry, the error message was confusing.

It's because flashinfer 0.0.2 does not support head dim 256 (enabled in #132). I'll trigger the v0.0.3 build and release the new version tonight.
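
To illustrate why the original message was misleading, here is a minimal sketch of a dispatch check that names the parameter that actually failed instead of blaming the dtype. This is not flashinfer's code: `check_dispatch`, `describe`, and both sets of supported values are hypothetical and chosen only for illustration.

```python
# Hypothetical pre-flight check. The supported values below are illustrative
# assumptions, not flashinfer's actual dispatch table.
ASSUMED_SUPPORTED_HEAD_DIMS = {64, 128}
ASSUMED_SUPPORTED_DTYPES = {"float16", "bfloat16"}


def check_dispatch(head_dim: int, dtype: str) -> None:
    """Raise a ValueError naming the parameter that cannot be dispatched."""
    if dtype not in ASSUMED_SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if head_dim not in ASSUMED_SUPPORTED_HEAD_DIMS:
        raise ValueError(
            f"unsupported head_dim: {head_dim} "
            f"(supported: {sorted(ASSUMED_SUPPORTED_HEAD_DIMS)})"
        )


def describe(head_dim: int, dtype: str) -> str:
    """Return "ok" or the error text for a given configuration."""
    try:
        check_dispatch(head_dim, dtype)
    except ValueError as exc:
        return str(exc)
    return "ok"


# gemma-7b-it uses head dim 256, so the head dim is reported, not the dtype.
print(describe(256, "float16"))
```

With a check like this, the Gemma case would report the head dim rather than `dtype Half`, which is the kind of distinction the error message needs to make.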

hnyls2002 · 2024-03-06T02:36:27Z

@yzh119 Thanks, looking forward to the new version!

hnyls2002 mentioned this issue Mar 6, 2024

Gemma Support sgl-project/sglang#256

Merged

hnyls2002 closed this as completed Mar 11, 2024

yzh119 mentioned this issue Mar 16, 2024

fix: fix python package dispatch error message #182

Merged

yzh119 added a commit that referenced this issue Mar 16, 2024

fix: fix python package dispatch error message (#182)

8eed01c

The previous message for dispatch error is confusing #157 #181
