
Doesn't work on CPU: "Unable to get JIT kernel for brgemm" #241

Open
andretisch opened this issue Nov 25, 2024 · 1 comment

Comments

@andretisch

Hi guys, I'm very impressed with your project. Thank you very much for your work. I've run into a problem: I can't run your model on the CPU. Here's what I'm doing in Colab:

!pip install -q autoawq[cpu]
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name = "Qwen/Qwen2.5-3B-Instruct-AWQ"
model = AutoAWQForCausalLM.from_quantized(
    model_name,
    use_ipex=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

And I get the following error:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Fetching 10 files: 100% 10/10 [00:00<00:00, 52958.38it/s]
Replacing layers...: 100% 36/36 [00:21<00:00,  1.68it/s]
Fusing layers...: 100% 36/36 [00:00<00:00, 56.02it/s]
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Unable to get JIT kernel for brgemm. Params: M=32, N=39, K=128, str_a=1, str_b=1, brgemm_type=1, beta=0, a_trans=0, unroll_hint=1, lda=2048, ldb=39, ldc=39, config=0, b_vnni=0

What could this be?
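One way to narrow this down: with use_ipex=True the brgemm kernels are JIT-compiled on the CPU (via the oneDNN path that intel_extension_for_pytorch uses), and as far as I know that JIT requires AVX-512 (or AMX) support, which Colab's default CPU runtimes don't always provide. A minimal diagnostic sketch, assuming a Linux machine with /proc/cpuinfo (the flag names checked here are the usual Linux ones; treat the ISA requirement itself as an assumption, not confirmed by the maintainers):

```python
# Sketch: check whether the CPU advertises the ISA extensions that the
# JIT-compiled brgemm kernels are assumed to need (AVX-512 / AMX).
def has_cpu_flag(flag, cpuinfo_text):
    """Return True if `flag` appears on a 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            return flag.lower() in line.lower().split()
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
        for flag in ("avx2", "avx512f", "amx_tile"):
            print(flag, "->", has_cpu_flag(flag, text))
    except FileNotFoundError:
        print("/proc/cpuinfo not available on this platform")
```

If avx512f (and amx_tile) come back False, that would be consistent with the JIT kernel failure above.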

@XcloudFance

Hi, I've run into the same situation. Is it possible to run AWQ on a CPU, or is it strictly limited to GPUs?
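Before concluding it's a hardware limitation, it may be worth confirming that the IPEX dependency is actually importable in the environment, since use_ipex=True relies on intel_extension_for_pytorch being installed. A minimal stdlib-only sketch:

```python
# Sketch: verify the optional IPEX dependency is installed without
# importing it (find_spec just locates the package on sys.path).
import importlib.util

def ipex_available():
    """Return True if intel_extension_for_pytorch can be found."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None

if __name__ == "__main__":
    print("IPEX installed:", ipex_available())
```

If this prints False, the autoawq[cpu] extra may not have pulled in IPEX correctly and reinstalling would be the first thing to try.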
