
How to use ipex-llm to run both the original and quantized versions of the Baichuan2 model on an Intel GPU #11959

Open
tao-ov opened this issue Aug 29, 2024 · 1 comment



tao-ov commented Aug 29, 2024

```python
import torch
import whowhatbench
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumption: BigdlForCausalLM below is the ipex-llm low-bit loader, aliased
# from ipex_llm.transformers.AutoModelForCausalLM.
from ipex_llm.transformers import AutoModelForCausalLM as BigdlForCausalLM
# load_prompts is the helper shipped with the who_what_benchmark examples.

model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"

# Load and optimize the INT4 model with ipex-llm
low_bit = "sym_int4"
model_int4 = BigdlForCausalLM.from_pretrained(model_path, load_in_low_bit=low_bit,
                                              optimize_model=True, trust_remote_code=True,
                                              use_cache=True).eval()

# Load the FP32 model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Move both models to the Intel GPU (XPU)
device = 'xpu'
model = model.to(device)
model_int4 = model_int4.to(device)

# Ensure the evaluation data is also moved to the XPU
def load_data_and_to_device(dataset, isplit, dataset_field, device):
    prompts = load_prompts(dataset, isplit, dataset_field)
    if prompts is not None:
        prompts["questions"] = [torch.tensor(item).to(device) for item in prompts["questions"]]
    return prompts

dataset = "squad"
isplit = "validation[:32]"
dataset_field = "question"

prompts = load_data_and_to_device(dataset, isplit, dataset_field, device)

# Create the evaluator with the FP32 reference model and tokenizer
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer, test_data=prompts)

# Score the INT4 model against the reference and collect metrics
all_metrics_per_question, all_metrics = evaluator.score(model_int4)

print(all_metrics_per_question)
print(all_metrics)
```
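
For scale, a back-of-the-envelope estimate (assuming roughly 7B parameters; the exact count differs slightly) of the weight memory this script asks the XPU to hold when both models are resident:

```python
# Rough weight-memory estimate; activations and the KV cache come on top.
params = 7e9                       # approximate Baichuan2-7B parameter count
fp32_gib = params * 4 / 1024**3    # 4 bytes per FP32 weight     -> ~26 GiB
int4_gib = params * 0.5 / 1024**3  # ~0.5 bytes per sym_int4 weight -> ~3.3 GiB
print(f"FP32 ~{fp32_gib:.1f} GiB + INT4 ~{int4_gib:.1f} GiB = ~{fp32_gib + int4_gib:.1f} GiB")
```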


tao-ov (Author) commented Aug 29, 2024

```
    return self.fget.__get__(instance, owner)()
2024-08-29 15:39:14,061 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
  File "/home/test/ipexllm_whowhat/ipex-llm/python/llm/example/GPU/HuggingFace/LLM/baichuan2/who_what_benchmark/examples/ipex-llm.eval.py", line 42, in <module>
    model = model.to(device)
            ^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
```
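
The failure occurs at `model = model.to(device)`, i.e. while copying the FP32 weights to the GPU; `PI_ERROR_OUT_OF_RESOURCES` is the SYCL runtime's out-of-resources (typically out-of-memory) error. A minimal, untested sketch of one way to sidestep it, assuming whowhatbench only needs each model's generated outputs and does not require both models to share a device: keep the FP32 reference off the XPU, or shrink its footprint.

```python
# Sketch under the assumption above, not a verified fix for this setup.

# Option A: keep the FP32 reference on the CPU; move only the INT4 model.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)  # stays on CPU
model_int4 = model_int4.to('xpu')
# Inputs fed to each model must live on that model's device, so the CPU
# reference needs CPU tensors while model_int4 needs XPU tensors.

# Option B: halve the reference's footprint by loading it in FP16 (~13 GiB).
model_fp16 = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                                  torch_dtype=torch.float16).to('xpu')
```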
