
How to use ipex-llm to run both the original and quantized versions of the Baichuan2 model on an Intel GPU #11959

Open
tao-ov opened this issue Aug 29, 2024 · 1 comment



tao-ov commented Aug 29, 2024

```python
import torch
import whowhatbench
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumption: BigdlForCausalLM below is the ipex-llm low-bit loader, aliased
# from ipex_llm.transformers.AutoModelForCausalLM.
from ipex_llm.transformers import AutoModelForCausalLM as BigdlForCausalLM
# load_prompts is the helper shipped with the who_what_benchmark examples.

model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"

# Load and optimize the INT4 model with ipex-llm
low_bit = "sym_int4"
model_int4 = BigdlForCausalLM.from_pretrained(model_path, load_in_low_bit=low_bit,
                                              optimize_model=True, trust_remote_code=True,
                                              use_cache=True).eval()

# Load the FP32 model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Move both models to the Intel GPU (XPU)
device = 'xpu'
model = model.to(device)
model_int4 = model_int4.to(device)

# Ensure the evaluation data is also moved to the XPU
def load_data_and_to_device(dataset, isplit, dataset_field, device):
    prompts = load_prompts(dataset, isplit, dataset_field)
    if prompts is not None:
        prompts["questions"] = [torch.tensor(item).to(device) for item in prompts["questions"]]
    return prompts

dataset = "squad"
isplit = "validation[:32]"
dataset_field = "question"

prompts = load_data_and_to_device(dataset, isplit, dataset_field, device)

# Create the evaluator with the FP32 reference model and tokenizer
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer, test_data=prompts)

# Score the INT4 model against the reference and collect metrics
all_metrics_per_question, all_metrics = evaluator.score(model_int4)

print(all_metrics_per_question)
print(all_metrics)
```
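
For scale, a back-of-the-envelope estimate (assuming roughly 7B parameters; the exact count differs slightly) of the weight memory this script asks the XPU to hold when both models are resident:

```python
# Rough weight-memory estimate; activations and the KV cache come on top.
params = 7e9                       # approximate Baichuan2-7B parameter count
fp32_gib = params * 4 / 1024**3    # 4 bytes per FP32 weight     -> ~26 GiB
int4_gib = params * 0.5 / 1024**3  # ~0.5 bytes per sym_int4 weight -> ~3.3 GiB
print(f"FP32 ~{fp32_gib:.1f} GiB + INT4 ~{int4_gib:.1f} GiB = ~{fp32_gib + int4_gib:.1f} GiB")
```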


tao-ov (Author) commented Aug 29, 2024

```
    return self.fget.__get__(instance, owner)()
2024-08-29 15:39:14,061 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
  File "/home/test/ipexllm_whowhat/ipex-llm/python/llm/example/GPU/HuggingFace/LLM/baichuan2/who_what_benchmark/examples/ipex-llm.eval.py", line 42, in <module>
    model = model.to(device)
            ^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/test/miniforge3/envs/llmtest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
```
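
The failure occurs at `model = model.to(device)`, i.e. while copying the FP32 weights to the GPU; `PI_ERROR_OUT_OF_RESOURCES` is the SYCL runtime's out-of-resources (typically out-of-memory) error. A minimal, untested sketch of one way to sidestep it, assuming whowhatbench only needs each model's generated outputs and does not require both models to share a device: keep the FP32 reference off the XPU, or shrink its footprint.

```python
# Sketch under the assumption above, not a verified fix for this setup.

# Option A: keep the FP32 reference on the CPU; move only the INT4 model.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)  # stays on CPU
model_int4 = model_int4.to('xpu')
# Inputs fed to each model must live on that model's device, so the CPU
# reference needs CPU tensors while model_int4 needs XPU tensors.

# Option B: halve the reference's footprint by loading it in FP16 (~13 GiB).
model_fp16 = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                                  torch_dtype=torch.float16).to('xpu')
```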
