
Intel NPU operation related #1081

Open
Oneul-hyeon opened this issue Dec 19, 2024 · 1 comment

Oneul-hyeon commented Dec 19, 2024

Hello

I want to run an on-device sLM on the NPU built into the "Intel(R) Core(TM) Ultra 5".

However, while I confirmed that the code below works on the CPU and iGPU, no answer is produced when using the NPU.

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer
import time

def make_template(context):
    instruction = f"""You are an assistant who translates meeting contents.
Translate the meeting contents given after #Context into English.

#Context:{context}

#Translation:"""

    messages = [{"role": "user", "content": instruction}]

    # Build the model input with the tokenizer's chat template
    input_ids = tokenizer.apply_chat_template(messages,
                                              add_generation_prompt=True,
                                              return_tensors="pt")

    return input_ids

def translate(context):
    input_ids = make_template(context=context)
    outputs = model.generate(input_ids,
                             max_new_tokens=max_new_tokens,
                             do_sample=do_sample,
                             temperature=temperature,
                             top_p=top_p)

    # Decode only the newly generated tokens (everything after the prompt)
    answer = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

    return answer.rstrip()

if __name__ == "__main__":
    model_id = "AIFunOver/gemma-2-2b-it-openvino-8bit"
    model = OVModelForCausalLM.from_pretrained(model_id, device="npu")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    print(f"Model Device : {model.device}")

    max_new_tokens = 1024
    do_sample = False
    temperature = 0.1
    top_p = 0.9

    context = '''A: Hello.
B: Oh, yes, hello. I'm contacting you because I have a question. They're doing water pipe construction in my neighborhood, and I'm curious as to how long it will take.
A: Where is your area?
B: Daejeon Byeundae-dong.
A: The construction will continue until tomorrow, sir.
B: Oh really? Oh, but won't there be muddy water after the construction is over?
A: It's better to let out enough water before using it after the construction is over, sir.
B: How much water should I drain?
A: Let out for 2~3 minutes.
B: Okay, I understand. Then, can there be another problem?
A: The water pressure may temporarily drop slightly.
B: Temporarily?
A: Yes, it's a temporary phenomenon and will return to normal pressure right away.
B: What should I do if it lasts a long time?
A: In that case, you can report it to the Waterworks Headquarters.
B: Yes, I understand.
B: But they say it's going to rain tomorrow, so can the construction be finished tomorrow? I think they usually don't do construction on rainy days?
A: In case of rain, construction may be slightly delayed. If it doesn't rain too much, construction will proceed as scheduled. Customer, please don't worry too much.
B: Oh, yes, I understand. Thank you.
A: Yes, thank you.'''

    start_time = time.time()
    generated_text = translate(context)
    end_time = time.time()

    print("generated_text:", generated_text)

    num_generated_tokens = len(tokenizer.tokenize(generated_text))
    total_time = end_time - start_time
    avg_token_speed = total_time / num_generated_tokens if num_generated_tokens > 0 else float('inf')

    print(f"Total Inference Time : {total_time} s")
    print(f"Average token generation speed: {avg_token_speed:.4f} seconds/token")

However, the NPU does appear in the list of devices currently available to OpenVINO.

[screenshot: available OpenVINO devices, including NPU]
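
A minimal sketch of how the available devices can be listed with the standard OpenVINO Python API:

import openvino as ov

# List the devices OpenVINO can see on this machine; "NPU" should appear here
print(ov.Core().available_devices)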

If there is a way to use the NPU, could you let me know?

Thank you.

eaidova (Collaborator) commented Dec 19, 2024

@Oneul-hyeon Currently optimum-intel does not support sLM inference on NPU, but there is another solution that allows this and works with the same optimum-intel converted models. Please check this guide: https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html
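
A minimal sketch of that approach, assuming the openvino-genai package is installed and the model has first been exported to a local folder; the folder name gemma-2-2b-it-npu-ov and the export flags below are illustrative (per the guide, NPU generally expects symmetric, channel-wise INT4 weights):

# Illustrative export step (run once in a shell; adjust flags per the NPU guide):
#   optimum-cli export openvino -m google/gemma-2-2b-it --weight-format int4 --sym --group-size -1 gemma-2-2b-it-npu-ov

import openvino_genai

# Load the exported model directly on the NPU via OpenVINO GenAI
pipe = openvino_genai.LLMPipeline("gemma-2-2b-it-npu-ov", "NPU")

prompt = "Translate the meeting contents given after #Context into English.\n\n#Context: ...\n\n#Translation:"
print(pipe.generate(prompt, max_new_tokens=1024))

Here generate returns decoded text directly, so the manual tokenizer decoding from the original script is not needed on this path.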
