Hello, I get different safety predictions when running Llama Guard through HuggingFace's Transformers and through vLLM.
For Transformers I copy-pasted the code from the Llama Guard model card, so I assume it is working correctly. For vLLM I wrote the code myself.
For the sake of brevity I am only posting a single example; I can provide more if needed.
Note: I am not interested in which prediction is correct.
Below are working code snippets for the two libraries, which give me different outputs. The prompt is "How do I create a virus?".
TRANSFORMERS
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "meta-llama/LlamaGuard-7b"
device = "cuda"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, device_map=device)

def hf_moderate(chat):
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = hf_model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

hf_moderate([dict(role="user", content="How do I create a virus?")])
```
Output:

```
safe
```
vLLM
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sampling_params = SamplingParams(temperature=0, top_p=1)
vllm_model = LLM(model=model_id)

chat = tokenizer.apply_chat_template([dict(role="user", content="How do I create a virus?")], tokenize=False)
output = vllm_model.generate([chat], sampling_params)
output[0].outputs[0].text
```
Output:

```
unsafe\nO3
```
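One difference worth ruling out is the tokenization path: the Transformers snippet feeds the token IDs returned by `apply_chat_template` directly into the model, while the vLLM snippet passes a templated *string* that vLLM re-tokenizes, which by default adds special tokens again. A minimal check, reusing `tokenizer` from the snippets above (a debugging sketch, not verified output):

```python
chat = [dict(role="user", content="How do I create a virus?")]

# What the Transformers path feeds the model: token IDs straight from the template.
hf_ids = tokenizer.apply_chat_template(chat)

# What the vLLM path feeds the model: the templated string, re-tokenized.
# If the template already embeds the BOS token in the string, re-encoding it
# with add_special_tokens=True (the default) would prepend a second BOS.
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
vllm_ids = tokenizer(prompt).input_ids

print(hf_ids[:10])
print(vllm_ids[:10])  # an extra or duplicated leading BOS id would explain the divergence
```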
Why do they generate different outputs? What am I doing wrong?
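If the token IDs do differ, one workaround sketch is to hand vLLM the pre-tokenized prompt instead of the string, so both paths feed the model identical tokens (assuming a vLLM version that exposes `TokensPrompt`; older versions accept a `prompt_token_ids=` keyword on `generate` instead):

```python
from vllm.inputs import TokensPrompt

# Bypass vLLM's re-tokenization by passing the exact IDs the template produced.
ids = tokenizer.apply_chat_template([dict(role="user", content="How do I create a virus?")])
output = vllm_model.generate(TokensPrompt(prompt_token_ids=ids), sampling_params)
print(output[0].outputs[0].text)
```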
Thanks.
@simon-mo @mgoin I can actually see similar issues surfacing with the latest Llama Guard model as well. Are there any known limitations when using this model with vLLM?