
[Bug] Failing to output non-EN #109

Open

NanoCode012 opened this issue May 25, 2024 · 7 comments

@NanoCode012
Hey! Thank you for the nice tool and integrations. I've been trying this out with English JSON parsing using vllm, and it works great!

However, when I tried it with Japanese models (like the recently released Aya from Cohere and Llama 3 fine-tunes), I received cut-off outputs.

result = json.loads(result)

Failed parsing output: {
"Input": "ミ

Do you perhaps know why this is occurring? My initial guess after looking at the repo is that it can't build a character tree for these Unicode characters, which causes early stopping.
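For illustration, here is a minimal sketch of the failure mode I suspect (my assumption, not verified against the lm-format-enforcer internals): a byte-level BPE tokenizer can split one multi-byte character across tokens, and neither fragment is a character the parser's tree could match.

text = "ミ"                              # one character, U+30DF
raw = text.encode("utf-8")               # b'\xe3\x83\x9f', three bytes
# A token boundary may fall inside those three bytes.
head, tail = raw[:2], raw[2:]
print(head.decode("utf-8", errors="replace"))  # '�' - not a valid character
print(tail.decode("utf-8", errors="replace"))  # '�' - not a valid character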

I checked the other issues, and those involve keys that are non-EN; in this case, it's the content itself. I've tried the models without lm-format-enforcer, and they output fine without cutting off early (though, as expected, they can't output JSON consistently).

Env: vllm==0.4.1, lm-format-enforcer==0.9.8

@noamgat
Owner

noamgat commented May 31, 2024

Hi! Can you please share the model+schema+prompt that you are trying to use? If this reproduces on a 7B (or less) model it will be much easier to debug.

@rdlwicked

The formatter seems unable to proceed with generation whenever it produces a Roman numeral.

I am currently generating book names using the qwen1.5-110b-32k model, and I found that every time a book name containing a Roman numeral is generated, generation just stops.

Here is an example:

{"实体1": "三体系列", "实体2": "三体Ⅱ

(The keys mean "entity 1" and "entity 2"; the values are Chinese book titles, here "Three-Body series" and "Three-Body II".) Generation stops there even though the schema hasn't been completed yet.

This happens every time, so I assume it's related to the formatter, since it doesn't happen when the formatter isn't applied.
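Note that Ⅱ here is the single Roman numeral character U+2161, which is three bytes in UTF-8, so it can be split across tokens exactly like CJK text. A rough, unverified way to gauge how many such partial-UTF-8 tokens a vocabulary contains (the small checkpoint below is just a stand-in; I assume it shares the qwen1.5-110b-32k tokenizer):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
# Tokens that decode to U+FFFD on their own are partial UTF-8 sequences;
# a character-level parser has no character to match them against.
partial = sum(1 for i in range(tok.vocab_size) if "\ufffd" in tok.decode([i]))
print(f"{partial} of {tok.vocab_size} tokens are partial UTF-8 on their own")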

@liqul

liqul commented Jul 2, 2024

I've got this exact problem. Any solution or workaround? I'm using Llama-3-8b-instruct and the HF transformers lib to do generation.
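For reference, a sketch of how lm-format-enforcer is typically wired into a transformers generate call, per the project README (the model name and schema here are placeholders, and details may differ by version):

from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import build_transformers_prefix_allowed_tokens_fn

class Answer(BaseModel):
    explanation: str

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# LMFE filters the allowed next tokens at every decoding step via this hook.
parser = JsonSchemaParser(Answer.model_json_schema())
prefix_fn = build_transformers_prefix_allowed_tokens_fn(tokenizer, parser)

inputs = tokenizer("Describe the bug as JSON:", return_tensors="pt")
output = model.generate(**inputs, prefix_allowed_tokens_fn=prefix_fn, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))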

@ericperfect

The formatter seems unable to proceed with generation whenever it produces a Roman numeral. […]

I got the same problem. Any solution? Thanks.

@jamestwhedbee

Just ran into this myself

@jamestwhedbee

@noamgat here is a minimal example using guided decoding in vllm with LMFE v0.10.6
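(This assumes a local vLLM OpenAI-compatible server is already running, e.g. started with something like python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct.)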

from openai import OpenAI

# Point the OpenAI client at the local vLLM OpenAI-compatible server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# A math question makes the model want to emit the ∫ symbol.
messages = [{"role": "user", "content": "Find the definite integral of f(x)=x^2 from x=1 to x=3."}]
chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=messages,
    temperature=0.0,
    stream=True,
    # vLLM extension: constrain the output to this JSON schema.
    extra_body={
      "guided_json": {
         "type": "object",
         "properties": {
            "explanation": {
              "description": "make sure to use mathematical notation in your explanation",
              "type": "string"
            }
         },
         "required": ["explanation"]
      }
    }
)

# Print each streamed delta on its own line.
for chunk in chat_completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content)

print()

Which outputs the following, one streamed delta per line, before generation halts right after the ∫ character:

{
 
 

"
e
xp
lan
ation
":
 "
The
 definite
 integral
 of
 a
 function
 f
(x
)
 from
 x
=a
 to
 x
=b
 is
 den
oted
 as
 ∫
 


@noamgat
Owner

noamgat commented Sep 3, 2024

Thanks for the reproduction, this is something I hope to tackle in the next major version.
