Closed as not planned
Labels
bug, stale, structured-output
Description
Your current environment
Tested in 3 environments with 8xH100:
- public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:61c1d499f07d3a50e3721a38f3f54a721f3eaf65 (a CI version before v0.6.5 that contained XGrammar; v0.6.4.post2 did not contain XGrammar). This image no longer exists.
- v0.6.5
- v0.6.6.post1
Model Input Dumps
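The snippet below references a `grammar` variable whose contents are not shown in the report; as a hypothetical illustration only, `guided_grammar` carries an EBNF-style grammar string such as:

```python
# Hypothetical illustration: the actual grammar from this report is not shown.
# vLLM's guided_grammar accepts an EBNF-style grammar string; a minimal one
# constraining the model to a yes/no answer could look like this.
grammar = r"""
root ::= answer
answer ::= "yes" | "no"
"""

extra_body = {
    "guided_grammar": grammar,
    "guided_decoding_backend": "xgrammar",  # optional
}
```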
extra_body = {
    "guided_grammar": grammar,
    "guided_decoding_backend": "xgrammar",  # optional
}

chat_completion = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
    temperature=0,
    max_tokens=1024,
    timeout=timeout,
    extra_body=extra_body,
    stream_options={"include_usage": True},
)

🐛 Describe the bug
XGrammar guided decoding has regressed both in time to first token (TTFT) and in overall response time. I've tested two versions with Llama3-70b:
- A CI version before v0.6.5 that contained XGrammar (v0.6.4.post2 did not contain XGrammar). This is the exact image I was using: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:61c1d499f07d3a50e3721a38f3f54a721f3eaf65. TTFT P50: ~0.6s
 
- v0.6.5 and v0.6.6.post1: TTFT P50: ~4s
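The TTFT percentiles above can be measured with a stopwatch around the streaming response. A minimal sketch, assuming OpenAI-style streaming chunks like those returned by the `client.chat.completions.create(..., stream=True)` call shown earlier:

```python
import time

def measure_ttft(stream):
    # Return seconds from the start of iteration until the first non-empty
    # content delta arrives; None if the stream ends without any content.
    start = time.monotonic()
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.monotonic() - start
    return None
```

Collecting this value over many requests and taking the median gives the TTFT P50 figures quoted above.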
 
Related issue affecting Outlines
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.