Description
Describe the issue as clearly as possible:
The states_to_token_maps generated in RegexFSM.__init__ is wrong for this specific regex; the bug is 100% reproducible.
Notice that no key 9 exists in the FSM, even though state 9 is a transition target (8: {185: 9}), which causes a crash with a KeyError during generation.
regex_string = '```\n(Program\\.cs\n)?```\n'
self.states_to_token_maps = {
0: {63: 1, 4686: 2, 10252: 3},
1: {63: 2, 4686: 3},
2: {63: 3},
3: {185: 4},
4: {47: 5, 63: 6, 1426: 11, 4686: 7, 5959: 10, 10252: 8, 16097: 15},
5: {81: 10, 295: 11, 12483: 12},
6: {63: 7, 4686: 8},
7: {63: 8},
8: {185: 9},
10: {78: 11, 493: 12, 18596: 15},
11: {70: 12, 877: 13, 1644: 15, 16795: 14},
12: {81: 13, 401: 14, 3477: 15},
13: {64: 14, 302: 15},
14: {76: 15},
15: {13: 16},
16: {66: 17, 5494: 18},
17: {82: 18},
18: {185: 19},
19: {63: 6, 4686: 7, 10252: 8},
}

Steps/code to reproduce the bug:
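The inconsistency can be checked mechanically: collect every state that appears as a transition target and compare against the keys of the map. A minimal sketch using the dictionary above (which states outlines treats as final is an internal detail; the point is that 9 is the only target with no entry at all):

```python
# The (buggy) transition map reproduced from the report.
states_to_token_maps = {
    0: {63: 1, 4686: 2, 10252: 3},
    1: {63: 2, 4686: 3},
    2: {63: 3},
    3: {185: 4},
    4: {47: 5, 63: 6, 1426: 11, 4686: 7, 5959: 10, 10252: 8, 16097: 15},
    5: {81: 10, 295: 11, 12483: 12},
    6: {63: 7, 4686: 8},
    7: {63: 8},
    8: {185: 9},
    10: {78: 11, 493: 12, 18596: 15},
    11: {70: 12, 877: 13, 1644: 15, 16795: 14},
    12: {81: 13, 401: 14, 3477: 15},
    13: {64: 14, 302: 15},
    14: {76: 15},
    15: {13: 16},
    16: {66: 17, 5494: 18},
    17: {82: 18},
    18: {185: 19},
    19: {63: 6, 4686: 7, 10252: 8},
}

# Every state reachable as a transition target...
targets = {s for trans in states_to_token_maps.values() for s in trans.values()}
# ...minus the states that actually have an entry in the map.
dangling = targets - states_to_token_maps.keys()
print(dangling)  # {9}: reachable via 8 --185--> 9, but absent from the map
```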
Use the regex as a constraint on a prompt like:
Please list the all the filenames from the code block below in the same order.
Write your answer as a code block. Do not explain, do not apologize.
Write only the code block and nothing else.
Program.cs
Expected result:
Program.cs

Error message:
The exception below occurred while running such a query served with vLLM 0.3.0, but the actual problem does not depend on vLLM at all:
INFO: 192.168.1.70:61435 - "POST /generate HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
task.result()
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 409, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 388, in engine_step
request_outputs = await self.engine.step_async()
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 189, in step_async
all_outputs = await self._run_workers_async(
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/worker/worker.py", line 213, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/home/viktor/env/outlines/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 542, in execute_model
output = self.model.sample(
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 314, in sample
next_tokens = self.sampler(self.lm_head.weight, hidden_states,
File "/home/viktor/env/outlines/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 74, in forward
logits = _apply_logits_processors(logits, sampling_metadata)
File "/home/viktor/dep/outlines-contrib/outlines/serve/vllm.py", line 35, in _patched_apply_logits_processors
logits_row = logits_processor(token_ids, logits_row)
File "/home/viktor/dep/outlines-contrib/outlines/serve/vllm.py", line 140, in __call__
state = self.fsm.get_state_by_token_ids(tuple(input_ids))
File "/home/viktor/dep/outlines-contrib/outlines/serve/vllm.py", line 93, in get_state_by_token_ids
new_state = self.next_state(prev_state, last_token)
File "/home/viktor/dep/outlines-contrib/outlines/fsm/fsm.py", line 178, in next_state
last_token_to_end_state = self.states_to_token_maps[state]
KeyError: 9
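As a sanity check, the pattern itself is well-formed and the expected completion satisfies it under Python's re module, so the failure lies in the FSM construction rather than in the regex:

```python
import re

# The constraint regex from the report.
regex_string = '```\n(Program\\.cs\n)?```\n'

# Both the variant with the filename and the empty code block should match,
# since the (Program\.cs\n)? group is optional.
with_file = '```\nProgram.cs\n```\n'
empty = '```\n```\n'
print(bool(re.fullmatch(regex_string, with_file)))  # True
print(bool(re.fullmatch(regex_string, empty)))      # True
```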
Outlines/Python version information:
Context for the issue:
I've just started using constrained generation with vLLM based on outlines.serve.vllm. I found this issue while working on #539, but it is unrelated to the vLLM adapter, so I created this new ticket.
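Until the construction bug is fixed, a defensive lookup would at least fail with a clearer message than the bare KeyError above. This is a hypothetical sketch, not outlines' actual code; the function name and map shape merely mirror the traceback:

```python
def next_state(states_to_token_maps, state, token_id):
    """Hypothetical defensive version of the lookup that crashed above:
    report an inconsistent FSM map explicitly instead of a bare KeyError."""
    transitions = states_to_token_maps.get(state)
    if transitions is None:
        raise ValueError(
            f"FSM is inconsistent: state {state} is reachable "
            f"but has no entry in states_to_token_maps"
        )
    return transitions[token_id]

# Reproducing the failing step with a fragment of the buggy map:
fragment = {8: {185: 9}}
state = next_state(fragment, 8, 185)  # follows 8 --185--> 9
try:
    next_state(fragment, state, 185)
except ValueError as err:
    print(err)  # FSM is inconsistent: state 9 is reachable but has no entry ...
```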