Very slow crawl in interegular, scalability issue #680
Comments
@lapp0 Frozen FSM; maybe a low-hanging fruit, but I don't know the outlines code well enough to see the problem right away. The regex seems to be good, just a bit long. Still, it should not cause a complete freeze.
Making long expressions efficient is a work in progress. Related: #658. Your expression results in a very large FSM, so it is slow before we even begin crawling the FSM to create a token index. A detailed profile for the code below follows.
For now I recommend trying to simplify the expression. In the long term, though, this isn't the first time this has come up, and it should be addressed. I'll ponder how to approach it.
This is a blocker for me, so I gave the optimization a try. Thank you for the test case, that was a highly useful start.

The `final` helper scans every substate:

```python
def final(state):
    """If you're in a final state of the final FSM, it's final"""
    for (i, substate) in state:
        if i == last_index and substate in last.finals:
            return True
    return False
```

Since the membership test can be checked from either side, I tried iterating over whichever collection is smaller:

```python
def final(state):
    """If you're in a final state of the final FSM, it's final"""
    if len(state) < len(last.finals):
        for (i, substate) in state:
            if i == last_index and substate in last.finals:
                return True
    else:
        for final_substate in last.finals:
            if (last_index, final_substate) in state:
                return True
    return False
```

But this barely makes a difference (<2%). That's because this line scales much worse:

```python
j = states.index(next)
```

The number of items in `states` keeps growing, so each linear `index` scan gets slower and slower. I attempted to optimize that as well, but at this point I gave up. I could use a Lark grammar instead for my purposes, because I'm blocked.
There is also this expensive double reversal in `reduce`:

```python
def reduce(self):
    """
    A result by Brzozowski (1963) shows that a minimal finite state machine
    equivalent to the original can be obtained by reversing the original
    twice.
    """
    return self.reversed().reversed()
```

Do we really need a minimal finite state machine that badly for our purposes? However, it remains super slow even if I remove the above, so that alone would not help.
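For intuition, Brzozowski's double-reversal minimization can be sketched on a toy DFA. This is my own standalone sketch, not interegular's implementation; the DFA is encoded as a `{(state, symbol): target}` dict:

```python
def reverse(dfa, initial, finals):
    """Reverse a DFA {(state, symbol): target} into an NFA
    {(state, symbol): {targets}}, starting from the old finals."""
    nfa = {}
    for (s, a), t in dfa.items():
        nfa.setdefault((t, a), set()).add(s)
    return nfa, set(finals), {initial}

def determinize(nfa, initials, finals):
    """Subset construction: turn the NFA back into a DFA whose states
    are frozensets of NFA states."""
    start = frozenset(initials)
    alphabet = {a for (_, a) in nfa}
    dfa, todo, seen = {}, [start], {start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(x for s in S for x in nfa.get((s, a), ()))
            if not T:
                continue
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa, start, {S for S in seen if S & set(finals)}

def brzozowski(dfa, initial, finals):
    """Minimal DFA via Brzozowski (1963): reverse + determinize, twice."""
    for _ in range(2):
        nfa, initials, finals = reverse(dfa, initial, finals)
        dfa, initial, finals = determinize(nfa, initials, finals)
    return dfa, initial, finals

# 3-state DFA for the language a+ in which states 1 and 2 are equivalent.
dfa = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
mdfa, start, finals = brzozowski(dfa, 0, {1, 2})
states = {start} | {s for (s, _) in mdfa} | set(mdfa.values())
print(len(states))  # → 2: the equivalent states collapsed
```

The cost the comment above complains about comes from the determinization steps, whose worst case is exponential in the number of states, which is why skipping minimization looks tempting for very large FSMs.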
I made some headway on this. The bottleneck is the FSM construction step; I can look into optimizing the subsequent steps as well.
Maybe we could use Python's compiled regex representation to come up with a better FSM ourselves, without having to use Python's regex engine at runtime. The compiler parses the regex into a kind of byte-code.
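That intermediate form can be inspected with CPython's internal regex parser. Note this is an undocumented, private module (`re._parser`, formerly `sre_parse`), so relying on it is an assumption, not a supported API:

```python
try:
    import re._parser as sre_parse  # CPython 3.11+ internal module
except ImportError:
    import sre_parse                # older CPython versions

# Parse a pattern into the (opcode, argument) sequence that the
# byte-code compiler consumes.
parsed = sre_parse.parse("ab*c")
for op, arg in parsed:
    print(repr(op), arg)
```

For `"ab*c"` this yields three nodes: two `LITERAL` opcodes for `a` and `c`, and a `MAX_REPEAT` wrapping the literal `b`; an FSM builder could walk this tree instead of re-parsing the pattern text.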
I think a better approach would be to get the Lark grammar working with the vLLM endpoint.
Please do not use exception handling for logic. Thanks. For example:

```python
j = state_idx.get(next_hash)
if j is None:
    j = len(states)
    states.append(next)
if next_hash not in state_idx:
    state_idx[next_hash] = j
```
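The same interning pattern in isolation (a standalone sketch with invented names, assuming the crawl's states may be unhashable sets, so a frozen copy serves as the dict key):

```python
def intern_state(state, states, state_idx):
    """Return the index of `state`, registering it if unseen.
    A dict lookup replaces the O(n) states.index() scan."""
    key = frozenset(state)  # sets aren't hashable; a frozen copy is
    j = state_idx.get(key)
    if j is None:
        j = len(states)
        states.append(state)
        state_idx[key] = j
    return j

states, state_idx = [], {}
print(intern_state({1, 2}, states, state_idx))  # → 0 (first occurrence)
print(intern_state({2, 1}, states, state_idx))  # → 0 (same set, same index)
print(intern_state({3}, states, state_idx))     # → 1 (new state)
```

Keeping the side index in a dict makes each lookup O(1) amortized, so the crawl loop stops degrading as `states` grows.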
Thank you for the suggestion.
The crawl optimization mainly improves the performance of the index construction step. Hoping to get it merged.
Trying to use a simpler regex in the meantime, but it allows the model to produce wrong output, which is not ideal.
Describe the issue as clearly as possible:
Observed that the vLLM server gets stuck indefinitely. Running under the debugger, I could stop where it was frozen. It is running this while loop infinitely: every iteration appends a new item to `states`, so the loop can never finish (the list just keeps getting longer):
```python
def crawl(alphabet, initial, final, follow):
    """
    Given the above conditions and instructions, crawl a new unknown FSM,
    mapping its states, final states and transitions. Return the new FSM.
    This is a pretty powerful procedure which could potentially go on
    forever if you supply an evil version of follow().
    """
```
Steps/code to reproduce the bug:
Expected result:
Letting vLLM produce matching content. The prompt instructs the model to do so, and it worked before with a less strict regex that did not include the actual file names.
Error message:
None; vLLM freezes with 100% core load. It may not be completely frozen, just very, very slow. The GPU load according to `nvidia-smi` is zero, so vLLM cannot be making any progress.
Outlines/Python version information:
Version information
packaging==23.2
paginate==0.5.6
pandas==2.2.0
pathspec==0.12.1
perscache==0.6.1
pillow==10.2.0
platformdirs==4.2.0
pluggy==1.4.0
protobuf==4.25.2
psutil==5.9.8
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
pydantic==2.6.0
pydantic_core==2.16.1
Pygments==2.17.2
pymdown-extensions==10.7
pynvml==11.5.0
pytest==8.0.0
python-dateutil==2.8.2
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
pyyaml_env_tag==0.1
quantile-python==1.1
ray==2.9.1
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rpds-py==0.17.1
safetensors==0.4.2
scipy==1.12.0
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.35.1
sympy==1.12
tinycss2==1.2.1
tokenizers==0.15.1
tomli==2.0.1
torch==2.1.2
tqdm==4.66.1
transformers==4.37.2
triton==2.1.0
typing_extensions==4.9.0
tzdata==2023.4
urllib3==2.2.0
uvicorn==0.27.0.post1
uvloop==0.19.0
vllm==0.3.1
watchdog==4.0.0
watchfiles==0.21.0
webencodings==0.5.1
websockets==12.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4