Outlines OOM w/ constrained json schema #658
Comments
Yes, this is a very complex pattern and results in over 5,000 FSM states, each of which contains a set of valid tokens generated by traversing the FSM. I agree 100% with the idea of splitting unions into multiple FSMs. For concatenated patterns it might become more complicated. Related: #640
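A minimal sketch of the union-splitting idea, using toy dict-based FSMs rather than Outlines' actual classes (all names here are illustrative, not Outlines' API): instead of compiling `A|B` into one large combined machine, each branch keeps its own small FSM, and the set of allowed characters (standing in for allowed tokens) at each step is the union of the branches' per-state sets.

```python
# Hypothetical sketch: run several small FSMs in parallel instead of
# building one big combined FSM for a union pattern A|B.

def step(fsm, state, ch):
    """Advance one toy FSM (dict keyed by (state, char)); None = dead branch."""
    return fsm["delta"].get((state, ch))

def allowed(fsm, state):
    """Characters allowed from a state (stands in for a token mask)."""
    return {c for (s, c) in fsm["delta"] if s == state}

# Toy FSMs for the literals "ab" and "ac"
fsm_ab = {"start": 0, "accept": {2}, "delta": {(0, "a"): 1, (1, "b"): 2}}
fsm_ac = {"start": 0, "accept": {2}, "delta": {(0, "a"): 1, (1, "c"): 2}}

branches = [fsm_ab, fsm_ac]
states = [f["start"] for f in branches]

# The union of per-branch masks replaces one large precomputed table.
union_allowed = set().union(*(allowed(f, s) for f, s in zip(branches, states)))
print(union_allowed)  # → {'a'}: both branches start with "a"
```

The memory cost is then roughly the sum of the branch FSM sizes instead of the size of their combined automaton; the trade-off is simulating every live branch at each decoding step.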
Is there an elegant way you can picture doing so while still being able to use the vLLM & pydantic integrations?
It's quite complex; I've been thinking about this problem for a bit. I'll ping you when I post the write-up I've been working on. (Concatenation and union operations between …)
I see a few ways we could dramatically reduce the memory that these FSMs take:
Which solution we should prioritize is very much an empirical question at this point.
Is there something tangential out there we could use instead of FSMs? This guy is claiming 10,000-100,000 times faster JSON constraining: https://twitter.com/jrysana/status/1765687350363906278
There are a few things to try before completely replacing the DFA logic. For instance, how many of those transitions correspond to "allow all tokens and go to the same state"? If many, we could treat this as a special case and it would decrease the memory requirements dramatically. We could store ranges instead of token ids as well.
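The range-storage idea above can be sketched in a few lines (this is an illustrative compression scheme, not Outlines' implementation; it assumes token ids are dense integers). States that allow "almost all tokens" compress especially well, which also covers the "allow all tokens and stay in the same state" special case:

```python
# Sketch: store a per-state allowed-token set as sorted inclusive
# (start, end) ranges instead of an explicit set of token ids.

def to_ranges(token_ids):
    """Compress a set of integer token ids into inclusive (start, end) ranges."""
    out = []
    for t in sorted(token_ids):
        if out and t == out[-1][1] + 1:
            out[-1] = (out[-1][0], t)  # extend the current run
        else:
            out.append((t, t))         # start a new run
    return out

# An "almost allow-everything" state over a 50k-token vocabulary:
allowed = set(range(50_000)) - {123, 40_000}
ranges = to_ranges(allowed)
print(len(ranges))  # → 3 ranges instead of ~50k individual ids
```

For the fully-permissive case (`range(50_000)` with nothing removed) this collapses to a single range, so such states cost almost nothing to store.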
We have yet to see the code. Anyway, that's compilation time, and a few things can be done to improve this in Outlines; we just haven't had the time to get around to it.
Describe the issue as clearly as possible:
This uses up a huge amount of RAM and hangs for a long time -- 1600s on my machine with 32GB RAM, current-gen gaming laptop. Reducing key constraints like `constr(max_length=100)` cuts memory usage and initial computation time exponentially. What are the recommended sets of workarounds? Is there a cacheless / lazy-eval mode?
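A back-of-envelope sketch of why `max_length` matters so much (the numbers below are illustrative assumptions, not measurements): a bounded string field compiles to a bounded repetition contributing roughly one FSM state per allowed character, and the compiled index stores an allowed-token set for every state, so memory scales roughly as states × tokens per state.

```python
# Illustrative cost model, not Outlines code: per-state token masks
# over a typical LLM vocabulary dominate the index's memory footprint.

VOCAB = 32_000  # assumed vocabulary size

def approx_mask_memory(num_states, avg_allowed_frac=0.5, bytes_per_id=8):
    """Rough bytes needed to store explicit token-id sets for all states."""
    return int(num_states * VOCAB * avg_allowed_frac * bytes_per_id)

# A constr(max_length=N)-style field contributes ~N states:
for max_length in (10, 100, 1000):
    mb = approx_mask_memory(max_length) / 1e6
    print(f"max_length={max_length}: ~{mb:.0f} MB of token-id storage")
```

Under these assumptions, shrinking a single field's bound from 1000 to 100 drops its mask storage by an order of magnitude, and several such fields concatenated multiply the effect, which matches the reported behavior.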
Could we choose to split the JSON FSM into multiple FSMs and concatenate them, to reduce the explosion at all of the regex's control-flow points?
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
Context for the issue:
No response