
Outlines OOM w/ constrained json schema #658

Open
fmmoret opened this issue Feb 13, 2024 · 6 comments
fmmoret commented Feb 13, 2024

Describe the issue as clearly as possible:

This uses a huge amount of RAM and hangs for a long time — about 1600s on my machine (32 GB RAM, current-gen gaming laptop). Relaxing key constraints such as `constr(max_length=100)` in `d` reduces memory usage and initial computation time exponentially.

What are the recommended workarounds? Is there a cacheless / lazy-eval mode?
Could we split the JSON FSM into multiple FSMs and concatenate them, to reduce the explosion at each of the control-flow points in the regex?

Steps/code to reproduce the bug:

```python
from pydantic import BaseModel, Field, conlist, constr
import outlines


class Example(BaseModel):
    a: str = Field(
        ...,
        max_length=60,
    )
    b: conlist(
        constr(max_length=20),
        max_length=5,  # type: ignore
    ) = Field(
        ...,
    )
    c: str = Field(
        ...,
        max_length=200,
    )
    d: conlist(
        constr(max_length=100),
        max_length=5,  # type: ignore
    ) = Field(
        ...,
    )
    e: conlist(
        constr(max_length=20),
        max_length=15,  # type: ignore
    ) = Field(...)
    f: conlist(
        constr(max_length=30),
        max_length=15,  # type: ignore
    ) = Field(
        ...,
    )

model = outlines.models.transformers(
    "gradientai/gradient-tinystories-15m"
)  # llama tokenizer + tiny model
# Construct structured sequence generator
from time import time

start = time()
generator = outlines.generate.json(model, Example)
end = time()
print(f"Time to construct generator: {end - start:.2f}s")
```

Expected result:

Generator construction completes quickly and without running out of memory.

Error message:

No response

Outlines/Python version information:

Version information

```
0.0.27
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
```

Context for the issue:

No response

@fmmoret fmmoret added the bug label Feb 13, 2024
lapp0 (Contributor) commented Feb 13, 2024

```python
>>> outlines.fsm.json_schema.build_regex_from_object(json.dumps(Example.model_json_schema()))
'\\{[\\n ]*"a"[\\n ]*:[\\n ]*"(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,60}"[\\n ]*,[\\n ]*"b"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")){0,4})?[\\n ]*\\][\\n ]*,[\\n ]*"c"[\\n ]*:[\\n ]*"(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,200}"[\\n ]*,[\\n ]*"d"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,100}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,100}")){0,4})?[\\n ]*\\][\\n ]*,[\\n ]*"e"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,20}")){0,14})?[\\n ]*\\][\\n ]*,[\\n ]*"f"[\\n ]*:[\\n ]*\\[[\\n ]*(("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,30}")(,[\\n ]*("(?:[^"\\\\\\x00-\\x1f\\x7f-\\x9f]|\\\\.){,30}")){0,14})?[\\n ]*\\][\\n ]*\\}'
```

Yes, this is a very complex pattern and results in over 5,000 FSM states, each of which contains a set of valid tokens generated by traversing the FSM.

I agree 100% with the idea of splitting unions into multiple FSMs. For concatenated patterns it might become more complicated.

Related: #640
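A character-level toy of the splitting idea (a hypothetical sketch, not Outlines' actual API or data structures): rather than compiling a union `A|B|C` into one monolithic DFA, keep one small matcher per branch and union their allowed next symbols at each generation step, dropping branches as they become inconsistent with the prefix.

```python
# Hypothetical sketch: one matcher per union branch instead of one big DFA.
# Character-level toy; Outlines works on tokens, but the idea is the same.

class BranchMatcher:
    """Tracks which branch strings are still consistent with the prefix."""

    def __init__(self, alternatives):
        self.pos = 0
        self.live = set(alternatives)  # branches not yet ruled out

    def allowed(self):
        # Union of allowed next characters across all live branches.
        return {s[self.pos] for s in self.live if self.pos < len(s)}

    def advance(self, ch):
        # Keep only the branches whose next character matches.
        self.live = {s for s in self.live
                     if self.pos < len(s) and s[self.pos] == ch}
        self.pos += 1

m = BranchMatcher(["true", "false", "null"])
out = []
while m.allowed():
    ch = sorted(m.allowed())[0]  # stand-in for the LM's sampled character
    out.append(ch)
    m.advance(ch)
print("".join(out))
```

The per-branch matchers stay small; the cost of the union is paid lazily at generation time instead of up front in a product construction. As noted above, concatenation is harder, because a suffix FSM's valid tokens can depend on where the prefix FSM ended.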

fmmoret (Author) commented Feb 13, 2024

Is there an elegant way you can picture doing so while still being able to use the vLLM & pydantic integrations?

lapp0 (Contributor) commented Feb 13, 2024

It's quite complex; I've been thinking about this problem for a while. I'll ping you when I post the write-up I've been working on. (Concatenation and union operations between RegexFSMs are also critical to the performance of CFGFSM, but they are relevant here as well.)

rlouf (Member) commented Feb 13, 2024

I see a few ways we could dramatically reduce the memory that these FSMs take:

  1. Make sure that the FSM cannot be further reduced (no redundant states).
  2. Compress the largest per-state token lists into ranges of token ids.

Which solution we should prioritize is very much an empirical question at this point.
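Idea (2) could be sketched as follows (a hypothetical helper, not Outlines code): a state that allows most of the vocabulary stores a list of tens of thousands of token ids, but those ids are largely contiguous, so a handful of inclusive `(start, end)` pairs carries the same information.

```python
# Sketch: store contiguous token-id ranges instead of explicit id lists.

def compress_to_ranges(token_ids):
    """Turn a collection of token ids into sorted inclusive (start, end) ranges."""
    ranges = []
    for tid in sorted(token_ids):
        if ranges and tid == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], tid)  # extend the current run
        else:
            ranges.append((tid, tid))          # start a new run
    return ranges

# e.g. a state allowing every token id below 10 except id 5:
allowed = [t for t in range(10) if t != 5]
print(compress_to_ranges(allowed))  # → [(0, 4), (6, 9)]
```

Membership tests against the ranges can then be done by bisection, so checking whether a token is allowed stays cheap even after compression.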

fmmoret (Author) commented Mar 8, 2024

Is there something tangential out there we could use instead of fsms?

This guy is claiming 10,000-100,000 times faster json constraining. https://twitter.com/jrysana/status/1765687350363906278

rlouf (Member) commented Mar 8, 2024

> Is there something tangential out there we could use instead of fsms?

There are a few things to try before completely replacing the DFA logic. For instance, how many of those transitions correspond to "allow all tokens and go to the same state"? If many do, we could treat this as a special case, which would decrease memory requirements dramatically. We could also store token-id ranges instead of explicit token ids.
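A minimal sketch of that special case (toy transition table, not Outlines' internal representation): detect states whose transitions allow every token and loop back to the same state, and replace their full token maps with a sentinel, so such states cost O(1) memory instead of O(vocab).

```python
# Sketch: replace full-vocabulary self-loops with a sentinel.

VOCAB_SIZE = 8
ALLOW_ALL = None  # sentinel meaning "any token, stay in this state"

# toy transition table: state -> {token_id: next_state}
transitions = {
    0: {t: 0 for t in range(VOCAB_SIZE)},  # allow-all self-loop
    1: {3: 2, 4: 2},                       # ordinary state
    2: {t: 2 for t in range(VOCAB_SIZE)},  # another allow-all self-loop
}

compact = {}
for state, edges in transitions.items():
    is_allow_all = (len(edges) == VOCAB_SIZE
                    and all(dst == state for dst in edges.values()))
    compact[state] = ALLOW_ALL if is_allow_all else edges

print(sum(1 for v in compact.values() if v is ALLOW_ALL))  # → 2
```

How much this saves on the schema above is exactly the empirical question: it depends on what fraction of the 5,000+ states are of this shape.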

> This guy is claiming 10,000-100,000 times faster json constraining. https://twitter.com/jrysana/status/1765687350363906278

We have yet to see the code. In any case, that figure is about compilation time, and several things could be done to improve it in Outlines; we just haven't had the time to get around to them.


3 participants