Outlines OOM w/ constrained json schema #658
Comments
Yes, this is a very complex pattern and results in over 5,000 FSM states, each of which contains a set of valid tokens generated by traversing the FSM. I agree 100% with the idea of splitting unions into multiple FSMs. For concatenated patterns it might become more complicated. Related: #640
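A minimal sketch of the union-splitting idea, using toy dict-based FSMs rather than Outlines' actual classes (all names here are illustrative, not Outlines' API): instead of compiling `A|B` into one large combined machine, each branch keeps its own small FSM, and the set of allowed characters (standing in for allowed tokens) at each step is the union of the branches' per-state sets.

```python
# Hypothetical sketch: run several small FSMs in parallel instead of
# building one big combined FSM for a union pattern A|B.

def step(fsm, state, ch):
    """Advance one toy FSM (dict keyed by (state, char)); None = dead branch."""
    return fsm["delta"].get((state, ch))

def allowed(fsm, state):
    """Characters allowed from a state (stands in for a token mask)."""
    return {c for (s, c) in fsm["delta"] if s == state}

# Toy FSMs for the literals "ab" and "ac"
fsm_ab = {"start": 0, "accept": {2}, "delta": {(0, "a"): 1, (1, "b"): 2}}
fsm_ac = {"start": 0, "accept": {2}, "delta": {(0, "a"): 1, (1, "c"): 2}}

branches = [fsm_ab, fsm_ac]
states = [f["start"] for f in branches]

# The union of per-branch masks replaces one large precomputed table.
union_allowed = set().union(*(allowed(f, s) for f, s in zip(branches, states)))
print(union_allowed)  # → {'a'}: both branches start with "a"
```

The memory cost is then roughly the sum of the branch FSM sizes instead of the size of their combined automaton; the trade-off is simulating every live branch at each decoding step.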
Is there an elegant way you can picture doing so while still being able to use the vLLM & pydantic integrations?
It's quite complex; I've been thinking about this problem for a bit. I'll ping you when I post the write-up I've been working on. (Concatenation and union operations between …)
I see a few ways we could dramatically reduce the memory that these FSMs take:
Which solution we should prioritize is very much an empirical question at this point.
Is there something tangential out there we could use instead of FSMs? This guy is claiming 10,000-100,000 times faster JSON constraining: https://twitter.com/jrysana/status/1765687350363906278
There are a few things to try before completely replacing the DFA logic. For instance, how many of those transitions correspond to "allow all tokens and go to the same state"? If many, we could treat this as a special case and it would decrease the memory requirements dramatically. We could store ranges instead of token ids as well.
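The range-storage idea above can be sketched in a few lines (this is an illustrative compression scheme, not Outlines' implementation; it assumes token ids are dense integers). States that allow "almost all tokens" compress especially well, which also covers the "allow all tokens and stay in the same state" special case:

```python
# Sketch: store a per-state allowed-token set as sorted inclusive
# (start, end) ranges instead of an explicit set of token ids.

def to_ranges(token_ids):
    """Compress a set of integer token ids into inclusive (start, end) ranges."""
    out = []
    for t in sorted(token_ids):
        if out and t == out[-1][1] + 1:
            out[-1] = (out[-1][0], t)  # extend the current run
        else:
            out.append((t, t))         # start a new run
    return out

# An "almost allow-everything" state over a 50k-token vocabulary:
allowed = set(range(50_000)) - {123, 40_000}
ranges = to_ranges(allowed)
print(len(ranges))  # → 3 ranges instead of ~50k individual ids
```

For the fully-permissive case (`range(50_000)` with nothing removed) this collapses to a single range, so such states cost almost nothing to store.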
We have yet to see the code. Anyway, that's compilation time, and a few things can be done to improve this in Outlines; we just haven't had the time to get around to it.
Describe the issue as clearly as possible:
This uses up a huge amount of RAM and hangs for a long time -- 1600s on my machine with 32GB RAM, current-gen gaming laptop. Reducing key constraints like `constr(max_length=100)` cuts memory usage and initial computation time exponentially. What are the recommended sets of workarounds? Is there a cacheless / lazy-eval mode?
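A back-of-envelope sketch of why `max_length` matters so much (the numbers below are illustrative assumptions, not measurements): a bounded string field compiles to a bounded repetition contributing roughly one FSM state per allowed character, and the compiled index stores an allowed-token set for every state, so memory scales roughly as states × tokens per state.

```python
# Illustrative cost model, not Outlines code: per-state token masks
# over a typical LLM vocabulary dominate the index's memory footprint.

VOCAB = 32_000  # assumed vocabulary size

def approx_mask_memory(num_states, avg_allowed_frac=0.5, bytes_per_id=8):
    """Rough bytes needed to store explicit token-id sets for all states."""
    return int(num_states * VOCAB * avg_allowed_frac * bytes_per_id)

# A constr(max_length=N)-style field contributes ~N states:
for max_length in (10, 100, 1000):
    mb = approx_mask_memory(max_length) / 1e6
    print(f"max_length={max_length}: ~{mb:.0f} MB of token-id storage")
```

Under these assumptions, shrinking a single field's bound from 1000 to 100 drops its mask storage by an order of magnitude, and several such fields concatenated multiply the effect, which matches the reported behavior.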
Could we choose to split the JSON FSM into multiple FSMs and concatenate them, to reduce the explosion at all of the regex's control-flow points?
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
Context for the issue:
No response