Replies: 2 comments 14 replies
-
It seems worth taking this all the way to the full Builder Pattern. Imagine this interface:

```
from outlines import models, OutputType

model = models.provider("name", *init_args, **init_kwargs)
result = model \
    .inference_settings(*args, **kwargs) \              # optional
    .txt_prompt() | .visual_prompt() | .audio_prompt() \  # any or all
    .output_type(OutputType) \
    .stream() or .load()  # final call: a hidden build() plus the output approach
```
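To make the chaining concrete, here is a runnable toy version of the builder sketch above. The class and method names mirror the pseudo-code but are hypothetical, not the Outlines API:

```python
# Toy builder (hypothetical names): each setter returns self so calls chain;
# load() plays the role of the hidden build() + output step.
class ModelBuilder:
    def __init__(self, name):
        self.name = name
        self.settings = {}
        self.prompts = []
        self.out_type = None

    def inference_settings(self, **kwargs):   # optional step
        self.settings.update(kwargs)
        return self

    def txt_prompt(self, text):               # one of several prompt modalities
        self.prompts.append(("text", text))
        return self

    def output_type(self, tp):
        self.out_type = tp
        return self

    def load(self):                           # final call: build and return output
        return {"name": self.name, "settings": self.settings,
                "prompts": self.prompts, "output_type": self.out_type}

result = (
    ModelBuilder("demo")
    .inference_settings(temperature=0.7)
    .txt_prompt("Hello")
    .output_type(str)
    .load()
)
```

Because every intermediate method returns `self`, optional steps can simply be skipped without changing the shape of the call chain.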
-
New user interface

The current design of the library, centered on the `outlines.generate` module, is not flexible enough. We need to make the interface of the library simpler and more flexible. I propose the following design, in pseudo-code:

Users thus need only be concerned about the output type, be it a Python type, a Pydantic model, etc., without having to learn new functions. This implicitly re-centers Outlines around the definition of output types.
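To illustrate the "users only specify the output type" idea, here is a toy `generate()` that dispatches on the requested type. This is an illustrative sketch only, not the Outlines implementation; a dataclass stands in for a Pydantic model:

```python
# Illustrative only: coerce raw model text into the requested output type.
import json
from dataclasses import dataclass

def generate(raw_text: str, output_type):
    """Dispatch on the output type the user asked for."""
    if output_type is str:
        return raw_text
    if output_type is int:
        return int(raw_text)
    if hasattr(output_type, "__dataclass_fields__"):   # dataclass as Pydantic stand-in
        return output_type(**json.loads(raw_text))
    raise TypeError(f"Unsupported output type: {output_type!r}")

@dataclass
class Character:
    name: str
    age: int

print(generate("42", int))                                 # 42
print(generate('{"name": "Aria", "age": 31}', Character))  # Character(name='Aria', age=31)
```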
Extra parameters

Any other value passed to `models.provider` is passed directly to the initialization function in the corresponding library. The same goes for other values passed to the `__call__` method of the model. This will give users more flexibility. For instance, it would solve #1199, and would allow users to use a wider variety of sampling algorithms than those described in `samplers.py`. It would also simplify the code, as we will no longer be trying to normalize the parameters. See here, here or here for example. Outlines will become a thin wrapper around the libraries, augmenting them with a friendly interface for structured generation.
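The pass-through idea can be sketched as follows; the function name is hypothetical, and the point is only that keyword arguments reach the underlying library untouched:

```python
# Sketch (assumed names): extra kwargs are forwarded verbatim to the backend
# instead of being normalized by Outlines.
def provider(name, **init_kwargs):
    # A real implementation would hand init_kwargs to the backend's
    # constructor; we record them here to show nothing is filtered or renamed.
    return {"name": name, "init_kwargs": init_kwargs}

model = provider("gpt2", device_map="auto", trust_remote_code=True)
print(model["init_kwargs"])   # {'device_map': 'auto', 'trust_remote_code': True}
```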
Async execution

Asynchronous execution is necessary for agentic workflows, among other things. We should thus support async calls whenever possible, e.g. via vLLM's `AsyncLLMEngine`.
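A minimal sketch of what an awaitable generation call could look like; the method name `agenerate` is an assumption, and the sleep stands in for a non-blocking backend call:

```python
# Hypothetical async variant of the model call, so generation can be awaited
# inside async/agentic workflows.
import asyncio

class Model:
    async def agenerate(self, prompt: str) -> str:
        await asyncio.sleep(0)   # stand-in for a non-blocking backend call
        return f"response to: {prompt}"

result = asyncio.run(Model().agenerate("hello"))
print(result)   # response to: hello
```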
Streaming

We should also offer the possibility to stream tokens, although I am not quite sure how that would work with types such as Pydantic models. A common way to do this is to pass `streaming=True` to the generation function. I am not a big fan of this, however, and would prefer a dedicated method instead.
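One way the dedicated-method alternative could look, as a sketch with an assumed `stream()` name: a generator yields tokens one at a time rather than toggling behavior with a `streaming=True` flag:

```python
# Hypothetical stream() method: yields tokens as a generator.
from typing import Iterator

class Model:
    def stream(self, prompt: str) -> Iterator[str]:
        for token in ["Hello", ",", " world"]:   # stand-in for real decoding
            yield token

text = "".join(Model().stream("greet"))
print(text)   # Hello, world
```

A generator also keeps the return type of the non-streaming call unchanged, instead of making one function return either a value or an iterator depending on a flag.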
Multi-modal models

Multi-modal models are different from text-to-text models in that they accept multiple modalities as input. I thus believe they can simply be handled by defining specific input types. In this case, however, if `image` is of type `PIL.Image`, we may be able to simply pass a tuple as an input. In any case, this should be handled by looking at the types of the inputs.
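The type-based dispatch can be sketched like this; `Image` is a hypothetical stand-in for `PIL.Image.Image`, and `build_prompt` is an illustrative name, not a proposed API:

```python
# Sketch: route each input by its type, mirroring "look at the types
# of the inputs" above.
class Image:
    """Placeholder for an image object such as PIL.Image.Image."""

def build_prompt(*inputs):
    parts = []
    for item in inputs:
        if isinstance(item, str):
            parts.append(("text", item))
        elif isinstance(item, Image):
            parts.append(("image", item))
        else:
            raise TypeError(f"Unsupported input type: {type(item).__name__}")
    return parts

parts = build_prompt("Describe this image:", Image())
print([kind for kind, _ in parts])   # ['text', 'image']
```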
Reviewers
@torymur, @lapp0