Replies: 2 comments 14 replies
-
It seems worth taking this all the way to the full Builder Pattern. Imagine this interface:

```
from outlines import models, OutputType

model = models.provider("name", *init_args, **init_kwargs)
result = model \
    .inference_settings(*args, **kwargs) \              # optional
    .txt_prompt() | .visual_prompt() | .audio_prompt() \  # any or all
    .output_type(OutputType) \
    .stream() or .load()  # final call: a hidden build() plus the output approach
```
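To make the chaining concrete, here is a runnable toy version of the builder sketch above. The class and method names mirror the pseudo-code but are hypothetical, not the Outlines API:

```python
# Toy builder (hypothetical names): each setter returns self so calls chain;
# load() plays the role of the hidden build() + output step.
class ModelBuilder:
    def __init__(self, name):
        self.name = name
        self.settings = {}
        self.prompts = []
        self.out_type = None

    def inference_settings(self, **kwargs):   # optional step
        self.settings.update(kwargs)
        return self

    def txt_prompt(self, text):               # one of several prompt modalities
        self.prompts.append(("text", text))
        return self

    def output_type(self, tp):
        self.out_type = tp
        return self

    def load(self):                           # final call: build and return output
        return {"name": self.name, "settings": self.settings,
                "prompts": self.prompts, "output_type": self.out_type}

result = (
    ModelBuilder("demo")
    .inference_settings(temperature=0.7)
    .txt_prompt("Hello")
    .output_type(str)
    .load()
)
```

Because every intermediate method returns `self`, optional steps can simply be skipped without changing the shape of the call chain.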
-
New user interface

The current design of the library, centered on the `outlines.generate` module, is not flexible enough. We need to make the interface of the library simpler and more flexible. I propose the following design, in pseudo-code:

Users thus need only be concerned about the output type, be it a Python type, a Pydantic model, etc., without having to learn new functions. This implicitly re-centers Outlines around the definition of output types.
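To illustrate the "users only specify the output type" idea, here is a toy `generate()` that dispatches on the requested type. This is an illustrative sketch only, not the Outlines implementation; a dataclass stands in for a Pydantic model:

```python
# Illustrative only: coerce raw model text into the requested output type.
import json
from dataclasses import dataclass

def generate(raw_text: str, output_type):
    """Dispatch on the output type the user asked for."""
    if output_type is str:
        return raw_text
    if output_type is int:
        return int(raw_text)
    if hasattr(output_type, "__dataclass_fields__"):   # dataclass as Pydantic stand-in
        return output_type(**json.loads(raw_text))
    raise TypeError(f"Unsupported output type: {output_type!r}")

@dataclass
class Character:
    name: str
    age: int

print(generate("42", int))                                 # 42
print(generate('{"name": "Aria", "age": 31}', Character))  # Character(name='Aria', age=31)
```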
Extra parameters

Any other value passed to `models.provider` is passed directly to the initialization function in the corresponding library. The same goes for other values passed to the `__call__` method of the model. This will give users more flexibility. For instance, it would solve #1199, and would allow users to use a wider variety of sampling algorithms than those described in `samplers.py`. It would also simplify the code, as we will no longer be trying to normalize the parameters. See here, here or here for example. Outlines will become a thin wrapper around the libraries, augmenting them with a friendly interface for structured generation.
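The pass-through idea can be sketched as follows; the function name is hypothetical, and the point is only that keyword arguments reach the underlying library untouched:

```python
# Sketch (assumed names): extra kwargs are forwarded verbatim to the backend
# instead of being normalized by Outlines.
def provider(name, **init_kwargs):
    # A real implementation would hand init_kwargs to the backend's
    # constructor; we record them here to show nothing is filtered or renamed.
    return {"name": name, "init_kwargs": init_kwargs}

model = provider("gpt2", device_map="auto", trust_remote_code=True)
print(model["init_kwargs"])   # {'device_map': 'auto', 'trust_remote_code': True}
```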
Async execution

Asynchronous execution is necessary for agentic workflows, among other things. We should thus support async calls whenever possible, e.g. via vLLM's `AsyncLLMEngine`.
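A minimal sketch of what an awaitable generation call could look like; the method name `agenerate` is an assumption, and the sleep stands in for a non-blocking backend call:

```python
# Hypothetical async variant of the model call, so generation can be awaited
# inside async/agentic workflows.
import asyncio

class Model:
    async def agenerate(self, prompt: str) -> str:
        await asyncio.sleep(0)   # stand-in for a non-blocking backend call
        return f"response to: {prompt}"

result = asyncio.run(Model().agenerate("hello"))
print(result)   # response to: hello
```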
Streaming

We should also offer the possibility to stream tokens, although I am not quite sure how that would work with types such as Pydantic models. A common way to do this is to pass `streaming=True` to the generation function. I am not a big fan of this, however, and would prefer a dedicated method instead.
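One way the dedicated-method alternative could look, as a sketch with an assumed `stream()` name: a generator yields tokens one at a time rather than toggling behavior with a `streaming=True` flag:

```python
# Hypothetical stream() method: yields tokens as a generator.
from typing import Iterator

class Model:
    def stream(self, prompt: str) -> Iterator[str]:
        for token in ["Hello", ",", " world"]:   # stand-in for real decoding
            yield token

text = "".join(Model().stream("greet"))
print(text)   # Hello, world
```

A generator also keeps the return type of the non-streaming call unchanged, instead of making one function return either a value or an iterator depending on a flag.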
Multi-modal models

Multi-modal models are different from text-to-text models in that they accept multiple modalities as input. I thus believe they can simply be handled by defining specific input types. In this case, however, if `image` is of type `PIL.Image`, we may be able to simply pass a tuple as an input. In any case, this should be handled by looking at the types of the inputs.
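The type-based dispatch can be sketched like this; `Image` is a hypothetical stand-in for `PIL.Image.Image`, and `build_prompt` is an illustrative name, not a proposed API:

```python
# Sketch: route each input by its type, mirroring "look at the types
# of the inputs" above.
class Image:
    """Placeholder for an image object such as PIL.Image.Image."""

def build_prompt(*inputs):
    parts = []
    for item in inputs:
        if isinstance(item, str):
            parts.append(("text", item))
        elif isinstance(item, Image):
            parts.append(("image", item))
        else:
            raise TypeError(f"Unsupported input type: {type(item).__name__}")
    return parts

parts = build_prompt("Describe this image:", Image())
print([kind for kind, _ in parts])   # ['text', 'image']
```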
Reviewers
@torymur, @lapp0