
How to use gen fields in outlines? #176

Closed
arunpatro opened this issue Jul 8, 2023 · 11 comments

@arunpatro
Contributor

arunpatro commented Jul 8, 2023

One aspect of guidance programs that I like is gen fields. Consider this:

import guidance
gpt = guidance.llms.Transformers('gpt2')

program = guidance('''
Given the description of a task, generate a python program that solves it and then completes the unit test.

Description: {{description}}
Unit Test: {{unit_test}}
Code: {{gen 'code' do_sample=True temperature=0.5 max_tokens=300 stop='Description'}}
''', llm=gpt)

out = program(description='A function to create 10 fibonacci numbers', unit_test='xxxxxx')

And then we can extract the output as out['code'].

How can I do this intuitively in outlines? Does outlines plan to support a gen field syntax?

@rlouf
Member

rlouf commented Jul 10, 2023

Would something like this work?

import outlines.models as models
import outlines.text as text

@text.prompt
def unit_test_prompt(description, unit_test):
    """Given the description of a task, generate a python program that solves it, and the completes the unit test.

    Description: {{description}}
    Unit Test: {{unit_test}}
    Code: """

prompt = unit_test_prompt("A function to create 10 fibonacci numbers", "xxxxxx")
complete = models.text_completion.transformers('gpt2')
out = complete(prompt)  # sampling options, e.g. stop_at=["Description"], go here

@rlouf rlouf added the question label Jul 10, 2023
@arunpatro
Contributor Author

arunpatro commented Jul 10, 2023

Yes, this works. This is quite simple. Another requirement would be to sample multiple completions at once. I think this could be specified like:

"{{ gen 'code' samples=10 temperature=0.5}}" 
...
out = complete(...)
...
codes = out['code']

codes should be a list because samples > 1. Currently guidance handles this incorrectly: it does generations in streams, multiple streams interleave, and the final answer (a single string) looks like gibberish.

Although this can get tricky if there are many gen fields with samples > 1.

@rlouf
Member

rlouf commented Jul 10, 2023

Should work with

out = complete(prompt, samples=10)

Then you just pass out to the next generator without specifying samples and it should be fine.
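
For illustration, a rough sketch of that flow. Passing the sampled outputs back in as a batch of prompts is my reading of the vectorised semantics, not a tested example:

import outlines.models as models
import numpy as np

complete = models.text_completion.transformers("gpt2")

prompt = "Q: What is 2 + 2?\nA: "
out = complete(prompt, samples=10)  # ten sampled completions

# Extend each sample and generate once per branch; with no samples
# argument, each of the ten prompts should yield a single completion.
follow_up = np.char.add(np.char.add(prompt, out), "\nExplanation: ")
explanations = complete(follow_up)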

@arunpatro
Contributor Author

Then you just pass out to the next generator without specifying samples and it should be fine.

This could work.

Just to clarify, this gen field construct is not yet supported, right? It doesn't work right now.

@rlouf
Member

rlouf commented Jul 10, 2023

Just to clarify, this gen field construct is not yet supported, right? It doesn't work right now.

If you're asking whether outlines has an infilling DSL like guidance or LMQL, no. It would, however, be easy to implement. I sketched it here, and you just reminded me to open an issue to track progress on this.

Also, may I ask why you'd want a DSL like guidance's?

@arunpatro
Contributor Author

I envision a prompt template that has fields which are either user-provided or generated, i.e. gen fields. You run an infilling model on it to generate the outputs, and then later access them via a dictionary output.

I do not want anything more than that. I do not want if-else-then block creation inside the prompt (hence it's called a program in guidance) to modify the prompt at will. We should perhaps implement all the prompt-modifying business using regular Python functions + Jinja templating.

I did not find @text.infilling in the API yet, but if you are proposing an API like that, it would be a good starting point.
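
To make it concrete, here is roughly the shape of the API I picture. Everything below (the @text.infilling decorator, the gen syntax, the dictionary output) is hypothetical, not an existing outlines interface:

import outlines.text as text

# Hypothetical sketch of an infilling DSL; @text.infilling and the
# {{gen ...}} fields are placeholders, not an implemented outlines API.
@text.infilling
def solve_task(description, unit_test):
    """Description: {{description}}
    Unit Test: {{unit_test}}
    Code: {{gen 'code' stop='Description'}}
    """

out = solve_task("A function to create 10 fibonacci numbers", "xxxxxx")
code = out["code"]  # generated fields come back in a dictionary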

@arunpatro
Contributor Author

arunpatro commented Jul 10, 2023

In tree-based searches, for example, if we want to solve a problem and guarantee a structured evaluation process, we can use this template:

Use these numbers and basic arithmetic to get 24

Input: {{input}}
Steps:
{{gen 'step1'}}
{{gen 'step2'}}
{{gen 'step3'}}
Output: {{gen 'ans'}}

One way to get the outputs is to simply run next-token prediction and complete each field until \n.

The only way to get multiple combinations of step1, step2, step3, and ans is to sample each of the fields given the previously completed fields, which can take $n^4$ iterations ($n$ = number of samples). If there were a way to get all these $n^4$ solutions faster by vectorizing, batching, and KV-caching, that would be helpful in creating System 2 solutions.
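
For reference, the naive sequential version would look something like this (a sketch only; model stands for any completion callable that accepts a prompt and a stop_at argument, as in outlines):

# Naive enumeration of all n**4 (step1, step2, step3, ans) combinations.
# Sketch only, not optimized; the point is that these calls are
# independent enough to batch, vectorize, and cache.
def expand(model, prompt, n):
    """Sample n continuations of prompt, each stopping at a newline."""
    return [model(prompt, stop_at=["\n"]) for _ in range(n)]

def tree_of_steps(model, prompt, n):
    solutions = []
    for s1 in expand(model, prompt, n):
        p1 = prompt + s1 + "\n"
        for s2 in expand(model, p1, n):
            p2 = p1 + s2 + "\n"
            for s3 in expand(model, p2, n):
                p3 = p2 + s3 + "\nOutput: "
                for ans in expand(model, p3, n):
                    solutions.append((s1, s2, s3, ans))
    return solutions  # n**4 candidate traces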

@rlouf
Member

rlouf commented Jul 11, 2023

From what I understand you would like three different things:

  1. KV caching across generations to improve performance. We're currently working on an implementation that we believe would be faster than guidance's. It is tracked by Implement k-v cache for transformers models #150.
  2. The ability to generate a tree of generations by successively taking several samples for each output of the previous step. Afaik this is already doable with outlines. The original goal behind vectorisation was to allow the implementation of Tree of Thoughts.
  3. An infilling DSL (which was sketched in the link I sent you) for convenience. This is not necessary, since your example can already be implemented in Outlines, but as stated above I'm open to adding that thin layer on top of the existing capabilities.

In summary, (2) looks like it's already doable, (3) is convenient but non-blocking, and we should see together how we can prioritise (1). Did I get this right?

@arunpatro
Contributor Author

arunpatro commented Jul 11, 2023

Yup, you did. Indeed, solving (1) will have the most RoI. I think users would want (3), an infilling API; it seems natural to me for text-completion tasks. Can you show how one can do it in outlines in a simple fashion?

Same for (2): will successive calls to model.complete(prompt) take care of the unfilled fields? I assume I need to do prompt concatenations wherever required. I am trying to gauge the complexity of implementing this in outlines.

@rlouf
Member

rlouf commented Jul 12, 2023

Try the following. You can step through the program and look at the shapes of the outputs: (10,), (10, 10), (10, 10, 10). If I didn't understand correctly and you just need 10 samples for the whole sequence, then remove samples=10 in the second, third, and fourth calls to the model.

import outlines.models as models
import outlines.text as text
import numpy as np

@text.prompt
def arithmetic_prompt(input):
    """Use these numbers and basic arithmetic to get 24

     Input: {{input}}
     Steps: """

add = np.char.add  # element-wise string concatenation over arrays
model = models.text_completion.openai("text-davinci-003")

prompt = arithmetic_prompt("input")
step_1 = model(prompt, stop_at=["\n"], samples=10)  # shape (10,)

# Append each sample to its prompt and branch again
prompt = add(add(prompt, step_1), "\nStep 2: ")
step_2 = model(prompt, stop_at=["\n"], samples=10)  # shape (10, 10)

prompt = add(add(prompt, step_2), "\nStep 3: ")
step_3 = model(prompt, stop_at=["\n"], samples=10)  # shape (10, 10, 10)

# Final answers, one more branching per trace
prompt = add(add(prompt, step_3), "\nOutput: ")
ans = model(prompt, samples=10)

@rlouf
Member

rlouf commented Jul 12, 2023

I opened #182 to track the infilling DSL implementation. Feel free to take a stab at it.
