
How to use gen fields in outlines? #176

Closed
arunpatro opened this issue Jul 8, 2023 · 11 comments

@arunpatro
Contributor

arunpatro commented Jul 8, 2023

One aspect of guidance programs that I like is gen fields. Consider this:

import guidance
gpt = guidance.llms.Transformers('gpt2')

program = guidance('''
Given the description of a task, generate a python program that solves it and then completes the unit test.

Description: {{description}}
Unit Test: {{unit_test}}
Code: {{gen 'code' do_sample=True temperature=0.5 max_tokens=300 stop='Description'}}
''', llm=gpt)

out = program(description='A function to create 10 fibonacci numbers', unit_test='xxxxxx')

And then we can extract the output as out['code'].

How can I do this intuitively in outlines? Does outlines plan to support a gen field syntax?

@rlouf
Member

rlouf commented Jul 10, 2023

Would something like this work?

import outlines.models as models
import outlines.text as text

@text.prompt
def unit_test_prompt(description, unit_test):
    """Given the description of a task, generate a python program that solves it, and the completes the unit test.

    Description: {{description}}
    Unit Test: {{unit_test}}
    Code: """

prompt = unit_test_prompt("A function to create 10 fibonacci numbers", "xxxxxx")
complete = models.text_completion.transformers('gpt2')
out = complete(prompt)  # sampling options, e.g. stop_at=["Description"], go here

@rlouf rlouf added the question label Jul 10, 2023
@arunpatro
Contributor Author

arunpatro commented Jul 10, 2023

Yes, this works. This is quite simple. Another requirement would be to sample multiple completions at once. I think this could be specified like:

"{{ gen 'code' samples=10 temperature=0.5}}" 
...
out = complete(...)
...
codes = out['code']

codes should be a list because samples > 1. Currently guidance handles this incorrectly: it does generations in streams, multiple streams interleave, and the final answer (a single string) looks like gibberish.

Although this can get tricky if there are many gen fields with samples > 1.

@rlouf
Member

rlouf commented Jul 10, 2023

Should work with

out = complete(prompt, samples=10)

Then you just pass out to the next generator without specifying samples and it should be fine.
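
For illustration, a rough sketch of that flow. Passing the sampled outputs back in as a batch of prompts is my reading of the vectorised semantics, not a tested example:

import outlines.models as models
import numpy as np

complete = models.text_completion.transformers("gpt2")

prompt = "Q: What is 2 + 2?\nA: "
out = complete(prompt, samples=10)  # ten sampled completions

# Extend each sample and generate once per branch; with no samples
# argument, each of the ten prompts should yield a single completion.
follow_up = np.char.add(np.char.add(prompt, out), "\nExplanation: ")
explanations = complete(follow_up)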

@arunpatro
Contributor Author

Then you just pass out to the next generator without specifying samples and it should be fine.

This could work.

Just to clarify, this gen field construct is not yet supported, right? It doesn't work right now.

@rlouf
Member

rlouf commented Jul 10, 2023

Just to clarify, this gen field construct is not yet supported, right? It doesn't work right now.

If you're asking whether outlines has an infilling DSL like guidance or LMQL, no. It would, however, be easy to implement. I sketched it here, and you just reminded me to open an issue to track progress on this.

Also, may I ask why you'd want a DSL like guidance's?

@arunpatro
Contributor Author

I envision a prompt template that has fields which are either user-provided or generated, i.e. gen fields. You run an infilling model on it to generate the outputs, and then later access them via a dictionary output.

I do not want anything more than that. I do not want if-else-then block creation inside the prompt (hence it's called a program in guidance) to modify the prompt at will. We should perhaps implement all the prompt-modifying business using regular Python functions + Jinja templating.

I did not find @text.infilling in the API yet, but if you are proposing an API like that, it would be a good starting point.
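
To make it concrete, here is roughly the shape of the API I picture. Everything below (the @text.infilling decorator, the gen syntax, the dictionary output) is hypothetical, not an existing outlines interface:

import outlines.text as text

# Hypothetical sketch of an infilling DSL; @text.infilling and the
# {{gen ...}} fields are placeholders, not an implemented outlines API.
@text.infilling
def solve_task(description, unit_test):
    """Description: {{description}}
    Unit Test: {{unit_test}}
    Code: {{gen 'code' stop='Description'}}
    """

out = solve_task("A function to create 10 fibonacci numbers", "xxxxxx")
code = out["code"]  # generated fields come back in a dictionary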

@arunpatro
Contributor Author

arunpatro commented Jul 10, 2023

In tree-based searches, for example, if we want to solve a problem and guarantee a structured evaluation process, we can use this template:

Use these numbers and basic arithmetic to get 24

Input: {{input}}
Steps:
{{gen 'step1'}}
{{gen 'step2'}}
{{gen 'step3'}}
Output: {{gen 'ans'}}

One way to get the outputs is to simply run next-token prediction and complete each field until \n.

The only way to get multiple combinations of step1, step2, step3, and ans is to sample each of the fields given the previously completed fields, which can take $n^4$ iterations ($n$ = number of samples). If there were a way to get all these $n^4$ solutions faster by vectorizing, batching, and KV-caching, that would be helpful in creating System 2 solutions.
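
For reference, the naive sequential version would look something like this (a sketch only; model stands for any completion callable that accepts a prompt and a stop_at argument, as in outlines):

# Naive enumeration of all n**4 (step1, step2, step3, ans) combinations.
# Sketch only, not optimized; the point is that these calls are
# independent enough to batch, vectorize, and cache.
def expand(model, prompt, n):
    """Sample n continuations of prompt, each stopping at a newline."""
    return [model(prompt, stop_at=["\n"]) for _ in range(n)]

def tree_of_steps(model, prompt, n):
    solutions = []
    for s1 in expand(model, prompt, n):
        p1 = prompt + s1 + "\n"
        for s2 in expand(model, p1, n):
            p2 = p1 + s2 + "\n"
            for s3 in expand(model, p2, n):
                p3 = p2 + s3 + "\nOutput: "
                for ans in expand(model, p3, n):
                    solutions.append((s1, s2, s3, ans))
    return solutions  # n**4 candidate traces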

@rlouf
Member

rlouf commented Jul 11, 2023

From what I understand you would like three different things:

  1. KV caching across generations to improve performance. We're currently working on an implementation that we believe would be faster than guidance's. It is tracked by Implement k-v cache for transformers models #150.
  2. The ability to generate a tree of generations by successively taking several samples for each output of the previous step. Afaik this is already doable with outlines. The original goal behind vectorisation was to allow the implementation of Tree of Thoughts.
  3. An infilling DSL (which was sketched in the link I sent you) for convenience. This is not necessary, since your example can already be implemented in Outlines, but as stated above I'm open to adding that thin layer on top of the existing capabilities.

In summary, (2) looks like it's already doable, (3) is convenient but non-blocking, and we should see together how we can prioritise (1). Did I get this right?

@arunpatro
Contributor Author

arunpatro commented Jul 11, 2023

Yup, you did. Indeed, solving (1) will have the most RoI. I think users would want (3), an infilling API; it seems natural to me for text-completion tasks. Can you show how one can do it in outlines in a simple fashion?

Same for (2): will successive calls to model.complete(prompt) take care of the unfilled fields? I assume I need to do prompt concatenations wherever required. I am trying to gauge the complexity of implementing this in outlines.

@rlouf
Member

rlouf commented Jul 12, 2023

Try the following. You can step through the program and look at the shapes of the outputs: (10,), (10, 10), (10, 10, 10). If I didn't understand correctly and you just need 10 samples for the whole sequence, then remove samples=10 in the second, third, and fourth calls to the model.

import outlines.models as models
import outlines.text as text
import numpy as np

@text.prompt
def arithmetic_prompt(input):
    """Use these numbers and basic arithmetic to get 24

     Input: {{input}}
     Steps: """

add = np.char.add  # element-wise string concatenation over arrays
model = models.text_completion.openai("text-davinci-003")

prompt = arithmetic_prompt("input")
step_1 = model(prompt, stop_at=["\n"], samples=10)  # shape (10,)

# Append each sample to its prompt and branch again
prompt = add(add(prompt, step_1), "\nStep 2: ")
step_2 = model(prompt, stop_at=["\n"], samples=10)  # shape (10, 10)

prompt = add(add(prompt, step_2), "\nStep 3: ")
step_3 = model(prompt, stop_at=["\n"], samples=10)  # shape (10, 10, 10)

# Final answers, one more branching per trace
prompt = add(add(prompt, step_3), "\nOutput: ")
ans = model(prompt, samples=10)

@rlouf
Member

rlouf commented Jul 12, 2023

I opened #182 to track the infilling DSL implementation. Feel free to take a stab at it.
