Generate and score responses (LLM-as-judge) with API-based models #252

nouhadziri · 2024-08-13T16:13:25Z

This PR does two things:

API-mased model response generation: In addition to generating completions from HF and local models, the code now supports generating completions using API-based models (e.g., GPT-4). In a future PR I will address sampling responses from an ensemble of HF and API models.
Response scoring with API models (aka LLM-as-judge): We now use API-based models to score responses based on the given prompt and scoring criteria. Note that this approach is different from prompting the model to rank multiple responses simultaneously. We need to address this in a separate PR and compare the results.

I have added a file prompt_templates.py which contains two types of prompts: i) generation and ii) judgment. This file also supports different skills, which will allow us to experiment with prompts for each skill added to Tulu.

vwxyzjn

LGTM. Thanks @nouhadziri!

nouhadziri added 30 commits August 12, 2024 13:19

move rejection sampling within its own directory

9035816

move rejection sampling within its own directory

e97eb4e

add api-based models to generating completions

4bfc79b

debug

7a0abc4

debug

034259d

debug

b5990ef

debug

d44e1cb

debug

1c99e32

do not apply tokenization with chat template

6013160

change the tokenizer

c47a16e

change the tokenizer

91fd8b8

fix template

7093b10

fix arg issue

805fcac

add mode to arg

f93e738

add mode to arg

77d257b

change the skill

b2a4b83

remove main from api_generate.py

b80894a

modify template chat

fca6f6d

modify template chat

6541c9f

return only text

1630b4b

return only text

ebacf9e

return only text

45d14b3

fix dict

60a7a87

fix dict

33e8f8c

fix dict

9d99744

fix dict

7ce7419

fix dict

f3f1466

Merge branch 'main' into llm-as-judge

97b812e

fix dict

7fe90fd

add LM as a scorer

fe0cfe3

nouhadziri added 28 commits August 13, 2024 21:45

update readme

b84e848

remove unused args

9605a07

use args

fe5cb61

remove args

f009ed9

fix chatgpt completion

107b10d

fix chatgpt completion

c7849dd

fix chatgpt completion

2232ec2

fix bug

95fd9b3

fix format conversation

380b4d5

fix minor bug

02bfe70

fix minor bug

a62358d

fix minor bug

ce276a4

fix minor bug

61929fb

fix parsing response

5e3c577

test judgment

5d7c894

update format

b5b700a

update format

bc3e692

debug

4094812

debug

9de5dbf

debug

bbeb374

fix

56f2167

format fixed

f13ca49

format fixed

f714fe3

fix bug

ebd5d16

fix bug

3ea5083

fix bug

97ffe62

fix bug

c7ce3f4

add comment

ce9b158

vwxyzjn approved these changes Aug 14, 2024

View reviewed changes

vwxyzjn merged commit 7df9b6e into main Aug 14, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate and score responses (LLM-as-judge) with API-based models #252

Generate and score responses (LLM-as-judge) with API-based models #252

nouhadziri commented Aug 13, 2024

vwxyzjn left a comment

Generate and score responses (LLM-as-judge) with API-based models #252

Generate and score responses (LLM-as-judge) with API-based models #252

Conversation

nouhadziri commented Aug 13, 2024

vwxyzjn left a comment

Choose a reason for hiding this comment