Commit
feat: add support for grouping inputs (#215)
1 parent
6f001be
commit 740a844
Showing 16 changed files with 554 additions and 46 deletions.
@@ -0,0 +1,8 @@
---
"@empiricalrun/scorer": minor
"@empiricalrun/types": minor
"@empiricalrun/cli": minor
"web": minor
---

feat: add support for merging inputs and add multi-turn chat example
@@ -0,0 +1,3 @@

# Ignore outputs from Empirical
.empiricalrun
@@ -0,0 +1,29 @@
# Evaluating multi-turn chat
This example illustrates how to score outputs for multi-turn chat scenarios.

### Dataset
The dataset is configured in a [Google Sheet](https://docs.google.com/spreadsheets/d/1fZ_3FFj94SiucglQOTrCHQTZVhrWJwWqfSk-vEIp8_I/edit#gid=0).

### Run Configuration
The run is implemented as a Python script: the run configuration points to `chat.py`.
`chat.py` implements the multi-turn conversation.

### Scorer Configuration
The scoring mechanism is implemented through a Python script named `score.py`.

## Steps to run
To execute the example:
1. Install dependencies:
   ```
   poetry install
   ```
1. Evaluate multi-turn chat using Empirical:
   ```
   npx @empiricalrun/cli run --python-path `poetry env info -e`
   ```
   > Note: Ensure `OPENAI_API_KEY` is exported before running the above command.
1. Visualize the output:
   ```
   npx @empiricalrun/cli ui
   ```
@@ -0,0 +1,27 @@
from openai import AsyncOpenAI


async def execute(inputs, parameters):
    openai = AsyncOpenAI()
    messages = []
    for row in inputs:
        # each grouped input row contributes one user turn to the thread
        messages.append({"role": "user", "content": row.get("user_query")})
        chat_completion = await openai.chat.completions.create(
            messages=messages,
            model="gpt-3.5-turbo",
        )
        messages.append(
            {
                "role": chat_completion.choices[0].message.role,
                "content": chat_completion.choices[0].message.content,
            }
        )
    # fall back to an empty message when no inputs were provided
    last_message = messages[-1] if messages else {}
    return {
        # setting the last response as the final output of the conversation
        "value": last_message.get("content", ""),
        # saving the thread in metadata for eyeball and scoring output
        "metadata": {"messages": messages},
    }
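The thread-building loop in `chat.py` can be exercised without an API key. The following is a minimal sketch, not part of the commit: `fake_completion` and `run_thread` are hypothetical names, and the stand-in simply echoes the latest user turn instead of calling OpenAI. It only illustrates how the message list grows by one user turn and one assistant turn per input row.

```python
import asyncio


async def fake_completion(messages):
    # Hypothetical stand-in for the chat completion call (assumption, not the
    # OpenAI API): it echoes the most recent user message.
    last_user = messages[-1]["content"]
    return {"role": "assistant", "content": f"echo: {last_user}"}


async def run_thread(inputs, complete=fake_completion):
    messages = []
    for row in inputs:
        # One user turn per input row, followed by the model's reply.
        messages.append({"role": "user", "content": row.get("user_query")})
        messages.append(await complete(messages))
    return {
        "value": messages[-1]["content"] if messages else "",
        "metadata": {"messages": messages},
    }


sample_inputs = [{"user_query": "Hi"}, {"user_query": "What did I just say?"}]
result = asyncio.run(run_thread(sample_inputs))
print(result["value"])
print(len(result["metadata"]["messages"]))
```

Because each call receives the full `messages` list so far, later turns can depend on earlier ones, which is the property the example is meant to evaluate.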
@@ -0,0 +1,20 @@
{
  "$schema": "https://assets.empirical.run/config/schema/latest.json",
  "runs": [
    {
      "type": "py-script",
      "path": "chat.py"
    }
  ],
  "dataset": {
    "path": "https://docs.google.com/spreadsheets/d/1fZ_3FFj94SiucglQOTrCHQTZVhrWJwWqfSk-vEIp8_I/edit#gid=0",
    "group_by": "conv_id"
  },
  "scorers": [
    {
      "name": "llm-evaluation",
      "type": "py-script",
      "path": "score.py"
    }
  ]
}
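The `group_by` setting suggests that dataset rows sharing a `conv_id` are collected into one list, which `chat.py` then receives as `inputs`. The sketch below shows one way such grouping could work. It is an assumption for illustration only, not Empirical's actual implementation; `group_rows` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_rows(rows, group_by):
    # Collect rows that share the same value for the `group_by` column,
    # preserving the order in which rows appear in the dataset.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    return dict(groups)


# Hypothetical rows standing in for the Google Sheet contents.
rows = [
    {"conv_id": "1", "user_query": "Hi"},
    {"conv_id": "2", "user_query": "Tell me a joke"},
    {"conv_id": "1", "user_query": "What did I just say?"},
]

grouped = group_rows(rows, "conv_id")
for conv_id, inputs in grouped.items():
    print(conv_id, [row["user_query"] for row in inputs])
```

Under this reading, each group becomes one sample, so a two-turn conversation produces a single output and a single score rather than two.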