Commit 714ad12 (1 parent: 20ff484): 6 changed files with 234 additions and 53 deletions.
**docs/confident-ai/confident-ai-advanced-evaluation-model.mdx** (38 additions)
---
id: confident-ai-advanced-evaluation-model
title: Setup Evaluation Model
sidebar_label: Setup Evaluation Model
---

If you choose to run evaluations on Confident AI, you'll need to specify which evaluation model to use when running LLM-as-a-judge metrics. You'll need to configure an evaluation model if you're using any of these features:

- [Online evaluations](/confident-ai/confident-ai-llm-monitoring-evaluations)
- [Experiments](/confident-ai/confident-ai-testing-n-evaluation-experiments)

To set up an evaluation model, you can do one of the following:

1. Use Confident AI's default models (no additional setup required).
2. Use OpenAI by supplying your own OpenAI API key.
3. Use a custom endpoint as your evaluation model.

### Using Custom LLMs

To use a custom LLM, go to **Project Settings** > **Evaluation Model**, and select the "Custom" option under the model provider dropdown. You'll have the opportunity to supply a model "Name" as well as an "Endpoint". The name is straightforward, since it is merely for identification purposes, but for the model endpoint, you'll have to ensure it follows these rules:

1. It **MUST** accept a `POST` request over HTTPS, and must be reachable over the internet.
2. It **MUST** return a `200` status where **the response body is a simple string** (not JSON, nothing else, just a string) that represents the generated text of your evaluation model for the given request body (which contains the `input`).
3. It **MUST** correctly parse the request body to extract the `input` and use your evaluation model to generate a response based on this `input`.

Here is what the structure of the request body looks like:

```python
{
    "input": "Given the metric score, tell me why..."
}
```
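
For illustration, here is a minimal sketch of an endpoint satisfying these rules, assuming FastAPI and a hypothetical `my_evaluation_model` helper standing in for your actual model call; any stack works as long as the three rules above are met:

```python
from fastapi import FastAPI, Request
from fastapi.responses import PlainTextResponse

app = FastAPI()


def my_evaluation_model(prompt: str) -> str:
    # Hypothetical stand-in for your real model call (OpenAI, vLLM, etc.).
    return "The metric scored low because..."


@app.post("/evaluate")
async def evaluate(request: Request) -> PlainTextResponse:
    body = await request.json()
    prompt = body["input"]  # rule 3: extract `input` from the request body
    # Rules 1 and 2: answer the POST with a 200 status and a plain string
    # body, not a JSON object.
    return PlainTextResponse(content=my_evaluation_model(prompt), status_code=200)
```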

Please test that your endpoint meets all of these requirements before continuing. This step can be harder to debug than others, so aim to log any errors and reach out to support@confident-ai.com if you run into issues during setup.
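
A quick way to smoke test this (a sketch, with a placeholder URL you'd swap for your own deployment) is to POST a sample body and check the two response properties:

```python
import requests

resp = requests.post(
    "https://your-endpoint.example.com/evaluate",  # placeholder: your deployed endpoint
    json={"input": "Given the metric score, tell me why..."},
)
assert resp.status_code == 200  # rule 2: must return a 200 status
print(resp.text)  # should be the generated text as a plain string, not JSON
```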

:::tip
If you find a model provider that is not supported, most of the time it is because no one has asked for it. Simply email support@confident-ai.com if that is the case.
:::

**docs/confident-ai/confident-ai-advanced-llm-connection.mdx** (73 additions)
---
id: confident-ai-advanced-llm-connection
title: Setup LLM Endpoint Connection
sidebar_label: Setup LLM Endpoint Connection
---

:::tip
This is particularly helpful if you wish to enable a no-code evaluation workflow for non-technical users, or to simulate conversations to be evaluated in one click of a button.
:::

You can also set up an LLM endpoint that accepts a `POST` request over HTTPS to **enable users to run evaluations directly on the platform without having to code**, and start an evaluation through a click of a button instead. At a high level, you provide Confident AI with the mappings to test case parameters such as the `actual_output`, `retrieval_context`, etc., and at evaluation time Confident AI will use the dataset and metric settings you've specified for your experiment to unit test your LLM application.

### Create an LLM Endpoint

In order for Confident AI to reach your LLM application, you'll need to expose it as a RESTful API endpoint that is accessible over the internet. These are the hard rules you **MUST** follow when setting up your endpoint:

1. It accepts a `POST` request over HTTPS.
2. It returns a JSON response that **MUST** contain the `actual_output` value somewhere in the returned JSON object. Supplying a `retrieval_context` or `tools_called` value in your returned JSON is optional, and depends on whether the metrics you have enabled for your experiment require these parameters.

:::caution
When Confident AI calls your LLM endpoint, it makes a `POST` request with a body of this shape:

```python
{
    "input": "..."
}
```

This input will be used to unit test your LLM application, and any JSON response returned will be parsed to deduce the values of the remaining test case parameters (e.g., `actual_output`).

So, it is **imperative that your LLM endpoint**:

1. Parses the incoming data to extract this `input` value to carry out generations.
2. Returns the `actual_output` and any other `LLMTestCase` parameters in the JSON response, each with its **correct respective type**.

For a recap of each test case parameter's type, visit the [test cases section](/docs/evaluation-test-cases).
:::
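
To make this concrete, here is a rough sketch of such an endpoint, again assuming FastAPI and a hypothetical `generate` helper wrapping your LLM application (the response field names are just examples; the key paths you configure below tell Confident AI where to find them):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


def generate(user_input: str) -> tuple[str, list[str]]:
    # Hypothetical stand-in for your LLM application (e.g., a RAG pipeline)
    # returning the generated answer and the retrieved chunks.
    return "some generated answer", ["retrieved chunk 1", "retrieved chunk 2"]


@app.post("/generate")
async def generate_endpoint(request: Request) -> JSONResponse:
    body = await request.json()
    user_input = body["input"]  # extract the `input` Confident AI sends
    actual_output, retrieval_context = generate(user_input)
    # `actual_output` is mandatory; `retrieval_context` / `tools_called`
    # are only needed if your enabled metrics require them.
    return JSONResponse(
        content={
            "response": {
                "actual output": actual_output,
                "retrieval context": retrieval_context,
            }
        }
    )
```

Note how this sketch nests `actual output` under `response`; that nesting is what the example key path in the next section refers to.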

### Connect Your LLM Endpoint

Now that you have your endpoint up and running, all that's left is to tell Confident AI how to reach it.

:::tip
You can set up your LLM connection in **Project Settings** > **LLM Connection**. There is also a button for you to **ping your LLM endpoint** to sanity check that you've set everything up correctly.
:::

You'll have to provide:

1. The HTTPS endpoint you've set up.
2. The JSON key path to the mandatory `actual_output` parameter, and to the optional `retrieval_context` and `tools_called` parameters. A JSON key path is a list of strings.

In order for evaluation to work, you **MUST** set the JSON key path for the `actual_output` parameter. Remember, the `actual_output` of your LLM application is always required for evaluation, while the `retrieval_context` and `tools_called` parameters are optional depending on the metrics you've enabled.

:::note
The JSON key path tells Confident AI where to look in your JSON response for the respective test case parameter values.
:::

For instance, if you set the key path of the `actual_output` parameter to `["response", "actual output"]`, the correct JSON response to return from your LLM endpoint is as follows:

```python
{
    "response": {
        "actual output": "must be a string!"
    }
}
```

That's not to say you can't return other fields in your JSON response, but the key paths determine the values Confident AI will use to populate `LLMTestCase`s at evaluation time.
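
Conceptually, a key path is just a walk through the keys of your nested JSON response. A minimal sketch of the lookup (our illustration, not Confident AI's actual implementation):

```python
def lookup(json_obj: dict, key_path: list[str]):
    # Follow each key in the path one level deeper into the nested object.
    value = json_obj
    for key in key_path:
        value = value[key]
    return value


response = {"response": {"actual output": "must be a string!"}}
print(lookup(response, ["response", "actual output"]))  # -> "must be a string!"
```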

:::info
If you're wondering why `expected_output`, `context`, and `expected_tools` are not required when setting up your JSON key paths, it is because these variables are expected to be static, just like the `input`, and should therefore come from your dataset instead.
:::