fix initalize model
penguine-ip committed Jan 23, 2025
1 parent 97a6af3 commit 329f102
Showing 3 changed files with 156 additions and 4 deletions.
deepeval/metrics/utils.py: 2 changes (1 addition, 1 deletion)
@@ -269,7 +269,7 @@ def initialize_model(
         return model, False
 
     # If the model is a string, we initialize a GPTModel and use as a native model
-    if isinstance(model, str):
+    if isinstance(model, str) or model is None:
         return GPTModel(model=model), True
 
     # Otherwise (the model is a wrong type), we raise an error
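The hunk above only shows the tail of `initialize_model`. The sketch below reconstructs the surrounding dispatch so it runs without `deepeval` installed. It is an illustration, not the library code: `GPTModel` and `DeepEvalBaseLLM` are hypothetical stand-in stubs, the two branches above the hunk are assumed, and the `TypeError` is an assumed error type (the diff only says "we raise an error").

```python
from typing import Optional, Tuple, Union


class DeepEvalBaseLLM:
    """Hypothetical stand-in for deepeval's base LLM class."""


class GPTModel(DeepEvalBaseLLM):
    """Hypothetical stand-in for deepeval's GPTModel; model=None means 'use the default'."""

    def __init__(self, model: Optional[str] = None):
        self.model_name = model


def initialize_model(
    model: Optional[Union[str, DeepEvalBaseLLM]] = None,
) -> Tuple[DeepEvalBaseLLM, bool]:
    """Return (model, is_native_model), mirroring the branching shown in the diff."""
    # Assumed branch: a GPTModel instance is already a native model
    if isinstance(model, GPTModel):
        return model, True
    # Assumed branch: any other DeepEvalBaseLLM subclass is a custom, non-native model
    if isinstance(model, DeepEvalBaseLLM):
        return model, False
    # The line changed in this commit: a model name string, or None, builds a native GPTModel
    if isinstance(model, str) or model is None:
        return GPTModel(model=model), True
    # Anything else is an unsupported type (error type assumed)
    raise TypeError(f"Unsupported type for model: {type(model).__name__}")
```

In this sketch, dropping the `or model is None` clause would send a bare `initialize_model()` call to the `TypeError` branch instead of returning a default `GPTModel`.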
docs/confident-ai/confident-ai-introduction.mdx: 105 changes (102 additions, 3 deletions)
@@ -1,13 +1,13 @@
---
id: confident-ai-introduction
-title: Confident AI Introduction
-sidebar_label: Confident AI Introduction
+title: Confident AI QuickStart
+sidebar_label: Confident AI QuickStart
---

import Equation from "@site/src/components/equation";

:::caution
-Without best LLM evaluation practices in place, your testing results aren't really valid, and you might be iterating back and fourth between the wrong things, which means your LLM application isn't nearly as performant as they should be.
+Are you following best LLM evaluation practices? Without a serious evaluation workflow, your testing results aren't really valid, and you might be wasting a lot of time iterating on the wrong things.
:::

**Confident AI is the LLM evaluation platform for DeepEval**. It is native to DeepEval, and was designed to help teams building LLM applications maximize their applications' performance and safeguard against unsatisfactory LLM outputs. Whilst DeepEval's open-source metrics are great for running evaluations, there is much more to building a robust LLM evaluation workflow than collecting metric scores.
@@ -115,6 +115,25 @@ Confident AI solves all of your LLM evaluation problems so you can stop going ar
/>
</div>

## Installation

Go to the root directory of your project and create a virtual environment (if you don't already have one). In the CLI, run:

```console
python3 -m venv venv
source venv/bin/activate
```
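If you are not sure whether the environment is active, you can check from Python itself. This is a standard-library sketch, not part of `deepeval`: when Python runs from a `venv` environment, `sys.prefix` points at the environment while `sys.base_prefix` keeps pointing at the base installation. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-created environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```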

In your newly created virtual environment, run:

```console
pip install -U deepeval
```

:::note
We recommend keeping `deepeval` updated to the latest version when using Confident AI.
:::
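To see which `deepeval` version is installed in the active environment, here is a small standard-library sketch (the helper name `installed_deepeval_version` is hypothetical; compare its result against the latest release on PyPI yourself):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_deepeval_version() -> Optional[str]:
    """Return the installed deepeval version string, or None if it isn't installed."""
    try:
        # Reads the version from the installed distribution's metadata
        return version("deepeval")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("deepeval version:", installed_deepeval_version() or "not installed")
```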

## Login to Confident AI

Everything in `deepeval` is already integrated with Confident AI, including any [custom metrics](/docs/metrics-custom) you've built on `deepeval`. To start using Confident AI with `deepeval`, log in from the CLI:
@@ -139,3 +158,83 @@ deepeval login --confident-api-key "your-confident-api-key"
```

:::

## Run Your First Evaluation

Now that you're logged in, create a Python file, for example `experiment_llm.py`. This quickstart evaluates a medical chatbot, but the same steps apply to any other LLM system you are building.

<details><summary>Click to see fake data</summary>
<p>

```python
fake_data = [
    {
        "input": "I have a persistent cough and fever. Should I be worried?",
        "actual_output": (
            "Based on your symptoms, it could be a sign of a viral or bacterial infection. "
            "However, if the fever persists for more than three days or you experience difficulty breathing, "
            "please consult a doctor immediately."
        ),
        "retrieval_context": [
            "Coughing that lasts more than three weeks is typically classified as a chronic cough and could indicate conditions such as asthma, chronic bronchitis, or gastroesophageal reflux disease (GERD).",
            "A fever is the body's natural response to infections, often caused by viruses or bacteria. Persistent fevers lasting more than three days should be evaluated by a healthcare professional as they may indicate conditions like pneumonia, tuberculosis, or sepsis.",
            "Shortness of breath associated with fever and cough can be a sign of serious respiratory issues such as pneumonia, bronchitis, or COVID-19.",
            "Self-care tips for mild symptoms include staying hydrated, taking over-the-counter fever reducers (e.g., acetaminophen or ibuprofen), and resting. Avoid suppressing a productive cough without consulting a healthcare provider."
        ]
    },
    {
        "input": "What should I do if I accidentally cut my finger deeply?",
        "actual_output": (
            "If you cut your finger deeply, just rinse it with water and avoid applying any pressure. "
            "Tetanus shots aren't necessary unless you see redness immediately."
        ),
        "retrieval_context": [
            "Deep cuts that are more than 0.25 inches deep or expose fat, muscle, or bone require immediate medical attention. Such wounds may need stitches to heal properly.",
            "To minimize the risk of infection, wash the wound thoroughly with soap and water. Avoid using alcohol or hydrogen peroxide, as these can irritate the tissue and delay healing.",
            "If the bleeding persists for more than 10 minutes or soaks through multiple layers of cloth or bandages, seek emergency care. Continuous bleeding might indicate damage to an artery or vein.",
            "Watch for signs of infection, including redness, swelling, warmth, pain, or pus. Infections can develop even in small cuts if not properly cleaned or if the individual is at risk (e.g., diabetic or immunocompromised).",
            "Tetanus, a bacterial infection caused by Clostridium tetani, can enter the body through open wounds. Ensure that your tetanus vaccination is up to date, especially if the wound was caused by a rusty or dirty object."
        ]
    }
]
```

</p>
</details>
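Each record maps onto the three `LLMTestCase` fields used in this guide (`input`, `actual_output`, `retrieval_context`), and `FaithfulnessMetric` needs `retrieval_context` as a list of strings. If you swap in your own data, a quick shape check before building test cases can catch mistakes early. `check_records` is a hypothetical helper, not part of `deepeval`.

```python
from typing import Any, Dict, List

# The fields this quickstart passes to LLMTestCase
REQUIRED_KEYS = ("input", "actual_output", "retrieval_context")


def check_records(records: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means every record has the expected shape."""
    problems = []
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue
        # input and actual_output should be plain strings
        if not isinstance(record["input"], str) or not isinstance(record["actual_output"], str):
            problems.append(f"record {i}: input and actual_output must be strings")
        # retrieval_context should be a list of strings
        context = record["retrieval_context"]
        if not isinstance(context, list) or not all(isinstance(c, str) for c in context):
            problems.append(f"record {i}: retrieval_context must be a list of strings")
    return problems
```

For example, running `check_records(fake_data)` on the data above should return an empty list.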

```python title="experiment_llm.py"
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric

# Paste the full fake_data list from above here ([...] is only a placeholder)
fake_data = [...]

# Create one LLMTestCase per record
test_cases = []
for fake_datum in fake_data:
    test_case = LLMTestCase(
        input=fake_datum["input"],
        actual_output=fake_datum["actual_output"],
        retrieval_context=fake_datum["retrieval_context"],
    )
    test_cases.append(test_case)

# Define metrics
answer_relevancy = AnswerRelevancyMetric()
faithfulness = FaithfulnessMetric()

# Run evaluation
evaluate(test_cases=test_cases, metrics=[answer_relevancy, faithfulness])
```

```console
python experiment_llm.py
```

And that's it! Run `experiment_llm.py` as shown above, and Confident AI will automatically display the results.

:::tip
If the results don't show up on Confident AI, you're most likely not logged in. Run `deepeval login` again if that's the case.
:::
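For intuition about the scores you'll see: per DeepEval's metric documentation, faithfulness is the fraction of claims in `actual_output` that don't contradict `retrieval_context`, and answer relevancy is the fraction of statements in `actual_output` that are relevant to `input`. A test case passes a metric when its score is at least the metric's threshold (0.5 by default). The sketch below reproduces only that final arithmetic, assuming an LLM judge has already produced one verdict per claim or statement. The verdict labels, the "only `no` counts against" convention, and `score_from_verdicts` are illustrative assumptions, not `deepeval` internals.

```python
from typing import List


def score_from_verdicts(verdicts: List[str]) -> float:
    """Fraction of verdicts that are not 'no' (assumed: 'idk' does not count against)."""
    if not verdicts:
        # Assumption: with nothing to judge, treat the output as fully passing
        return 1.0
    supported = sum(1 for v in verdicts if v.strip().lower() != "no")
    return supported / len(verdicts)


def passes(score: float, threshold: float = 0.5) -> bool:
    """A test case passes a metric when its score is at least the threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical verdicts for three claims extracted from one actual_output
    verdicts = ["yes", "no", "no"]
    score = score_from_verdicts(verdicts)
    print(round(score, 2), passes(score))
```

With one supported claim out of three, the score is about 0.33, which falls below the 0.5 threshold, so that test case would fail the metric.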
g.py: 53 changes (53 additions, 0 deletions)
@@ -0,0 +1,53 @@
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric

# See above for contents of fake data
fake_data = [
    {
        "input": "I have a persistent cough and fever. Should I be worried?",
        "actual_output": (
            "Based on your symptoms, it could be a sign of a viral or bacterial infection. "
            "However, if the fever persists for more than three days or you experience difficulty breathing, "
            "please consult a doctor immediately."
        ),
        "retrieval_context": [
            "Coughing that lasts more than three weeks is typically classified as a chronic cough and could indicate conditions such as asthma, chronic bronchitis, or gastroesophageal reflux disease (GERD).",
            "A fever is the body's natural response to infections, often caused by viruses or bacteria. Persistent fevers lasting more than three days should be evaluated by a healthcare professional as they may indicate conditions like pneumonia, tuberculosis, or sepsis.",
            "Shortness of breath associated with fever and cough can be a sign of serious respiratory issues such as pneumonia, bronchitis, or COVID-19.",
            "Self-care tips for mild symptoms include staying hydrated, taking over-the-counter fever reducers (e.g., acetaminophen or ibuprofen), and resting. Avoid suppressing a productive cough without consulting a healthcare provider.",
        ],
    },
    {
        "input": "What should I do if I accidentally cut my finger deeply?",
        "actual_output": (
            "If you cut your finger deeply, just rinse it with water and avoid applying any pressure. "
            "Tetanus shots aren't necessary unless you see redness immediately."
        ),
        "retrieval_context": [
            "Deep cuts that are more than 0.25 inches deep or expose fat, muscle, or bone require immediate medical attention. Such wounds may need stitches to heal properly.",
            "To minimize the risk of infection, wash the wound thoroughly with soap and water. Avoid using alcohol or hydrogen peroxide, as these can irritate the tissue and delay healing.",
            "If the bleeding persists for more than 10 minutes or soaks through multiple layers of cloth or bandages, seek emergency care. Continuous bleeding might indicate damage to an artery or vein.",
            "Watch for signs of infection, including redness, swelling, warmth, pain, or pus. Infections can develop even in small cuts if not properly cleaned or if the individual is at risk (e.g., diabetic or immunocompromised).",
            "Tetanus, a bacterial infection caused by Clostridium tetani, can enter the body through open wounds. Ensure that your tetanus vaccination is up to date, especially if the wound was caused by a rusty or dirty object.",
        ],
    },
]


# Create a list of LLMTestCase
test_cases = []
for fake_datum in fake_data:
    test_case = LLMTestCase(
        input=fake_datum["input"],
        actual_output=fake_datum["actual_output"],
        retrieval_context=fake_datum["retrieval_context"],
    )
    test_cases.append(test_case)

# Define metrics
answer_relevancy = AnswerRelevancyMetric()
faithfulness = FaithfulnessMetric()

# Run evaluation
evaluate(test_cases=test_cases, metrics=[answer_relevancy, faithfulness])
