-
Notifications
You must be signed in to change notification settings - Fork 42
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
integrate deepeval metric with remote endpoint, like tgi server. (#168)
* integrate deepeval metric with remote endpoint, like tgi server.
- Loading branch information
1 parent
e11588c
commit ffa65dc
Showing
7 changed files
with
255 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
|
||
DeepEval is a simple-to-use, open-source LLM evaluation framework, for evaluating large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., which uses LLMs and various other NLP models that runs locally on your machine for evaluation. | ||
|
||
We customize models to support more local LLMs services for the evaluation of metrics such as hallucination, answer relevancy, etc. | ||
|
||
# 🚀 QuickStart | ||
|
||
|
||
## Installation | ||
|
||
``` | ||
pip install ../../../requirements.txt | ||
``` | ||
|
||
## Launch Service of LLM-as-a-Judge | ||
|
||
To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-gaudi) to launch a service. For example, the follow command is to setup the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model on 2 Gaudi2 cards: | ||
|
||
``` | ||
# please set your llm_port and hf_token | ||
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048 | ||
``` | ||
|
||
## Writing your first test case | ||
|
||
```python | ||
import pytest | ||
from deepeval import assert_test | ||
from deepeval.metrics import AnswerRelevancyMetric | ||
from deepeval.test_case import LLMTestCase | ||
|
||
|
||
def test_case(): | ||
from evals.evaluation.deepeval.models.endpoint_models import TGIEndpointModel | ||
|
||
endpoint = TGIEndpointModel(model="http://localhost:{your_llm_port}/generate") | ||
|
||
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5, model=endpoint) | ||
test_case = LLMTestCase( | ||
input="What if these shoes don't fit?", | ||
# Replace this with the actual output from your LLM application | ||
actual_output="We offer a 30-day full refund at no extra costs.", | ||
retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."], | ||
) | ||
assert_test(test_case, [answer_relevancy_metric]) | ||
``` | ||
|
||
## Acknowledgements | ||
|
||
The evaluation inherits from [deepeval](https://github.com/confident-ai/deepeval) repo. Thank for the founders of Confident AI. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Copyright (C) 2024 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import abc | ||
import time | ||
from abc import ABC, abstractmethod | ||
from functools import cached_property | ||
from typing import List, Optional, Tuple, Union | ||
|
||
import requests | ||
from aiohttp import ClientSession, TCPConnector | ||
from deepeval.models.gpt_model import GPTModel | ||
from requests.exceptions import RequestException | ||
|
||
|
||
class TGIEndpointModel(GPTModel): | ||
def __init__(self, model: str, model_name: Optional[str] = None): | ||
model_name = "server-endpoint" if model_name is None else model_name | ||
super().__init__(model_name=model_name) | ||
|
||
self.model = model | ||
|
||
def _create_payload(self, prompt: str): | ||
return {"inputs": prompt, "parameters": {"do_sample": False}} | ||
|
||
@cached_property | ||
def header(self) -> dict: | ||
"""Override this property to return the headers for the API request.""" | ||
return {"Content-Type": "application/json"} | ||
|
||
def generate(self, prompt: str) -> Tuple[str, float]: | ||
|
||
try: | ||
start_time = time.perf_counter() | ||
res = requests.post( | ||
f"{self.model}", | ||
headers=self.header, | ||
json=self._create_payload(prompt), | ||
) | ||
res.raise_for_status() | ||
res = res.json() | ||
cost = time.perf_counter() - start_time | ||
except RequestException as e: | ||
raise Exception(f"An unexpected error occurred: {str(e)}") | ||
|
||
return res["generated_text"], cost | ||
|
||
def load_model(self, *args, **kwargs): | ||
"""Loads a model, that will be responsible for scoring. | ||
Returns: | ||
A model object | ||
""" | ||
pass | ||
|
||
async def a_generate(self, prompt: str) -> Tuple[str, float]: | ||
|
||
try: | ||
start_time = time.perf_counter() | ||
async with ClientSession() as session: | ||
async with session.post( | ||
f"{self.model}", | ||
headers=self.header, | ||
json=self._create_payload(prompt), | ||
) as response: | ||
if not response.ok: | ||
error_text = await response.text() | ||
print(f"API request failed with error message: {error_text}. Retrying...") | ||
|
||
response.raise_for_status() | ||
res = await response.json() | ||
cost = time.perf_counter() - start_time | ||
except RequestException as e: | ||
raise Exception(f"An unexpected error occurred: {str(e)}") | ||
|
||
return res["generated_text"], cost | ||
|
||
def get_model_name(self, *args, **kwargs) -> str: | ||
return "remote endpoint" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters