
[ENHANCEMENT] Use Phoenix to run Evals using Internal Hosted LLMs #2280

Closed
amank94 opened this issue Feb 14, 2024 · 4 comments
Labels
c/dx Developer experience c/evals enhancement New feature or request

Comments

@amank94
Contributor

amank94 commented Feb 14, 2024

Is your feature request related to a problem? Please describe.
My organization manages and runs our own LLMs and serves them internally with a custom API. We would like to use Phoenix to run Evals using our internal LLMs.

Describe the solution you'd like
I'd like a higher-level interface or abstract class I can inherit from to implement a Phoenix BaseEvalModel when using my own internally hosted LLM that is served via a REST API. I would like to implement only two methods: one to set up an httpx client configured to send requests to a specific endpoint, and a second to parse the response into an LLM output string.

  • As a user, I can manage my own Auth and keys for the request

The resulting class should be a fully Phoenix-compatible BaseEvalModel that I can use with all evals features and built-in prompt templates.
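
For illustration, a minimal sketch of the kind of base class described above (the class and method names here are hypothetical and not an existing Phoenix API; Phoenix would own everything except the two endpoint-specific hooks):

from abc import ABC, abstractmethod
from typing import Any

import httpx


class RestEvalModel(ABC):  # hypothetical helper; would plug into Phoenix's BaseEvalModel
    """Subclasses describe only how to reach their endpoint and how to parse its response."""

    @abstractmethod
    def create_client(self) -> httpx.Client:
        """Return an httpx.Client whose base_url and headers point at the internal endpoint."""

    @abstractmethod
    def parse_response(self, response: httpx.Response) -> str:
        """Extract the completion string from the endpoint's raw response."""

    def _generate(self, prompt: str, **kwargs: Any) -> str:
        # Phoenix would own retries, async execution, and token accounting;
        # only the two hooks above are endpoint-specific.
        client = self.create_client()
        response = client.post(client.base_url, json={"prompt": prompt, **kwargs})
        response.raise_for_status()
        return self.parse_response(response)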

Describe alternatives you've considered

  1. LiteLLM provides the ability to wrap custom models as long as they are served on an OpenAI-compatible completions API. Not all of our internal models are served on the same endpoints, so we need something more general-purpose.
  2. The barrier to implementing our own version of a BaseEvalModel seems high, as it's hard to know whether we've done so correctly in a way that will work with all Phoenix eval features. Since we host and spin up many different company models for many different purposes, we would like it to be easy to configure one for use with Phoenix at any time.

Example curl command below, conforming to the OpenAI completions endpoint:

curl --location 'https://genai.<domain>.com/api/v1/genai/completions' \
--data '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "prompt": "is it a holiday?",
    "presence_penalty": 1.15,
    "max_tokens": 200
}' 

We may also pass additional headers, e.g. ‘Context-Name’.

An example implementation of a BaseEvalModel subclass can be seen in the code example below:

import subprocess
from typing import Any, Dict, List

import httpx

# Adjust this import to the layout of your Phoenix version.
from phoenix.experimental.evals.models import BaseEvalModel


class CustomRequestModel(BaseEvalModel):
    temperature: float = 0.3
    max_tokens: int = 1024000
    top_p: float = 1
    top_k: int = 10

    def __post_init__(self) -> None:
        # Fetch a short-lived access token via gcloud and build the HTTP clients
        # used for both sync and async requests.
        process = subprocess.run(
            ["gcloud", "auth", "print-access-token"], capture_output=True, text=True
        )
        self._access_token = process.stdout.strip()
        self._headers = {
            "Authorization": f"Bearer {self._access_token}",
            "Content-Type": "application/json",
        }
        self.endpoint = "https://us-central1-aiplatform.googleapis.com/v1/projects/<id>/locations/us-central1/endpoints/<id>:predict"
        self.client = httpx.Client(headers=self._headers)
        self.async_client = httpx.AsyncClient(headers=self._headers)

    def _generate(self, prompt: str, **kwargs: Any) -> str:
        # POST the formatted request and pull the completion text out of the response.
        res = self.client.post(self.endpoint, json=self._format_request(prompt))
        res.raise_for_status()
        completion = res.json()["predictions"][0]
        return completion.split("Output:")[-1]

    async def _async_generate(self, prompt: str, **kwargs: Any) -> str:
        # Async variant of _generate using the shared httpx.AsyncClient.
        res = await self.async_client.post(self.endpoint, json=self._format_request(prompt))
        res.raise_for_status()
        completion = res.json()["predictions"][0]
        return completion.split("Output:")[-1]

    def _format_request(self, prompt: str) -> Dict[str, Any]:
        # Map the prompt and generation parameters onto the endpoint's request schema.
        return {
            "instances": [
                {
                    "prompt": prompt,
                    "max_tokens": self.max_tokens,
                    "temperature": self.temperature,
                    "top_p": self.top_p,
                    "top_k": self.top_k,
                }
            ],
        }

    # The remaining abstract members are difficult to implement meaningfully for a
    # custom REST endpoint, which is part of the motivation for this request.
    @property
    def encoder(self) -> "Encoding":
        ...

    @property
    def max_context_size(self) -> int:
        ...

    def get_text_from_tokens(self, tokens: List[int]) -> str:
        ...

    def get_tokens_from_text(self, text: str) -> List[int]:
        ...
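
For context, the goal is that such a class can be dropped in wherever a built-in model is accepted, for example (illustrative only, using the phoenix.experimental.evals API current at the time of writing):

import pandas as pd

from phoenix.experimental.evals import (
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

# A toy dataframe with the columns the built-in relevancy template expects.
df = pd.DataFrame(
    {
        "input": ["is it a holiday?"],
        "reference": ["A retrieved document about public holidays."],
    }
)

model = CustomRequestModel()
results = llm_classify(
    dataframe=df,
    model=model,
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
)
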
@amank94 amank94 added enhancement New feature or request triage issues that need triage labels Feb 14, 2024
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Feb 14, 2024

dosubot bot commented Feb 14, 2024

🤖

That sounds like a great enhancement request for the Arize Phoenix software! It's fantastic that you've already put thought into the solution and provided an example implementation of a BaseEvalClass. Your proposed solution for a higher level interface or abstract class to work with internally-hosted LLMs via a REST API seems well-considered. I encourage you to go ahead and create a pull request with your proposed changes to the repository. Your contribution will be valuable to the community and the project. Thank you for your initiative!


@dosubot dosubot bot added c/dx Developer experience c/evals labels Feb 14, 2024
@mikeldking
Contributor

Discussed a few different strategies with @amank94:

  1. If the API is OpenAI-compatible (e.g. https://docs.anyscale.com/endpoints/model-serving/openai-migration-guide), the user should just be able to use model_kwargs for now; this requires some testing. As a fallback we can suggest LiteLLM. (A rough sketch of this approach follows the list.)
  2. Alternatively, we can give the user the ability to implement the model interface. For this to be forward-compatible we first need to complete 🗺️ [Evals] Promote evals out of experimental #2142; @anticorrelator is scoping out the minimum interface possible there.
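
A rough sketch of option 1 against the endpoint from the issue description, assuming the Phoenix OpenAIModel accepts a base_url pointing at an OpenAI-compatible server (parameter names differ slightly across Phoenix releases, so treat this as illustrative):

import os

from phoenix.experimental.evals import OpenAIModel

# Route Phoenix's OpenAI wrapper to the internally hosted, OpenAI-compatible endpoint.
# The URL and token are placeholders for the user's own deployment.
model = OpenAIModel(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # some releases name this parameter model_name
    base_url="https://genai.<domain>.com/api/v1/genai",
    api_key=os.environ.get("INTERNAL_GENAI_API_KEY", "unused"),
)

# model can then be passed to llm_classify / llm_generate like any built-in model.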

@mikeldking mikeldking removed the triage issues that need triage label Feb 14, 2024
mkhludnev added a commit to mkhludnev/phoenix that referenced this issue Mar 2, 2024
@mkhludnev
Contributor

I think LiteLLMModel with model_kwargs, or just with os.environ, is the way to go. If anyone would like to have #2423 extended with an OpenAI API example, I can add it. A rough sketch of the os.environ approach is below.
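
(A sketch based on LiteLLM's environment-variable conventions and the self-hosted ollama setup from #2423; adjust names to your Phoenix version.)

import os

from phoenix.experimental.evals import LiteLLMModel  # import path varies by Phoenix version

# LiteLLM reads the serving endpoint from environment variables, so a self-hosted
# model needs no custom Phoenix code. For an ollama server:
os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

model = LiteLLMModel(model="ollama/llama2")

# An OpenAI-compatible internal server works the same way via
# OPENAI_API_BASE / OPENAI_API_KEY and model="openai/<model-name>".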

mkhludnev added a commit to mkhludnev/phoenix that referenced this issue Mar 5, 2024
anticorrelator pushed a commit that referenced this issue Mar 7, 2024

* demonstrate using selfhosted ollama as Eval Model. #2280

* openAI mock test

* sweep

* move, rename

* imports
@mikeldking
Copy link
Contributor

Marking this as resolved via LiteLLM for now.

@github-project-automation github-project-automation bot moved this from 📘 Todo to ✅ Done in phoenix May 13, 2024