Commit

Merge branch 'main' into docs
mikeldking authored Feb 7, 2024
2 parents 3310ba6 + ee4ced3 commit d7d7f97
Showing 71 changed files with 9,147 additions and 8,525 deletions.
1 change: 1 addition & 0 deletions .prettierignore
@@ -0,0 +1 @@
docs
73 changes: 73 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,78 @@
# Changelog

## [2.9.4](https://github.com/Arize-ai/phoenix/compare/v2.9.3...v2.9.4) (2024-02-06)


### Bug Fixes

* disregard active session if endpoint is provided to px.Client ([#2206](https://github.com/Arize-ai/phoenix/issues/2206)) ([6ec0d23](https://github.com/Arize-ai/phoenix/commit/6ec0d2344ffb7f40534730160f10d99f266788da))

## [2.9.3](https://github.com/Arize-ai/phoenix/compare/v2.9.2...v2.9.3) (2024-02-05)


### Bug Fixes

* absolute path for eval exporter ([#2202](https://github.com/Arize-ai/phoenix/issues/2202)) ([2ac39e9](https://github.com/Arize-ai/phoenix/commit/2ac39e93de3f437c5cf3f092bd6de437d75337ce))

## [2.9.2](https://github.com/Arize-ai/phoenix/compare/v2.9.1...v2.9.2) (2024-02-05)


### Bug Fixes

* localhost address for px.Client ([#2200](https://github.com/Arize-ai/phoenix/issues/2200)) ([e56b66a](https://github.com/Arize-ai/phoenix/commit/e56b66adea734693a82f49b415e093a07a9f0ff1))

## [2.9.1](https://github.com/Arize-ai/phoenix/compare/v2.9.0...v2.9.1) (2024-02-05)


### Bug Fixes

* absolute path for urljoin in px.Client ([#2199](https://github.com/Arize-ai/phoenix/issues/2199)) ([ba30a30](https://github.com/Arize-ai/phoenix/commit/ba30a30d1312af042b81b631b5d0b6cc0e14d411))


### Documentation

* update readme with a deployment guide ([#2194](https://github.com/Arize-ai/phoenix/issues/2194)) ([bf67775](https://github.com/Arize-ai/phoenix/commit/bf6777569c764392d72d4ccf3c71738079957901))

## [2.9.0](https://github.com/Arize-ai/phoenix/compare/v2.8.0...v2.9.0) (2024-02-05)


### Features

* phoenix client `get_evaluations()` and `get_trace_dataset()` ([#2154](https://github.com/Arize-ai/phoenix/issues/2154)) ([29800e4](https://github.com/Arize-ai/phoenix/commit/29800e4ed4a901ad19874ba049638e13d8c67b87))
* phoenix client `get_spans_dataframe()` and `query_spans()` ([#2151](https://github.com/Arize-ai/phoenix/issues/2151)) ([e44b948](https://github.com/Arize-ai/phoenix/commit/e44b948301b28b22d5f578de686dc29c1cf84ad0))

## [2.8.0](https://github.com/Arize-ai/phoenix/compare/v2.7.0...v2.8.0) (2024-02-02)


### Features

* Remove model-level tenacity retries ([#2176](https://github.com/Arize-ai/phoenix/issues/2176)) ([66d452c](https://github.com/Arize-ai/phoenix/commit/66d452c45a676ee5dbac43b25df43df32bdb71bc))


### Bug Fixes

* broken link and openinference links ([#2144](https://github.com/Arize-ai/phoenix/issues/2144)) ([01fb046](https://github.com/Arize-ai/phoenix/commit/01fb0464d023e1494c22f80b10ed840eef47fce8))
* databricks check crashes in python console ([#2152](https://github.com/Arize-ai/phoenix/issues/2152)) ([5aeeeff](https://github.com/Arize-ai/phoenix/commit/5aeeeff9fa8c2d697374686552b35127238dce44))
* default collector endpoint breaks on windows ([#2161](https://github.com/Arize-ai/phoenix/issues/2161)) ([f1a2007](https://github.com/Arize-ai/phoenix/commit/f1a200713c44ffcf2506ff54429715ef7171ecd1))
* Do not retry when context window has been exceeded ([#2126](https://github.com/Arize-ai/phoenix/issues/2126)) ([ff6df1f](https://github.com/Arize-ai/phoenix/commit/ff6df1fc01f0986357a9e20e0441a3c15697a5fa))
* remove hyphens from span_id in legacy evaluation fixtures ([#2153](https://github.com/Arize-ai/phoenix/issues/2153)) ([fae859d](https://github.com/Arize-ai/phoenix/commit/fae859d8831669f92a368e979caa81a778948432))


### Documentation

* add docker badge ([e584ed8](https://github.com/Arize-ai/phoenix/commit/e584ed87960eba61c0e5165e3c0d08cf0d11e672))
* Add terminal running steps (GITBOOK-441) ([91c6b24](https://github.com/Arize-ai/phoenix/commit/91c6b24b411bd2d447c7c2c4453bb57320bff325))
* No subject (GITBOOK-442) ([5c4eb6c](https://github.com/Arize-ai/phoenix/commit/5c4eb6c93a284e06907582b3b80dc70cbfd3d0e6))
* No subject (GITBOOK-443) ([11f46cb](https://github.com/Arize-ai/phoenix/commit/11f46cbbb442dbbbc7d84779915ecc537461b80c))
* No subject (GITBOOK-444) ([fcf2bc9](https://github.com/Arize-ai/phoenix/commit/fcf2bc927c24cfb7cba3eda8e7589f59af2dfcf1))
* update badge ([ddcecea](https://github.com/Arize-ai/phoenix/commit/ddcecea23bc9998f361f3cb41427688f84314295))
* update prompt to reflect rails (GITBOOK-445) ([dea6dd6](https://github.com/Arize-ai/phoenix/commit/dea6dd6ce2f179cf200eaef5f77ba958140355a2))


### Miscellaneous Chores

* change release to 2.8.0 ([#2181](https://github.com/Arize-ai/phoenix/issues/2181)) ([0b7b524](https://github.com/Arize-ai/phoenix/commit/0b7b524d8cbd05bf1f8652a648145ed94d72af90))

## [2.7.0](https://github.com/Arize-ai/phoenix/compare/v2.6.0...v2.7.0) (2024-01-24)


55 changes: 40 additions & 15 deletions README.md
@@ -22,6 +22,9 @@
<a target="_blank" href="https://pypi.org/project/arize-phoenix/">
<img src="https://img.shields.io/pypi/pyversions/arize-phoenix">
</a>
<a target="_blank" href="https://hub.docker.com/repository/docker/arizephoenix/phoenix/general">
<img src="https://img.shields.io/docker/v/arizephoenix/phoenix?sort=semver&logo=docker&label=image&color=blue">
</a>
</p>

![a rotating UMAP point cloud of a computer vision model](https://github.com/Arize-ai/phoenix-assets/blob/main/gifs/image_classification_10mb.gif?raw=true)
@@ -36,21 +39,22 @@ Phoenix provides MLOps and LLMOps insights at lightning speed with zero-config o

**Table of Contents**

- [Installation](#installation)
- [LLM Traces](#llm-traces)
- [Tracing with LlamaIndex](#tracing-with-llamaindex)
- [Tracing with LangChain](#tracing-with-langchain)
- [LLM Evals](#llm-evals)
- [Embedding Analysis](#embedding-analysis)
- [UMAP-based Exploratory Data Analysis](#umap-based-exploratory-data-analysis)
- [Cluster-driven Drift and Performance Analysis](#cluster-driven-drift-and-performance-analysis)
- [Exportable Clusters](#exportable-clusters)
- [Retrieval-Augmented Generation Analysis](#retrieval-augmented-generation-analysis)
- [Structured Data Analysis](#structured-data-analysis)
- [Breaking Changes](#breaking-changes)
- [Community](#community)
- [Thanks](#thanks)
- [Copyright, Patent, and License](#copyright-patent-and-license)
- [Installation](#installation)
- [LLM Traces](#llm-traces)
- [Tracing with LlamaIndex](#tracing-with-llamaindex)
- [Tracing with LangChain](#tracing-with-langchain)
- [LLM Evals](#llm-evals)
- [Embedding Analysis](#embedding-analysis)
- [UMAP-based Exploratory Data Analysis](#umap-based-exploratory-data-analysis)
- [Cluster-driven Drift and Performance Analysis](#cluster-driven-drift-and-performance-analysis)
- [Exportable Clusters](#exportable-clusters)
- [Retrieval-Augmented Generation Analysis](#retrieval-augmented-generation-analysis)
- [Structured Data Analysis](#structured-data-analysis)
- [Deploying Phoenix](#deploying-phoenix)
- [Breaking Changes](#breaking-changes)
- [Community](#community)
- [Thanks](#thanks)
- [Copyright, Patent, and License](#copyright-patent-and-license)

## Installation

@@ -365,6 +369,27 @@ train_ds = px.Dataset(dataframe=train_df, schema=schema, name="training")
session = px.launch_app(primary=prod_ds, reference=train_ds)
```

## Deploying Phoenix

<a target="_blank" href="https://hub.docker.com/repository/docker/arizephoenix/phoenix/general">
<img src="https://img.shields.io/docker/v/arizephoenix/phoenix?sort=semver&logo=docker&label=image&color=blue">
</a>

<img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/deployment.png" title="How phoenix can collect traces from an LLM application"/>

Phoenix's notebook-first approach to observability makes it a great tool during experimentation and pre-production. At some point, however, you will want to ship your application to production and continue to monitor it as it runs. Phoenix is made up of two components that can be deployed independently:

- **Trace Instrumentation**: a set of plugins that can be added to your application's startup process. These plugins (known as instrumentors) automatically collect spans for your application and export them for collection and visualization. For Phoenix, all instrumentors are managed in a single repository called [OpenInference](https://github.com/Arize-ai/openinference) (see the sketch after this list).
- **Trace Collector**: the Phoenix server acts as a trace collector and as an application that helps you troubleshoot your application in real time. You can pull the latest Phoenix images from [Docker Hub](https://hub.docker.com/repository/docker/arizephoenix/phoenix/general).
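
As an illustration, here is a minimal sketch of wiring an instrumentor to a Phoenix collector. It assumes a LlamaIndex application with the `openinference-instrumentation-llama-index` and OpenTelemetry SDK packages installed; the endpoint and package names follow OpenInference's conventions and may differ for your setup:

```python
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Assumed default: a local Phoenix server accepting OTLP traces on port 6006.
endpoint = "http://localhost:6006/v1/traces"

# Route every span produced by the application to the Phoenix collector.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

# Auto-instrument LlamaIndex so spans are emitted without further code changes.
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```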

To run Phoenix tracing in production, follow these steps:

- **Set up a server**: run your LLM application on a server ([examples](https://github.com/Arize-ai/openinference/tree/main/python/examples))
- **Instrument**: add [OpenInference](https://github.com/Arize-ai/openinference) instrumentation to your server
- **Observe**: run the Phoenix server as a side-car or a standalone instance and point your tracing instrumentation to it (see the sketch below)
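
Once the Phoenix server is running, your application or a notebook can also talk to it directly through the Phoenix client. A minimal sketch, assuming the server is reachable at `http://phoenix:6006` (a hypothetical address for illustration):

```python
import phoenix as px

# Connect to a remote Phoenix server rather than an in-notebook session.
# The endpoint below is a placeholder; substitute your deployment's address.
client = px.Client(endpoint="http://phoenix:6006")

# Pull the collected spans back as a dataframe for offline analysis.
spans_df = client.get_spans_dataframe()
```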

For more information on deploying Phoenix, see the [Phoenix Deployment Guide](https://docs.arize.com/phoenix/deployment/deploying-phoenix).

## Breaking Changes

- **v1.0.0** - Phoenix now exclusively supports the `openai>=1.0.0` SDK. If you are using an older version of the OpenAI SDK, you can continue to use `arize-phoenix==0.1.1`. However, we recommend upgrading to the latest version of the OpenAI SDK, as it contains many improvements. If you are using Phoenix with LlamaIndex or LangChain, you will also have to upgrade to versions of those packages that support the OpenAI `1.0.0` SDK (`llama-index>=0.8.64`, `langchain>=0.0.334`)
2 changes: 1 addition & 1 deletion app/package.json
@@ -78,7 +78,7 @@
"build:relay": "relay-compiler",
"watch": "./esbuild.config.mjs dev",
"test": "jest --config ./jest.config.js",
"dev": "npm run dev:server:image & npm run build:static && npm run watch",
"dev": "npm run dev:server:traces:llama_index_rag & npm run build:static && npm run watch",
"dev:server:mnist": "python3 -m phoenix.server.main --umap_params 0,30,550 fixture fashion_mnist",
"dev:server:mnist:single": "python3 -m phoenix.server.main fixture fashion_mnist --primary-only true",
"dev:server:sentiment": "python3 -m phoenix.server.main fixture sentiment_classification_language_drift",
2 changes: 1 addition & 1 deletion app/src/pages/trace/TracePage.tsx
@@ -216,7 +216,7 @@ export function TracePage() {
<DialogContainer
type="slideOver"
isDismissable
onDismiss={() => navigate(-1)}
onDismiss={() => navigate("/tracing")}
>
<Dialog size="XL" title="Trace Details">
<main
4 changes: 2 additions & 2 deletions docs/api/evals.md
@@ -60,7 +60,7 @@ from phoenix.experimental.evals import (
)

api_key = None # set your api key here or with the OPENAI_API_KEY environment variable
eval_model = OpenAIModel(model_name="gpt-4-1106-preview", api_key=api_key)
eval_model = OpenAIModel(model_name="gpt-4-turbo-preview", api_key=api_key)

hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
@@ -264,7 +264,7 @@ capitals_df = llm_generate(
dataframe=countries_df,
template=template,
model=OpenAIModel(
model_name="gpt-4-1106-preview",
model_name="gpt-4-turbo-preview",
model_kwargs={
"response_format": {"type": "json_object"}
}
6 changes: 3 additions & 3 deletions docs/llm-evals/quickstart-retrieval-evals/README.md
@@ -37,7 +37,7 @@ from phoenix.experimental.evals import (
# Creating Hallucination Eval which checks if the application hallucinated
hallucination_eval = llm_classify(
dataframe=queries_df,
model=OpenAIModel("gpt-4-1106-preview", temperature=0.0),
model=OpenAIModel("gpt-4-turbo-preview", temperature=0.0),
template=HALLUCINATION_PROMPT_TEMPLATE,
rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
provide_explanation=True, # Makes the LLM explain its reasoning
@@ -50,7 +50,7 @@ hallucination_eval["score"] = (
# Creating Q&A Eval which checks if the application answered the question correctly
qa_correctness_eval = llm_classify(
dataframe=queries_df,
model=OpenAIModel("gpt-4-1106-preview", temperature=0.0),
model=OpenAIModel("gpt-4-turbo-preview", temperature=0.0),
template=QA_PROMPT_TEMPLATE,
rails=list(QA_PROMPT_RAILS_MAP.values()),
provide_explanation=True, # Makes the LLM explain its reasoning
@@ -90,7 +90,7 @@ from phoenix.experimental.evals import (

retrieved_documents_eval = llm_classify(
dataframe=retrieved_documents_df,
model=OpenAIModel("gpt-4-1106-preview", temperature=0.0),
model=OpenAIModel("gpt-4-turbo-preview", temperature=0.0),
template=RAG_RELEVANCY_PROMPT_TEMPLATE,
rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
provide_explanation=True,
2 changes: 1 addition & 1 deletion docs/quickstart/evals.md
@@ -71,7 +71,7 @@ Install the OpenAI SDK with `pip install openai` and instantiate your model.
from phoenix.experimental.evals import OpenAIModel

api_key = None # set your api key here or with the OPENAI_API_KEY environment variable
eval_model = OpenAIModel(model_name="gpt-4-1106-preview", api_key=api_key)
eval_model = OpenAIModel(model_name="gpt-4-turbo-preview", api_key=api_key)
```

You'll next define your evaluators. Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., providing a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use; evaluations are performed using our battle-tested evaluation templates.
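
For example, the hallucination and Q&A correctness evaluators can be instantiated with the model defined above; a minimal sketch:

```python
from phoenix.experimental.evals import HallucinationEvaluator, QAEvaluator

# Both evaluators reuse the eval_model instantiated above.
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
```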
6 changes: 3 additions & 3 deletions docs/use-cases/rag-evaluation.md
@@ -372,7 +372,7 @@ from phoenix.experimental.evals import (
run_evals,
)

relevance_evaluator = RelevanceEvaluator(OpenAIModel(model_name="gpt-4-1106-preview"))
relevance_evaluator = RelevanceEvaluator(OpenAIModel(model_name="gpt-4-turbo-preview"))

retrieved_documents_relevance_df = run_evals(
evaluators=[relevance_evaluator],
@@ -530,8 +530,8 @@ from phoenix.experimental.evals import (
run_evals,
)

qa_evaluator = QAEvaluator(OpenAIModel(model_name="gpt-4-1106-preview"))
hallucination_evaluator = HallucinationEvaluator(OpenAIModel(model_name="gpt-4-1106-preview"))
qa_evaluator = QAEvaluator(OpenAIModel(model_name="gpt-4-turbo-preview"))
hallucination_evaluator = HallucinationEvaluator(OpenAIModel(model_name="gpt-4-turbo-preview"))

qa_correctness_eval_df, hallucination_eval_df = run_evals(
evaluators=[qa_evaluator, hallucination_evaluator],
2 changes: 1 addition & 1 deletion examples/using_llamaindex_with_huggingface_models.ipynb
@@ -278,7 +278,7 @@
"metadata": {},
"outputs": [],
"source": [
"trace_df = px.active_session().get_spans_dataframe('span_kind == \"RETRIEVER\"')\n",
"trace_df = px.Client().get_spans_dataframe('span_kind == \"RETRIEVER\"')\n",
"trace_df"
]
},
1 change: 1 addition & 0 deletions pyproject.toml
@@ -124,6 +124,7 @@ dependencies = [
[tool.hatch.envs.type]
dependencies = [
"mypy==1.5.1",
"pydantic==v1.10.14", # for mypy
"llama-index>=0.9.14",
"pandas-stubs<=2.0.2.230605", # version 2.0.3.230814 is causing a dependency conflict.
"types-psutil",
2 changes: 2 additions & 0 deletions src/phoenix/__init__.py
@@ -1,6 +1,7 @@
from .datasets.dataset import Dataset
from .datasets.fixtures import ExampleDatasets, load_example
from .datasets.schema import EmbeddingColumnNames, RetrievalEmbeddingColumnNames, Schema
from .session.client import Client
from .session.evaluation import log_evaluations
from .session.session import NotebookEnvironment, Session, active_session, close_app, launch_app
from .trace.fixtures import load_example_traces
@@ -39,4 +40,5 @@
"TraceDataset",
"NotebookEnvironment",
"log_evaluations",
"Client",
]
14 changes: 1 addition & 13 deletions src/phoenix/core/traces.py
@@ -1,7 +1,6 @@
import weakref
from collections import defaultdict
from datetime import datetime, timezone
from enum import Enum
from queue import SimpleQueue
from threading import RLock, Thread
from types import MethodType
@@ -32,6 +31,7 @@
ATTRIBUTE_PREFIX,
COMPUTED_PREFIX,
CONTEXT_PREFIX,
ComputedAttributes,
Span,
SpanAttributes,
SpanID,
@@ -55,18 +55,6 @@
LLM_TOKEN_COUNT_COMPLETION = ATTRIBUTE_PREFIX + semantic_conventions.LLM_TOKEN_COUNT_COMPLETION


class ComputedAttributes(Enum):
# Enum value must be string prefixed by COMPUTED_PREFIX
LATENCY_MS = (
COMPUTED_PREFIX + "latency_ms"
) # The latency (or duration) of the span in milliseconds
CUMULATIVE_LLM_TOKEN_COUNT_TOTAL = COMPUTED_PREFIX + "cumulative_token_count.total"
CUMULATIVE_LLM_TOKEN_COUNT_PROMPT = COMPUTED_PREFIX + "cumulative_token_count.prompt"
CUMULATIVE_LLM_TOKEN_COUNT_COMPLETION = COMPUTED_PREFIX + "cumulative_token_count.completion"
ERROR_COUNT = COMPUTED_PREFIX + "error_count"
CUMULATIVE_ERROR_COUNT = COMPUTED_PREFIX + "cumulative_error_count"


class ReadableSpan(ObjectProxy): # type: ignore
"""
A wrapped a protobuf Span, with access methods and ability to decode to
24 changes: 8 additions & 16 deletions src/phoenix/experimental/evals/models/anthropic.py
@@ -45,12 +45,6 @@ def __post_init__(self) -> None:
self._init_client()
self._init_tiktoken()
self._init_rate_limiter()
self.retry = self._retry(
error_types=[], # default to catching all errors
min_seconds=self.retry_min_seconds,
max_seconds=self.retry_max_seconds,
max_retries=self.max_retries,
)

def _init_environment(self) -> None:
try:
@@ -128,18 +122,17 @@ def _generate(self, prompt: str, **kwargs: Dict[str, Any]) -> str:
kwargs.pop("instruction", None)
invocation_parameters = self.invocation_parameters()
invocation_parameters.update(kwargs)
response = self._generate_with_retry(
response = self._rate_limited_completion(
model=self.model,
prompt=self._format_prompt_for_claude(prompt),
**invocation_parameters,
)

return str(response)

def _generate_with_retry(self, **kwargs: Any) -> Any:
@self.retry
def _rate_limited_completion(self, **kwargs: Any) -> Any:
@self._rate_limiter.limit
def _completion_with_retry(**kwargs: Any) -> Any:
def _completion(**kwargs: Any) -> Any:
try:
response = self.client.completions.create(**kwargs)
return response.completion
@@ -149,24 +142,23 @@ def _completion_with_retry(**kwargs: Any) -> Any:
raise PhoenixContextLimitExceeded(exception_message) from e
raise e

return _completion_with_retry(**kwargs)
return _completion(**kwargs)

async def _async_generate(self, prompt: str, **kwargs: Dict[str, Any]) -> str:
# instruction is an invalid input to Anthropic models, it is passed in by
# BaseEvalModel.__call__ and needs to be removed
kwargs.pop("instruction", None)
invocation_parameters = self.invocation_parameters()
invocation_parameters.update(kwargs)
response = await self._async_generate_with_retry(
response = await self._async_rate_limited_completion(
model=self.model, prompt=self._format_prompt_for_claude(prompt), **invocation_parameters
)

return str(response)

async def _async_generate_with_retry(self, **kwargs: Any) -> Any:
@self.retry
async def _async_rate_limited_completion(self, **kwargs: Any) -> Any:
@self._rate_limiter.alimit
async def _async_completion_with_retry(**kwargs: Any) -> Any:
async def _async_completion(**kwargs: Any) -> Any:
try:
response = await self.async_client.completions.create(**kwargs)
return response.completion
@@ -176,7 +168,7 @@ async def _async_completion_with_retry(**kwargs: Any) -> Any:
raise PhoenixContextLimitExceeded(exception_message) from e
raise e

return await _async_completion_with_retry(**kwargs)
return await _async_completion(**kwargs)

def _format_prompt_for_claude(self, prompt: str) -> str:
# Claude requires prompt in the format of Human: ... Assistant: