docs: sync Feb 21, 2024 #2343

Merged

merged 36 commits on Mar 8, 2024

36 commits
f71b135
docs: custom spans
mikeldking Feb 7, 2024
030628e
Merge pull request #2227 from Arize-ai/custom-spans
mikeldking Feb 7, 2024
b07254e
docs: fix the semantic conventions link (GITBOOK-496)
mikeldking Feb 7, 2024
abb8e9e
docs: fix links to semantic conventions (GITBOOK-497)
mikeldking Feb 7, 2024
4e09fd1
docs: No subject (GITBOOK-498)
mikeldking Feb 7, 2024
0eaade5
docs: No subject (GITBOOK-499)
tammy37 Feb 7, 2024
1c079e4
docs: revert changes from #499 (GITBOOK-500)
tammy37 Feb 7, 2024
5e61e42
docs: update model args (GITBOOK-502)
mikeldking Feb 8, 2024
95fce76
docs: replace session with px.Client() (GITBOOK-504)
RogerHYang Feb 8, 2024
d5fad73
docs: migrate to otel instrumentors (GITBOOK-503)
RogerHYang Feb 8, 2024
f21544d
docs: Q&A addition (GITBOOK-373)
RogerHYang Feb 8, 2024
b3e2f38
docs: Azure guidance (GITBOOK-505)
mikeldking Feb 13, 2024
97a58a7
docs: Fixed df name (GITBOOK-507)
camyoung93 Feb 14, 2024
32b77ec
docs: llama-index 0.10 guidance (GITBOOK-508)
mikeldking Feb 15, 2024
69b51c9
docs: Moved up Telemtry (GITBOOK-509)
Feb 15, 2024
863ac94
docs: add dspy notebook link
axiomofjoy Feb 15, 2024
78d461d
docs: add dspy integration (GITBOOK-510)
axiomofjoy Feb 15, 2024
5d38907
docs: dspy tweaks (GITBOOK-511)
axiomofjoy Feb 15, 2024
91cd6e5
docs: Fix typo on "string" (GITBOOK-512)
camyoung93 Feb 15, 2024
111b153
docs: update markdownss for px.Client().log_evaluations() (#2313)
RogerHYang Feb 16, 2024
dbe0b6e
docs: add log_evaluations to Client (GITBOOK-513)
RogerHYang Feb 16, 2024
1d8ab0d
docs: No subject (GITBOOK-515)
mikeldking Feb 16, 2024
229457b
docs: fix .rst links to be .md (#2338)
mikeldking Feb 20, 2024
a88c717
Merge pull request #2344 from Arize-ai/main
mikeldking Feb 21, 2024
3232688
docs: Capitalizing Phoenix, removed duplicate sentence (GITBOOK-516)
camyoung93 Feb 21, 2024
26b9e9f
docs: Change Nav (GITBOOK-514)
Feb 23, 2024
63a76a3
docs: Overview revamp (GITBOOK-518)
mikeldking Feb 23, 2024
cbd2a5c
docs: remove use-cases (GITBOOK-519)
mikeldking Feb 23, 2024
7514211
docs: Simplify environments (GITBOOK-520)
mikeldking Feb 24, 2024
16eca63
docs: fix links (GITBOOK-522)
mikeldking Feb 24, 2024
37cc34e
docs: Add image (GITBOOK-523)
mikeldking Feb 24, 2024
c93e047
docs: Fixed typo "grammatical" (GITBOOK-524)
camyoung93 Feb 26, 2024
16c0de5
docs: delete obsolete exporter code (GITBOOK-525)
RogerHYang Feb 28, 2024
1064d21
Remove references to deprecated `processing` module in docs (#2422)
anticorrelator Mar 3, 2024
6759b67
docs: update install to include evals (GITBOOK-527)
mikeldking Mar 5, 2024
05ac8be
docs: fixed url typo (GITBOOK-528)
camyoung93 Mar 7, 2024
1 change: 1 addition & 0 deletions cspell.json
@@ -24,6 +24,7 @@
  "numpy",
  "openai",
  "openinference",
+ "OTLP",
  "postprocessors",
  "pydantic",
  "quickstart",
Binary file added docs/.gitbook/assets/evals.png
52 changes: 25 additions & 27 deletions docs/README.md
@@ -1,50 +1,48 @@
---
description: Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models.
cover: >-
https://images.unsplash.com/photo-1610296669228-602fa827fc1f?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHw1fHxzcGFjZXxlbnwwfHx8fDE2NzkwOTMzODc&ixlib=rb-4.0.3&q=80
coverY: 0
---

# Phoenix: AI Observability & Evaluation
# Arize Phoenix

Phoenix is an open-source observability library and platform designed for experimentation, evaluation, and troubleshooting.

The toolset is designed to ingest [inference data](quickstart/phoenix-inferences/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets, as well as [LLM traces](quickstart/llm-traces.md). It allows AI engineers and data scientists to quickly visualize their data, evaluate performance, track down issues, surface insights, and easily export data to drive improvements.

{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}
## Install Phoenix

## Quickstarts
In your Jupyter or Colab environment, run the following command to install.

Running Phoenix for the first time? Select a quickstart below. 

<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>
{% tabs %}
{% tab title="Using pip" %}
```sh
pip install arize-phoenix
```
{% endtab %}

Don't know which one to choose? Phoenix has two main data ingestion methods:
{% tab title="Using conda" %}
```sh
conda install -c conda-forge arize-phoenix
```
{% endtab %}
{% endtabs %}
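As a quick sanity check after installing, a minimal sketch assuming a notebook environment (the printed URL is illustrative):

```python
import phoenix as px

# Launch the Phoenix app in the background of the notebook; the returned
# session exposes the local URL where the UI is served.
session = px.launch_app()
print(session.url)  # e.g. http://localhost:6006/
```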

1. [LLM Traces:](quickstart/llm-traces.md) Phoenix is used on top of trace data generated by LlamaIndex and LangChain. The general use case is to troubleshoot LLM applications with agentic workflows.
2. [Inferences](quickstart/phoenix-inferences/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python such as LLM applications built in Python workflows, CV, NLP, and tabular models.
## Quickstarts

### **Phoenix Functionality**
Running Phoenix for the first time? Select a quickstart below.

* [**Evaluate Performance of LLM Tasks with Evals Library:**](llm-evals/llm-evals.md) Use the Phoenix Evals library to easily evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template.
* [**Troubleshoot Agentic Workflows:**](concepts/llm-traces.md) Get visibility into where your complex or agentic workflow broke, or find performance bottlenecks, across different span types with LLM Tracing.
* [**Optimize Retrieval Systems:**](use-cases/troubleshooting-llm-retrieval-with-vector-stores.md) Identify missing context in your knowledge base, and when irrelevant context is retrieved by visualizing query embeddings alongside knowledge base embeddings with RAG Analysis.
* [**Compare Model Versions:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#how-many-datasets-do-i-need) Compare and evaluate performance across model versions prior to deploying to production.
* [**Exploratory Data Analysis:**](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md) Connect teams and workflows with continued analysis of production data from Arize in a notebook environment for fine-tuning workflows.
* [**Find Clusters of Issues to Export for Model Improvement:**](how-to/export-your-data.md) Find clusters of problems using performance metrics or drift. Export clusters for retraining workflows.
* [**Surface Model Drift and Multivariate Drift:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#embedding-drift-over-time) Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models.
<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>Tracing</strong> </td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Evaluation</strong></td><td><a href="quickstart/evals.md">evals.md</a></td><td><a href=".gitbook/assets/evals.png">evals.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/">phoenix-inferences</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>

## Resources
### Demo

### [Tutorials](notebooks.md)
{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}

Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.
## Next Steps

### [Use Cases](broken-reference)
### [Try our Tutorials](notebooks.md)

Learn about best practices, and how to get started with use case examples such as Q\&A with Retrieval, Summarization, and Chatbots.
Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.

### [Community](https://join.slack.com/t/arize-ai/shared\_invite/zt-1ppbtg5dd-1CYmQO4dWF4zvXFiONTjMg)

54 changes: 25 additions & 29 deletions docs/SUMMARY.md
@@ -1,28 +1,24 @@
# Table of contents

* [Phoenix: AI Observability & Evaluation](README.md)
* [Examples](notebooks.md)
* [Installation](install-and-import-phoenix.md)
* [Arize Phoenix](README.md)
* [User Guide](concepts/llm-observability.md)
* [Environments](environments.md)
* [Examples](notebooks.md)

## 🔑 Quickstart

* [Phoenix Traces](quickstart/llm-traces.md)
* [Phoenix Evals](quickstart/evals.md)
* [Phoenix Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)

## 💡 Concepts
## 🔭 Tracing

* [LLM Observability](concepts/llm-observability.md)
* [Traces and Spans](concepts/llm-traces.md)
* [Evaluation](concepts/evaluation.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)
* [Overview: Traces](concepts/llm-traces.md)
* [Quickstart: Traces](quickstart/llm-traces.md)
* [Instrumentation](telemetry/instrumentation.md)
* [OpenInference](concepts/open-inference.md)
* [Deployment](telemetry/deploying-phoenix.md)
* [Custom Spans](telemetry/custom-spans.md)

## 🧠 LLM Evals
## 🧠 Evaluation

* [Phoenix LLM Evals](llm-evals/llm-evals.md)
* [Overview: Evals](llm-evals/llm-evals.md)
* [Concept: Evaluation](concepts/evaluation.md)
* [Quickstart: Evals](quickstart/evals.md)
* [Running Pre-Tested Evals](llm-evals/running-pre-tested-evals/README.md)
* [Retrieval (RAG) Relevance](llm-evals/running-pre-tested-evals/retrieval-rag-relevance.md)
* [Hallucinations](llm-evals/running-pre-tested-evals/hallucinations.md)
@@ -37,7 +33,14 @@
* [Building Your Own Evals](llm-evals/building-your-own-evals.md)
* [Quickstart Retrieval Evals](llm-evals/quickstart-retrieval-evals/README.md)
* [Retrieval Evals on Document Chunks](llm-evals/quickstart-retrieval-evals/retrieval-evals-on-document-chunks.md)
* [Benchmarking Retrieval (RAG)](llm-evals/benchmarking-retrieval-rag.md)
* [Benchmarking Retrieval](llm-evals/benchmarking-retrieval-rag.md)

## 🌌 inferences

* [Quickstart: Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)

## 🔮 Use Cases

@@ -57,13 +60,7 @@
* [Extract Data from Spans](how-to/extract-data-from-spans.md)
* [Use Example Datasets](how-to/use-example-datasets.md)

## 🔭 telemetry

* [Deploying Phoenix](telemetry/deploying-phoenix.md)
* [Instrumentation](telemetry/instrumentation.md)
* [Custom Spans](telemetry/custom-spans.md)

## ⌨ API
## ⌨️ API

* [Dataset and Schema](api/dataset-and-schema.md)
* [Session](api/session.md)
@@ -78,16 +75,15 @@
* [OpenAI](integrations/openai.md)
* [Bedrock](integrations/bedrock.md)
* [AutoGen](integrations/autogen-support.md)
* [DSPy](integrations/dspy.md)
* [Arize](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md)

## 🏴 Programming Languages
## 🏴‍☠️ Programming Languages

* [JavaScript](programming-languages/javascript.md)

## 📚 Reference

* [Embeddings](concepts/embeddings.md)
* [OpenInference](concepts/open-inference.md)
* [Frequently Asked Questions](reference/frequently-asked-questions.md)
* [Contribute to Phoenix](reference/contribute-to-phoenix.md)

11 changes: 10 additions & 1 deletion docs/api/client.md
@@ -63,7 +63,16 @@ A client for making HTTP requests to the Phoenix server for extracting/downloadi

* **get\_trace\_dataset** -> Optional\[TraceDataset]\
\
Returns the trace dataset containing spans and evaluations.
Returns the trace dataset containing spans and evaluations.\

* **log\_evaluations** -> None\
\
Send evaluations to Phoenix. See [#logging-multiple-evaluation-dataframes](../how-to/define-your-schema/llm-evaluations.md#logging-multiple-evaluation-dataframes "mention") for usage.\


**Parameters**

* **\*evaluations** (Evaluations): One or more Evaluations datasets. See [llm-evaluations.md](../how-to/define-your-schema/llm-evaluations.md "mention") for more details.
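A minimal sketch of the new method, assuming a running Phoenix instance and evaluations dataframes indexed by `span_id` (variable names are illustrative):

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# hallucination_eval_df and qa_correctness_eval_df are assumed to be
# dataframes indexed by span_id with "label"/"score" columns.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_correctness_eval_df),
)
```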

### Usage

2 changes: 1 addition & 1 deletion docs/api/dataset-and-schema.md
@@ -109,7 +109,7 @@ class EmbeddingColumnNames(
)
```

A dataclass that associates one or more columns of a dataframe with an [embedding](../concepts/embeddings.md) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).
A dataclass that associates one or more columns of a dataframe with an [embedding](broken-reference) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).

**\[**[**source**](https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/datasets/schema.py)**]**
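For context, a minimal sketch of wiring this dataclass into a `Schema` (column names are illustrative):

```python
import phoenix as px

# Map the raw text column and its vector column to a single
# embedding feature named "text_embedding".
schema = px.Schema(
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="text_vector",
            raw_data_column_name="text",
        ),
    },
)
```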

2 changes: 1 addition & 1 deletion docs/api/evals.md
@@ -37,7 +37,7 @@ Evaluates a pandas dataframe using a set of user-specified evaluators that asses
* **provide\_explanation** (bool, optional): If true, each output dataframe will contain an explanation column containing the LLM's reasoning for each evaluation.
* **use\_function\_calling\_if\_available** (bool, optional): If true, function calling is used (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
* **verbose** (bool, optional): If true, prints detailed information such as model invocation parameters, retries on failed requests, etc.
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If&#x20;
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If
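To make the parameters concrete, a sketch of a typical call, assuming the experimental evals module of this release (dataframe contents are illustrative):

```python
from phoenix.experimental.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    run_evals,
)

# queries_df is assumed to hold the input/output/reference columns the
# evaluator's template expects; run_evals returns one dataframe per evaluator.
[hallucination_df] = run_evals(
    dataframe=queries_df,
    evaluators=[HallucinationEvaluator(OpenAIModel(model_name="gpt-4"))],
    provide_explanation=True,
    concurrency=20,
)
```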

### Returns

10 changes: 10 additions & 0 deletions docs/api/evaluation-models.md
@@ -46,6 +46,8 @@ class OpenAIModel:
"""How many completions to generate for each prompt."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to OpenAI completion API. Default is 600 seconds."""
max_retries: int = 20
@@ -78,6 +80,14 @@ model = OpenAIModel(
)
```

{% hint style="info" %}
Note that the `model_name` param is actually the `engine` of your deployment. You may get a `DeploymentNotFound` error if this parameter is not correct. You can find your engine param in the Azure OpenAI playground.
{% endhint %}

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/azure_openai_engine.png" alt=""><figcaption><p>How to find the model param in Azure</p></figcaption></figure>
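A sketch of the Azure configuration this hint describes (endpoint, API version, and deployment name are placeholders):

```python
from phoenix.experimental.evals import OpenAIModel

model = OpenAIModel(
    # model_name must be the Azure *deployment* (engine) name,
    # not the underlying model family name.
    model_name="my-gpt-4-deployment",
    azure_endpoint="https://example-resource.openai.azure.com/",
    api_version="2023-07-01-preview",
)
```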

Azure OpenAI supports specific options:

```python
2 changes: 1 addition & 1 deletion docs/concepts/embeddings-analysis.md
@@ -38,7 +38,7 @@ When two datasets are used to initialize phoenix, the clusters are automatically

### UMAP Point-Cloud

Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](embeddings.md) in a visually understandable way.\
Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](broken-reference) in a visually understandable way.\
\
In addition to the point-cloud, another dimension we have at our disposal is color (and in some cases shape). Out of the box, Phoenix lets you assign colors to the UMAP point-cloud by dimension (features, tags, predictions, actuals), performance (correctness, which distinguishes true positives and true negatives from the incorrect predictions), and dataset (to highlight areas of drift). This helps you explore your point-cloud from different perspectives depending on what you are looking for.
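For context, a minimal sketch of launching Phoenix with two datasets so the dataset coloring described above is available (the dataframes and `schema` are assumed to be defined as in the quickstarts):

```python
import phoenix as px

# prod_df and train_df are assumed to share the column layout that
# `schema` describes.
prod_ds = px.Dataset(dataframe=prod_df, schema=schema, name="production")
train_ds = px.Dataset(dataframe=train_df, schema=schema, name="training")

# With a primary and a reference dataset, the UMAP point-cloud can be
# colored by dataset to highlight areas of drift.
px.launch_app(primary=prod_ds, reference=train_ds)
```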
