Commit 132d31b
Revert "docs: sync Feb 21, 2024 (#2343)" (#2501)
This reverts commit 4e151f3.
mikeldking authored Mar 8, 2024
1 parent 4e151f3 commit 132d31b
Showing 44 changed files with 627 additions and 492 deletions.
1 change: 0 additions & 1 deletion cspell.json
@@ -24,7 +24,6 @@
"numpy",
"openai",
"openinference",
"OTLP",
"postprocessors",
"pydantic",
"quickstart",
Binary file removed docs/.gitbook/assets/evals.png
52 changes: 27 additions & 25 deletions docs/README.md
@@ -1,49 +1,51 @@
---
description: Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models.
cover: >-
https://images.unsplash.com/photo-1610296669228-602fa827fc1f?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHw1fHxzcGFjZXxlbnwwfHx8fDE2NzkwOTMzODc&ixlib=rb-4.0.3&q=80
coverY: 0
---

# Arize Phoenix
# Phoenix: AI Observability & Evaluation

Phoenix is an open-source observability library and platform designed for experimentation, evaluation, and troubleshooting.

The toolset is designed to ingest [inference data](quickstart/phoenix-inferences/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets as well as [LLM traces](quickstart/llm-traces.md). It allows AI Engineers and Data Scientists to quickly visualize their data, evaluate performance, track down issues and insights, and easily export data to drive improvements.

## Install Phoenix

In your Jupyter or Colab environment, run the following command to install.

{% tabs %}
{% tab title="Using pip" %}
```sh
pip install arize-phoenix
```
{% endtab %}

{% tab title="Using conda" %}
```sh
conda install -c conda-forge arize-phoenix
```
{% endtab %}
{% endtabs %}
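
For orientation, installation is typically followed by launching the app from the same notebook. A minimal sketch, assuming the `phoenix` package name and the `launch_app` entry point referenced elsewhere in these docs:

```python
# Minimal post-install sketch; assumes a Jupyter/Colab notebook environment.
import phoenix as px

session = px.launch_app()  # start the local Phoenix server
print(session.url)         # URL where the Phoenix UI is served
```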
{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}

## Quickstarts

Running Phoenix for the first time? Select a quickstart below. 

<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>Tracing</strong> </td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Evaluation</strong></td><td><a href="quickstart/evals.md">evals.md</a></td><td><a href=".gitbook/assets/evals.png">evals.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/">phoenix-inferences</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>
<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>

### Demo
Don't know which one to choose? Phoenix has two main data ingestion methods:

{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}
1. [LLM Traces:](quickstart/llm-traces.md) Phoenix is used on top of trace data generated by LlamaIndex and LangChain. The general use case is to troubleshoot LLM applications with agentic workflows.&#x20;
2. [Inferences](quickstart/phoenix-inferences/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python, such as LLM applications built in Python workflows, CV, NLP, and tabular models. A sketch of this path follows the list.
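
A minimal sketch of the Inferences path, assuming the `px.Schema` and `px.Dataset` API described in the API reference below; the column names are invented for illustration:

```python
# Hedged sketch of the "Inferences" ingestion path (assumed API).
import pandas as pd
import phoenix as px

df = pd.DataFrame(
    {"prediction": ["fraud", "not fraud"], "actual": ["not fraud", "not fraud"]}
)
schema = px.Schema(
    prediction_label_column_name="prediction",  # model output column
    actual_label_column_name="actual",          # ground-truth column
)
px.launch_app(px.Dataset(dataframe=df, schema=schema, name="production"))
```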

### **Phoenix Functionality**&#x20;

## Next Steps
* [**Evaluate Performance of LLM Tasks with Evals Library:**](llm-evals/llm-evals.md) Use the Phoenix Evals library to easily evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template.
* [**Troubleshoot Agentic Workflows:**](concepts/llm-traces.md) Get visibility into where your complex or agentic workflow broke, or find performance bottlenecks, across different span types with LLM Tracing.
* [**Optimize Retrieval Systems:**](use-cases/troubleshooting-llm-retrieval-with-vector-stores.md) Identify missing context in your knowledge base, and when irrelevant context is retrieved by visualizing query embeddings alongside knowledge base embeddings with RAG Analysis.
* [**Compare Model Versions:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#how-many-datasets-do-i-need) Compare and evaluate performance across model versions prior to deploying to production.
* [**Exploratory Data Analysis:**](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md) Connect teams and workflows by bringing production data from Arize into a notebook environment for continued analysis and fine-tuning workflows.
* [**Find Clusters of Issues to Export for Model Improvement:**](how-to/export-your-data.md) Find clusters of problems using performance metrics or drift. Export clusters for retraining workflows.
* [**Surface Model Drift and Multivariate Drift:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#embedding-drift-over-time) Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models.&#x20;

### [Try our Tutorials](notebooks.md)
## Resources

### [Tutorials](notebooks.md)

Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more. &#x20;

### [Use Cases](broken-reference)

Learn about best practices, and how to get started with use case examples such as Q\&A with Retrieval, Summarization, and Chatbots.&#x20;

### [Community](https://join.slack.com/t/arize-ai/shared\_invite/zt-1ppbtg5dd-1CYmQO4dWF4zvXFiONTjMg)

Join the Phoenix Slack community to ask questions, share findings, provide feedback, and connect with other developers.&#x20;
54 changes: 29 additions & 25 deletions docs/SUMMARY.md
@@ -1,24 +1,28 @@
# Table of contents

* [Arize Phoenix](README.md)
* [User Guide](concepts/llm-observability.md)
* [Environments](environments.md)
* [Phoenix: AI Observability & Evaluation](README.md)
* [Examples](notebooks.md)
* [Installation](install-and-import-phoenix.md)
* [Environments](environments.md)

## 🔭 Tracing
## 🔑 Quickstart

* [Overview: Traces](concepts/llm-traces.md)
* [Quickstart: Traces](quickstart/llm-traces.md)
* [Instrumentation](telemetry/instrumentation.md)
* [OpenInference](concepts/open-inference.md)
* [Deployment](telemetry/deploying-phoenix.md)
* [Custom Spans](telemetry/custom-spans.md)
* [Phoenix Traces](quickstart/llm-traces.md)
* [Phoenix Evals](quickstart/evals.md)
* [Phoenix Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)

## 💡 Concepts

* [LLM Observability](concepts/llm-observability.md)
* [Traces and Spans](concepts/llm-traces.md)
* [Evaluation](concepts/evaluation.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)

## 🧠 Evaluation
## 🧠 LLM Evals

* [Overview: Evals](llm-evals/llm-evals.md)
* [Concept: Evaluation](concepts/evaluation.md)
* [Quickstart: Evals](quickstart/evals.md)
* [Phoenix LLM Evals](llm-evals/llm-evals.md)
* [Running Pre-Tested Evals](llm-evals/running-pre-tested-evals/README.md)
* [Retrieval (RAG) Relevance](llm-evals/running-pre-tested-evals/retrieval-rag-relevance.md)
* [Hallucinations](llm-evals/running-pre-tested-evals/hallucinations.md)
@@ -33,14 +37,7 @@
* [Building Your Own Evals](llm-evals/building-your-own-evals.md)
* [Quickstart Retrieval Evals](llm-evals/quickstart-retrieval-evals/README.md)
* [Retrieval Evals on Document Chunks](llm-evals/quickstart-retrieval-evals/retrieval-evals-on-document-chunks.md)
* [Benchmarking Retrieval](llm-evals/benchmarking-retrieval-rag.md)

## 🌌 inferences

* [Quickstart: Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)
* [Benchmarking Retrieval (RAG)](llm-evals/benchmarking-retrieval-rag.md)

## 🔮 Use Cases

@@ -60,7 +57,13 @@
* [Extract Data from Spans](how-to/extract-data-from-spans.md)
* [Use Example Datasets](how-to/use-example-datasets.md)

## ⌨️ API
## 🔭 telemetry

* [Deploying Phoenix](telemetry/deploying-phoenix.md)
* [Instrumentation](telemetry/instrumentation.md)
* [Custom Spans](telemetry/custom-spans.md)

## ⌨ API

* [Dataset and Schema](api/dataset-and-schema.md)
* [Session](api/session.md)
@@ -75,15 +78,16 @@
* [OpenAI](integrations/openai.md)
* [Bedrock](integrations/bedrock.md)
* [AutoGen](integrations/autogen-support.md)
* [DSPy](integrations/dspy.md)
* [Arize](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md)

## 🏴‍☠️ Programming Languages
## 🏴 Programming Languages

* [JavaScript](programming-languages/javascript.md)

## 📚 Reference

* [Embeddings](concepts/embeddings.md)
* [OpenInference](concepts/open-inference.md)
* [Frequently Asked Questions](reference/frequently-asked-questions.md)
* [Contribute to Phoenix](reference/contribute-to-phoenix.md)

11 changes: 1 addition & 10 deletions docs/api/client.md
@@ -63,16 +63,7 @@ A client for making HTTP requests to the Phoenix server for extracting/downloadi

 * **get\_trace\_dataset** -> Optional\[TraceDataset]\
 \
-Returns the trace dataset containing spans and evaluations.\
-
-* **log\_evaluations** -> None\
-\
-Send evaluations to Phoenix. See [#logging-multiple-evaluation-dataframes](../how-to/define-your-schema/llm-evaluations.md#logging-multiple-evaluation-dataframes "mention") for usage.\
-
-
-**Parameters**
-
-* **\*evaluations** (Evaluations): One or more Evaluations datasets. See [llm-evaluations.md](../how-to/define-your-schema/llm-evaluations.md "mention") for more details.
+Returns the trace dataset containing spans and evaluations.

### Usage
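The body of this section is truncated in this capture; as a plausible illustration of the methods listed above (not the file's original example), usage might look like:

```python
# Hedged illustration of the Client methods documented above.
import phoenix as px

client = px.Client()  # connects to a running Phoenix server
trace_ds = client.get_trace_dataset()  # -> Optional[TraceDataset]
if trace_ds is not None:
    print(trace_ds.dataframe.head())  # spans and evaluations as a DataFrame
```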

2 changes: 1 addition & 1 deletion docs/api/dataset-and-schema.md
@@ -109,7 +109,7 @@ class EmbeddingColumnNames(
 )
 ```
 
-A dataclass that associates one or more columns of a dataframe with an [embedding](broken-reference) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).
+A dataclass that associates one or more columns of a dataframe with an [embedding](../concepts/embeddings.md) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).
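
As a hedged illustration of that sentence (the feature and column names are invented for the example):

```python
# EmbeddingColumnNames is used only as a value in the
# embedding_feature_column_names dict of a Schema, per the text above.
import phoenix as px

schema = px.Schema(
    embedding_feature_column_names={
        "product_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding_vector",  # column holding vectors
            raw_data_column_name="product_text",    # optional raw-data column
        ),
    },
)
```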

**\[**[**source**](https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/datasets/schema.py)**]**

2 changes: 1 addition & 1 deletion docs/api/evals.md
@@ -37,7 +37,7 @@ Evaluates a pandas dataframe using a set of user-specified evaluators that asses
 * **provide\_explanation** (bool, optional): If true, each output dataframe will contain an explanation column containing the LLM's reasoning for each evaluation.
 * **use\_function\_calling\_if\_available** (bool, optional): If true, function calling is used (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
 * **verbose** (bool, optional): If true, prints detailed information such as model invocation parameters, retries on failed requests, etc.
-* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If
+* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If&#x20;
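
For context, these parameters correspond to a call roughly like the sketch below; the `run_evals` entry point, the evaluator class, and the column names are assumptions drawn from the Phoenix evals API of this period:

```python
# Hedged sketch of an evals call using the parameters documented above.
import pandas as pd
from phoenix.experimental.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    run_evals,
)

queries_df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "output": ["Phoenix is a city in Arizona."],
        "reference": ["Phoenix is an open-source observability library."],
    }
)
hallucination_df, = run_evals(
    dataframe=queries_df,
    evaluators=[HallucinationEvaluator(OpenAIModel(model_name="gpt-4"))],
    provide_explanation=True,  # adds an explanation column to the output
    concurrency=20,            # number of concurrent workers
)
```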

### Returns

10 changes: 0 additions & 10 deletions docs/api/evaluation-models.md
@@ -46,8 +46,6 @@ class OpenAIModel:
"""How many completions to generate for each prompt."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to OpenAI completion API. Default is 600 seconds."""
max_retries: int = 20
@@ -80,14 +78,6 @@ model = OpenAIModel(
 )
 ```
 
-{% hint style="info" %}
-Note that the `model_name` param is actually the `engine` of your deployment. You may get a `DeploymentNotFound` error if this parameter is not correct. You can find your engine param in the Azure OpenAI playground.\
-\
-
-{% endhint %}
-
-<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/azure_openai_engine.png" alt=""><figcaption><p>How to find the model param in Azure</p></figcaption></figure>
-
Azure OpenAI supports specific options:

```python
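# The remainder of this snippet is truncated in this capture. A plausible
# sketch of Azure-specific options follows; parameter names such as
# azure_endpoint and api_version are editorial assumptions, not file content.
model = OpenAIModel(
    model_name="gpt-4",  # for Azure, this is the deployment/engine name
    azure_endpoint="https://example-resource.openai.azure.com/",
    api_version="2023-07-01-preview",
)
```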
2 changes: 1 addition & 1 deletion docs/concepts/embeddings-analysis.md
@@ -38,7 +38,7 @@ When two datasets are used to initialize phoenix, the clusters are automatically

 ### UMAP Point-Cloud
 
-Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](broken-reference) in a visually understandable way.\
+Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](embeddings.md) in a visually understandable way.\
 \
 In addition to the point-cloud, another dimension we have at our disposal is color (and in some cases shape). Out of the box, Phoenix lets you assign colors to the UMAP point-cloud by dimension (features, tags, predictions, actuals), performance (correctness, which distinguishes true positives and true negatives from incorrect predictions), and dataset (to highlight areas of drift). This helps you explore your point-cloud from different perspectives depending on what you are looking for.
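
For readers unfamiliar with UMAP, a generic three-component projection with the umap-learn package looks like this; it is an editorial illustration, independent of Phoenix's internals:

```python
# Generic UMAP projection to 3 dimensions (editor's sketch).
import numpy as np
import umap

embeddings = np.random.rand(500, 768)          # e.g., 500 vectors of dim 768
reducer = umap.UMAP(n_components=3)            # reduce to 3 dimensions
points_3d = reducer.fit_transform(embeddings)  # shape: (500, 3)
```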

Expand Down