docs: sync Feb 21, 2024 (#2343)
* docs: custom spans

* docs: fix the semantic conventions link (GITBOOK-496)

* docs: fix links to semantic conventions (GITBOOK-497)

* docs: No subject (GITBOOK-498)

* docs: No subject (GITBOOK-499)

* docs: revert changes from #499 (GITBOOK-500)

* docs: update model args (GITBOOK-502)

* docs: replace session with px.Client() (GITBOOK-504)

* docs: migrate to otel instrumentors (GITBOOK-503)

* docs: Q&A addition (GITBOOK-373)

* docs: Azure guidance (GITBOOK-505)

* docs: Fixed df name (GITBOOK-507)

* docs: llama-index 0.10 guidance (GITBOOK-508)

* docs: Moved up Telemetry (GITBOOK-509)

* docs: add dspy notebook link

* docs: add dspy integration (GITBOOK-510)

* docs: dspy tweaks (GITBOOK-511)

* docs: Fix typo on "string" (GITBOOK-512)

* docs: update markdown for px.Client().log_evaluations() (#2313)

* docs: add log_evaluations to Client (GITBOOK-513)

* docs: No subject (GITBOOK-515)

* docs: fix .rst links to be .md (#2338)

* docs: Capitalizing Phoenix, removed duplicate sentence (GITBOOK-516)

* docs: Change Nav (GITBOOK-514)

* docs: Overview revamp (GITBOOK-518)

* docs: remove use-cases (GITBOOK-519)

* docs: Simplify environments (GITBOOK-520)

* docs: fix links (GITBOOK-522)

* docs: Add image (GITBOOK-523)

* docs: Fixed typo "grammatical" (GITBOOK-524)

* docs: delete obsolete exporter code (GITBOOK-525)

* Remove references to deprecated `processing` module in docs (#2422)

* docs: update install to include evals (GITBOOK-527)

* docs: fixed url typo (GITBOOK-528)

---------

Co-authored-by: Tammy Le <tammy@arize.com>
Co-authored-by: Roger Yang <roger.yang@arize.com>
Co-authored-by: Cameron Young <cam@arize.com>
Co-authored-by: Jason Lopatecki <jason@arize.com>
Co-authored-by: Alexander Song <axiomofjoy@gmail.com>
Co-authored-by: Xander Song <xsong@arize.com>
Co-authored-by: Roger Yang <80478925+RogerHYang@users.noreply.github.com>
Co-authored-by: Aparna Dhinakaran <aparna@arize.com>
Co-authored-by: Dustin Ngo <dustin@arize.com>
10 people authored Mar 8, 2024
1 parent 358194f commit 4e151f3
Showing 44 changed files with 492 additions and 627 deletions.
1 change: 1 addition & 0 deletions cspell.json
@@ -24,6 +24,7 @@
"numpy",
"openai",
"openinference",
"OTLP",
"postprocessors",
"pydantic",
"quickstart",
Binary file added docs/.gitbook/assets/evals.png
52 changes: 25 additions & 27 deletions docs/README.md
@@ -1,50 +1,48 @@
---
description: Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models.
cover: >-
https://images.unsplash.com/photo-1610296669228-602fa827fc1f?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHw1fHxzcGFjZXxlbnwwfHx8fDE2NzkwOTMzODc&ixlib=rb-4.0.3&q=80
coverY: 0
---

# Phoenix: AI Observability & Evaluation
# Arize Phoenix

Phoenix is an open-source observability library and platform designed for experimentation, evaluation, and troubleshooting.

The toolset is designed to ingest [inference data](quickstart/phoenix-inferences/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets as well as [LLM traces](quickstart/llm-traces.md). It allows AI Engineers and Data Scientists to quickly visualize their data, evaluate performance, track down issues & insights, and easily export to improve.&#x20;

{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}
## Install Phoenix

## Quickstarts
In your Jupyter or Colab environment, run the following command to install.

Running Phoenix for the first time? Select a quickstart below.&#x20;

<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>
{% tabs %}
{% tab title="Using pip" %}
```sh
pip install arize-phoenix
```
{% endtab %}

Don't know which one to choose? Phoenix has two main data ingestion methods:
{% tab title="Using conda" %}
```sh
conda install -c conda-forge arize-phoenix
```
{% endtab %}
{% endtabs %}
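
As a quick sanity check after installing, here is a minimal sketch (not part of this diff) of launching Phoenix from a notebook; `launch_app` starts the local server and returns a session object:

```python
# Minimal sketch: launch the Phoenix app from Jupyter or Colab.
import phoenix as px

session = px.launch_app()  # starts the local Phoenix server and UI
print(session.url)         # open this URL to explore traces and evals
```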

1. [LLM Traces:](quickstart/llm-traces.md) Phoenix is used on top of trace data generated by LlamaIndex and LangChain. The general use case is to troubleshoot LLM applications with agentic workflows.&#x20;
2. [Inferences](quickstart/phoenix-inferences/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python such as LLM applications built in Python workflows, CV, NLP, and tabular models.
## Quickstarts

### **Phoenix Functionality**&#x20;
Running Phoenix for the first time? Select a quickstart below.&#x20;

* [**Evaluate Performance of LLM Tasks with Evals Library:**](llm-evals/llm-evals.md) Use the Phoenix Evals library to easily evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template.
* [**Troubleshoot Agentic Workflows:**](concepts/llm-traces.md) Get visibility into where your complex or agentic workflow broke, or find performance bottlenecks, across different span types with LLM Tracing.
* [**Optimize Retrieval Systems:**](use-cases/troubleshooting-llm-retrieval-with-vector-stores.md) Identify missing context in your knowledge base, and when irrelevant context is retrieved by visualizing query embeddings alongside knowledge base embeddings with RAG Analysis.
* [**Compare Model Versions:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#how-many-datasets-do-i-need) Compare and evaluate performance across model versions prior to deploying to production.
* [**Exploratory Data Analysis:**](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md) Connect teams and workflows, with continued analysis of production data from Arize in a notebook environment for fine tuning workflows.
* [**Find Clusters of Issues to Export for Model Improvement:**](how-to/export-your-data.md) Find clusters of problems using performance metrics or drift. Export clusters for retraining workflows.
* [**Surface Model Drift and Multivariate Drift:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#embedding-drift-over-time) Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models.&#x20;
<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>Tracing</strong> </td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Evaluation</strong></td><td><a href="quickstart/evals.md">evals.md</a></td><td><a href=".gitbook/assets/evals.png">evals.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/">phoenix-inferences</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>

## Resources
### Demo

### [Tutorials](notebooks.md)
{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}

Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more. &#x20;
## Next Steps

### [Use Cases](broken-reference)
### [Try our Tutorials](notebooks.md)

Learn about best practices, and how to get started with use case examples such as Q\&A with Retrieval, Summarization, and Chatbots.&#x20;
Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more. &#x20;

### [Community](https://join.slack.com/t/arize-ai/shared\_invite/zt-1ppbtg5dd-1CYmQO4dWF4zvXFiONTjMg)

54 changes: 25 additions & 29 deletions docs/SUMMARY.md
@@ -1,28 +1,24 @@
# Table of contents

* [Phoenix: AI Observability & Evaluation](README.md)
* [Examples](notebooks.md)
* [Installation](install-and-import-phoenix.md)
* [Arize Phoenix](README.md)
* [User Guide](concepts/llm-observability.md)
* [Environments](environments.md)
* [Examples](notebooks.md)

## 🔑 Quickstart

* [Phoenix Traces](quickstart/llm-traces.md)
* [Phoenix Evals](quickstart/evals.md)
* [Phoenix Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)

## 💡 Concepts
## 🔭 Tracing

* [LLM Observability](concepts/llm-observability.md)
* [Traces and Spans](concepts/llm-traces.md)
* [Evaluation](concepts/evaluation.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)
* [Overview: Traces](concepts/llm-traces.md)
* [Quickstart: Traces](quickstart/llm-traces.md)
* [Instrumentation](telemetry/instrumentation.md)
* [OpenInference](concepts/open-inference.md)
* [Deployment](telemetry/deploying-phoenix.md)
* [Custom Spans](telemetry/custom-spans.md)

## 🧠 LLM Evals
## 🧠 Evaluation

* [Phoenix LLM Evals](llm-evals/llm-evals.md)
* [Overview: Evals](llm-evals/llm-evals.md)
* [Concept: Evaluation](concepts/evaluation.md)
* [Quickstart: Evals](quickstart/evals.md)
* [Running Pre-Tested Evals](llm-evals/running-pre-tested-evals/README.md)
* [Retrieval (RAG) Relevance](llm-evals/running-pre-tested-evals/retrieval-rag-relevance.md)
* [Hallucinations](llm-evals/running-pre-tested-evals/hallucinations.md)
@@ -37,7 +33,14 @@
* [Building Your Own Evals](llm-evals/building-your-own-evals.md)
* [Quickstart Retrieval Evals](llm-evals/quickstart-retrieval-evals/README.md)
* [Retrieval Evals on Document Chunks](llm-evals/quickstart-retrieval-evals/retrieval-evals-on-document-chunks.md)
* [Benchmarking Retrieval (RAG)](llm-evals/benchmarking-retrieval-rag.md)
* [Benchmarking Retrieval](llm-evals/benchmarking-retrieval-rag.md)

## 🌌 inferences

* [Quickstart: Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)

## 🔮 Use Cases

@@ -57,13 +60,7 @@
* [Extract Data from Spans](how-to/extract-data-from-spans.md)
* [Use Example Datasets](how-to/use-example-datasets.md)

## 🔭 telemetry

* [Deploying Phoenix](telemetry/deploying-phoenix.md)
* [Instrumentation](telemetry/instrumentation.md)
* [Custom Spans](telemetry/custom-spans.md)

## ⌨ API
## ⌨️ API

* [Dataset and Schema](api/dataset-and-schema.md)
* [Session](api/session.md)
@@ -78,16 +75,15 @@
* [OpenAI](integrations/openai.md)
* [Bedrock](integrations/bedrock.md)
* [AutoGen](integrations/autogen-support.md)
* [DSPy](integrations/dspy.md)
* [Arize](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md)

## 🏴 Programming Languages
## 🏴‍☠️ Programming Languages

* [JavaScript](programming-languages/javascript.md)

## 📚 Reference

* [Embeddings](concepts/embeddings.md)
* [OpenInference](concepts/open-inference.md)
* [Frequently Asked Questions](reference/frequently-asked-questions.md)
* [Contribute to Phoenix](reference/contribute-to-phoenix.md)

11 changes: 10 additions & 1 deletion docs/api/client.md
@@ -63,7 +63,16 @@ A client for making HTTP requests to the Phoenix server for extracting/downloading

* **get\_trace\_dataset** -> Optional\[TraceDataset]\
\
Returns the trace dataset containing spans and evaluations.
Returns the trace dataset containing spans and evaluations.\

* **log\_evaluations** -> None\
\
Send evaluations to Phoenix. See [#logging-multiple-evaluation-dataframes](../how-to/define-your-schema/llm-evaluations.md#logging-multiple-evaluation-dataframes "mention") for usage.\


**Parameters**

* **\*evaluations** (Evaluations): One or more Evaluations datasets. See [llm-evaluations.md](../how-to/define-your-schema/llm-evaluations.md "mention") for more details.

### Usage

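For illustration, a hedged sketch of the `log_evaluations` usage described above. It assumes a running Phoenix instance and an evaluations dataframe indexed by `span_id`; the `SpanEvaluations` import path may vary across Phoenix versions:

```python
import phoenix as px
from phoenix.trace import SpanEvaluations  # import path may differ by version

# hallucination_eval_df: eval labels/scores indexed by span_id (assumed to exist)
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
)
```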
2 changes: 1 addition & 1 deletion docs/api/dataset-and-schema.md
@@ -109,7 +109,7 @@ class EmbeddingColumnNames(
)
```

A dataclass that associates one or more columns of a dataframe with an [embedding](../concepts/embeddings.md) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).
A dataclass that associates one or more columns of a dataframe with an [embedding](broken-reference) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).

**\[**[**source**](https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/datasets/schema.py)**]**

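For context, a short sketch of how `EmbeddingColumnNames` instances are passed to a `Schema`; the column names here are illustrative only:

```python
import phoenix as px

schema = px.Schema(
    embedding_feature_column_names={
        # the dictionary key is the display name of the embedding feature
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding",   # column of embedding vectors
            raw_data_column_name="text",      # column of the raw text
        ),
    },
)
```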
2 changes: 1 addition & 1 deletion docs/api/evals.md
@@ -37,7 +37,7 @@ Evaluates a pandas dataframe using a set of user-specified evaluators that assess
* **provide\_explanation** (bool, optional): If true, each output dataframe will contain an explanation column containing the LLM's reasoning for each evaluation.
* **use\_function\_calling\_if\_available** (bool, optional): If true, function calling is used (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
* **verbose** (bool, optional): If true, prints detailed information such as model invocation parameters, retries on failed requests, etc.
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If&#x20;
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If

### Returns

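For context, a hedged sketch of the evaluator API these parameters belong to; `queries_df` is assumed to exist, and import paths may differ in older releases (`phoenix.experimental.evals`):

```python
from phoenix.evals import HallucinationEvaluator, OpenAIModel, run_evals

eval_model = OpenAIModel(model_name="gpt-4")  # param is `model` in newer releases
[hallucination_df] = run_evals(
    dataframe=queries_df,                          # rows to evaluate (assumed)
    evaluators=[HallucinationEvaluator(eval_model)],
    provide_explanation=True,                      # adds an explanation column
    concurrency=10,                                # concurrent async workers
)
```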
10 changes: 10 additions & 0 deletions docs/api/evaluation-models.md
@@ -46,6 +46,8 @@ class OpenAIModel:
"""How many completions to generate for each prompt."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to OpenAI completion API. Default is 600 seconds."""
max_retries: int = 20
@@ -78,6 +80,14 @@ model = OpenAIModel(
)
```

{% hint style="info" %}
Note that the `model_name` param is actually the `engine` of your deployment. You may get a `DeploymentNotFound` error if this parameter is not correct. You can find your engine param in the Azure OpenAI playground.\
\

{% endhint %}

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/azure_openai_engine.png" alt=""><figcaption><p>How to find the model param in Azure</p></figcaption></figure>

Azure OpenAI supports specific options:

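The options block itself is truncated in this view. Here is a hedged sketch of an Azure configuration consistent with the hint above; the endpoint and API version are placeholders, and parameter names may vary by release:

```python
from phoenix.evals import OpenAIModel  # phoenix.experimental.evals in older releases

model = OpenAIModel(
    model_name="my-gpt-4-deployment",  # your Azure *deployment* (engine) name
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_version="2023-09-15-preview",  # placeholder API version
)
```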
2 changes: 1 addition & 1 deletion docs/concepts/embeddings-analysis.md
Expand Up @@ -38,7 +38,7 @@ When two datasets are used to initialize phoenix, the clusters are automatically

### UMAP Point-Cloud

Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](embeddings.md) in a visually understandable way.\
Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](broken-reference) in a visually understandable way.\
\
In addition to the point-cloud, another dimension we have at our disposal is color (and in some cases shape). Out of the box Phoenix lets you assign colors to the UMAP point-cloud by dimension (features, tags, predictions, actuals), performance (correctness which distinguishes true positives and true negatives from the incorrect predictions), and dataset (to highlight areas of drift). This helps you explore your point-cloud from different perspectives depending on what you are looking for.

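To make the projection step concrete, a hedged illustration using the umap-learn package directly; Phoenix performs an equivalent reduction internally, and the data here is synthetic:

```python
import numpy as np
import umap  # pip install umap-learn

embeddings = np.random.rand(500, 768)         # stand-in for real embedding vectors
reducer = umap.UMAP(n_components=3)           # reduce to 3 dimensions for the point-cloud
points_3d = reducer.fit_transform(embeddings)
print(points_3d.shape)                        # (500, 3)
```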
