docs: sync Feb 21, 2024 #2343

Merged

merged 36 commits on Mar 8, 2024

36 commits
f71b135
docs: custom spans
mikeldking Feb 7, 2024
030628e
Merge pull request #2227 from Arize-ai/custom-spans
mikeldking Feb 7, 2024
b07254e
docs: fix the semantic conventions link (GITBOOK-496)
mikeldking Feb 7, 2024
abb8e9e
docs: fix links to semantic conventions (GITBOOK-497)
mikeldking Feb 7, 2024
4e09fd1
docs: No subject (GITBOOK-498)
mikeldking Feb 7, 2024
0eaade5
docs: No subject (GITBOOK-499)
tammy37 Feb 7, 2024
1c079e4
docs: revert changes from #499 (GITBOOK-500)
tammy37 Feb 7, 2024
5e61e42
docs: update model args (GITBOOK-502)
mikeldking Feb 8, 2024
95fce76
docs: replace session with px.Client() (GITBOOK-504)
RogerHYang Feb 8, 2024
d5fad73
docs: migrate to otel instrumentors (GITBOOK-503)
RogerHYang Feb 8, 2024
f21544d
docs: Q&A addition (GITBOOK-373)
RogerHYang Feb 8, 2024
b3e2f38
docs: Azure guidance (GITBOOK-505)
mikeldking Feb 13, 2024
97a58a7
docs: Fixed df name (GITBOOK-507)
camyoung93 Feb 14, 2024
32b77ec
docs: llama-index 0.10 guidance (GITBOOK-508)
mikeldking Feb 15, 2024
69b51c9
docs: Moved up Telemtry (GITBOOK-509)
Feb 15, 2024
863ac94
docs: add dspy notebook link
axiomofjoy Feb 15, 2024
78d461d
docs: add dspy integration (GITBOOK-510)
axiomofjoy Feb 15, 2024
5d38907
docs: dspy tweaks (GITBOOK-511)
axiomofjoy Feb 15, 2024
91cd6e5
docs: Fix typo on "string" (GITBOOK-512)
camyoung93 Feb 15, 2024
111b153
docs: update markdownss for px.Client().log_evaluations() (#2313)
RogerHYang Feb 16, 2024
dbe0b6e
docs: add log_evaluations to Client (GITBOOK-513)
RogerHYang Feb 16, 2024
1d8ab0d
docs: No subject (GITBOOK-515)
mikeldking Feb 16, 2024
229457b
docs: fix .rst links to be .md (#2338)
mikeldking Feb 20, 2024
a88c717
Merge pull request #2344 from Arize-ai/main
mikeldking Feb 21, 2024
3232688
docs: Capitalizing Phoenix, removed duplicate sentence (GITBOOK-516)
camyoung93 Feb 21, 2024
26b9e9f
docs: Change Nav (GITBOOK-514)
Feb 23, 2024
63a76a3
docs: Overview revamp (GITBOOK-518)
mikeldking Feb 23, 2024
cbd2a5c
docs: remove use-cases (GITBOOK-519)
mikeldking Feb 23, 2024
7514211
docs: Simplify environments (GITBOOK-520)
mikeldking Feb 24, 2024
16eca63
docs: fix links (GITBOOK-522)
mikeldking Feb 24, 2024
37cc34e
docs: Add image (GITBOOK-523)
mikeldking Feb 24, 2024
c93e047
docs: Fixed typo "grammatical" (GITBOOK-524)
camyoung93 Feb 26, 2024
16c0de5
docs: delete obsolete exporter code (GITBOOK-525)
RogerHYang Feb 28, 2024
1064d21
Remove references to deprecated `processing` module in docs (#2422)
anticorrelator Mar 3, 2024
6759b67
docs: update install to include evals (GITBOOK-527)
mikeldking Mar 5, 2024
05ac8be
docs: fixed url typo (GITBOOK-528)
camyoung93 Mar 7, 2024
1 change: 1 addition & 0 deletions cspell.json
@@ -24,6 +24,7 @@
  "numpy",
  "openai",
  "openinference",
+ "OTLP",
  "postprocessors",
  "pydantic",
  "quickstart",
Binary file added docs/.gitbook/assets/evals.png
52 changes: 25 additions & 27 deletions docs/README.md
@@ -1,50 +1,48 @@
---
description: Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models.
cover: >-
https://images.unsplash.com/photo-1610296669228-602fa827fc1f?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHw1fHxzcGFjZXxlbnwwfHx8fDE2NzkwOTMzODc&ixlib=rb-4.0.3&q=80
coverY: 0
---

# Phoenix: AI Observability & Evaluation
# Arize Phoenix

Phoenix is an open-source observability library and platform designed for experimentation, evaluation, and troubleshooting.

The toolset is designed to ingest [inference data](quickstart/phoenix-inferences/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets, as well as [LLM traces](quickstart/llm-traces.md). It allows AI engineers and data scientists to quickly visualize their data, evaluate performance, track down issues, surface insights, and easily export data to drive improvements.

{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}
## Install Phoenix

## Quickstarts
In your Jupyter or Colab environment, run the following command to install.

Running Phoenix for the first time? Select a quickstart below. 

<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>
{% tabs %}
{% tab title="Using pip" %}
```sh
pip install arize-phoenix
```
{% endtab %}

Don't know which one to choose? Phoenix has two main data ingestion methods:
{% tab title="Using conda" %}
```sh
conda install -c conda-forge arize-phoenix
```
{% endtab %}
{% endtabs %}
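As a quick sanity check after installing, a minimal sketch assuming a notebook environment (the printed URL is illustrative):

```python
import phoenix as px

# Launch the Phoenix app in the background of the notebook; the returned
# session exposes the local URL where the UI is served.
session = px.launch_app()
print(session.url)  # e.g. http://localhost:6006/
```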

1. [LLM Traces:](quickstart/llm-traces.md) Phoenix is used on top of trace data generated by LlamaIndex and LangChain. The general use case is to troubleshoot LLM applications with agentic workflows.
2. [Inferences](quickstart/phoenix-inferences/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python such as LLM applications built in Python workflows, CV, NLP, and tabular models.
## Quickstarts

### **Phoenix Functionality**
Running Phoenix for the first time? Select a quickstart below.

* [**Evaluate Performance of LLM Tasks with Evals Library:**](llm-evals/llm-evals.md) Use the Phoenix Evals library to easily evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template.
* [**Troubleshoot Agentic Workflows:**](concepts/llm-traces.md) Get visibility into where your complex or agentic workflow broke, or find performance bottlenecks, across different span types with LLM Tracing.
* [**Optimize Retrieval Systems:**](use-cases/troubleshooting-llm-retrieval-with-vector-stores.md) Identify missing context in your knowledge base, and when irrelevant context is retrieved by visualizing query embeddings alongside knowledge base embeddings with RAG Analysis.
* [**Compare Model Versions:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#how-many-datasets-do-i-need) Compare and evaluate performance across model versions prior to deploying to production.
* [**Exploratory Data Analysis:**](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md) Connect teams and workflows with continued analysis of production data from Arize in a notebook environment for fine-tuning workflows.
* [**Find Clusters of Issues to Export for Model Improvement:**](how-to/export-your-data.md) Find clusters of problems using performance metrics or drift. Export clusters for retraining workflows.
* [**Surface Model Drift and Multivariate Drift:**](https://docs.arize.com/phoenix/concepts/phoenix-basics/phoenix-basics#embedding-drift-over-time) Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models.
<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>Tracing</strong> </td><td><a href="quickstart/llm-traces.md">llm-traces.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Evaluation</strong></td><td><a href="quickstart/evals.md">evals.md</a></td><td><a href=".gitbook/assets/evals.png">evals.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/">phoenix-inferences</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>

## Resources
### Demo

### [Tutorials](notebooks.md)
{% embed url="https://www.loom.com/share/a96e244c4ff8473d9350b02ccbd203b4" %}
Overview of Phoenix Tracing
{% endembed %}

Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.
## Next Steps

### [Use Cases](broken-reference)
### [Try our Tutorials](notebooks.md)

Learn about best practices, and how to get started with use case examples such as Q\&A with Retrieval, Summarization, and Chatbots.
Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.

### [Community](https://join.slack.com/t/arize-ai/shared\_invite/zt-1ppbtg5dd-1CYmQO4dWF4zvXFiONTjMg)

54 changes: 25 additions & 29 deletions docs/SUMMARY.md
@@ -1,28 +1,24 @@
# Table of contents

* [Phoenix: AI Observability & Evaluation](README.md)
* [Examples](notebooks.md)
* [Installation](install-and-import-phoenix.md)
* [Arize Phoenix](README.md)
* [User Guide](concepts/llm-observability.md)
* [Environments](environments.md)
* [Examples](notebooks.md)

## 🔑 Quickstart

* [Phoenix Traces](quickstart/llm-traces.md)
* [Phoenix Evals](quickstart/evals.md)
* [Phoenix Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)

## 💡 Concepts
## 🔭 Tracing

* [LLM Observability](concepts/llm-observability.md)
* [Traces and Spans](concepts/llm-traces.md)
* [Evaluation](concepts/evaluation.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)
* [Overview: Traces](concepts/llm-traces.md)
* [Quickstart: Traces](quickstart/llm-traces.md)
* [Instrumentation](telemetry/instrumentation.md)
* [OpenInference](concepts/open-inference.md)
* [Deployment](telemetry/deploying-phoenix.md)
* [Custom Spans](telemetry/custom-spans.md)

## 🧠 LLM Evals
## 🧠 Evaluation

* [Phoenix LLM Evals](llm-evals/llm-evals.md)
* [Overview: Evals](llm-evals/llm-evals.md)
* [Concept: Evaluation](concepts/evaluation.md)
* [Quickstart: Evals](quickstart/evals.md)
* [Running Pre-Tested Evals](llm-evals/running-pre-tested-evals/README.md)
* [Retrieval (RAG) Relevance](llm-evals/running-pre-tested-evals/retrieval-rag-relevance.md)
* [Hallucinations](llm-evals/running-pre-tested-evals/hallucinations.md)
@@ -37,7 +33,14 @@
* [Building Your Own Evals](llm-evals/building-your-own-evals.md)
* [Quickstart Retrieval Evals](llm-evals/quickstart-retrieval-evals/README.md)
* [Retrieval Evals on Document Chunks](llm-evals/quickstart-retrieval-evals/retrieval-evals-on-document-chunks.md)
* [Benchmarking Retrieval (RAG)](llm-evals/benchmarking-retrieval-rag.md)
* [Benchmarking Retrieval](llm-evals/benchmarking-retrieval-rag.md)

## 🌌 inferences

* [Quickstart: Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)
* [Generating Embeddings](concepts/generating-embeddings.md)
* [Embeddings Analysis](concepts/embeddings-analysis.md)

## 🔮 Use Cases

@@ -57,13 +60,7 @@
* [Extract Data from Spans](how-to/extract-data-from-spans.md)
* [Use Example Datasets](how-to/use-example-datasets.md)

## 🔭 telemetry

* [Deploying Phoenix](telemetry/deploying-phoenix.md)
* [Instrumentation](telemetry/instrumentation.md)
* [Custom Spans](telemetry/custom-spans.md)

## ⌨ API
## ⌨️ API

* [Dataset and Schema](api/dataset-and-schema.md)
* [Session](api/session.md)
@@ -78,16 +75,15 @@
* [OpenAI](integrations/openai.md)
* [Bedrock](integrations/bedrock.md)
* [AutoGen](integrations/autogen-support.md)
* [DSPy](integrations/dspy.md)
* [Arize](integrations/bring-production-data-to-notebook-for-eda-or-retraining.md)

## 🏴 Programming Languages
## 🏴‍☠️ Programming Languages

* [JavaScript](programming-languages/javascript.md)

## 📚 Reference

* [Embeddings](concepts/embeddings.md)
* [OpenInference](concepts/open-inference.md)
* [Frequently Asked Questions](reference/frequently-asked-questions.md)
* [Contribute to Phoenix](reference/contribute-to-phoenix.md)

11 changes: 10 additions & 1 deletion docs/api/client.md
@@ -63,7 +63,16 @@ A client for making HTTP requests to the Phoenix server for extracting/downloadi

* **get\_trace\_dataset** -> Optional\[TraceDataset]\
\
Returns the trace dataset containing spans and evaluations.
Returns the trace dataset containing spans and evaluations.\

* **log\_evaluations** -> None\
\
Send evaluations to Phoenix. See [#logging-multiple-evaluation-dataframes](../how-to/define-your-schema/llm-evaluations.md#logging-multiple-evaluation-dataframes "mention") for usage.\


**Parameters**

* **\*evaluations** (Evaluations): One or more Evaluations datasets. See [llm-evaluations.md](../how-to/define-your-schema/llm-evaluations.md "mention") for more details.
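A minimal sketch of the new method, assuming a running Phoenix instance and evaluations dataframes indexed by `span_id` (variable names are illustrative):

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# hallucination_eval_df and qa_correctness_eval_df are assumed to be
# dataframes indexed by span_id with "label"/"score" columns.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_correctness_eval_df),
)
```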

### Usage

2 changes: 1 addition & 1 deletion docs/api/dataset-and-schema.md
@@ -109,7 +109,7 @@ class EmbeddingColumnNames(
)
```

A dataclass that associates one or more columns of a dataframe with an [embedding](../concepts/embeddings.md) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).
A dataclass that associates one or more columns of a dataframe with an [embedding](broken-reference) feature. Instances of this class are only used as values in a dictionary passed to the `embedding_feature_column_names` field of [Schema](dataset-and-schema.md#phoenix.schema).

**\[**[**source**](https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/datasets/schema.py)**]**
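For context, a minimal sketch of wiring this dataclass into a `Schema` (column names are illustrative):

```python
import phoenix as px

# Map the raw text column and its vector column to a single
# embedding feature named "text_embedding".
schema = px.Schema(
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="text_vector",
            raw_data_column_name="text",
        ),
    },
)
```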

2 changes: 1 addition & 1 deletion docs/api/evals.md
@@ -37,7 +37,7 @@ Evaluates a pandas dataframe using a set of user-specified evaluators that asses
* **provide\_explanation** (bool, optional): If true, each output dataframe will contain an explanation column containing the LLM's reasoning for each evaluation.
* **use\_function\_calling\_if\_available** (bool, optional): If true, function calling is used (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
* **verbose** (bool, optional): If true, prints detailed information such as model invocation parameters, retries on failed requests, etc.
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If&#x20;
* **concurrency** (int, optional): The number of concurrent workers if async submission is possible. If
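To make the parameters concrete, a sketch of a typical call, assuming the experimental evals module of this release (dataframe contents are illustrative):

```python
from phoenix.experimental.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    run_evals,
)

# queries_df is assumed to hold the input/output/reference columns the
# evaluator's template expects; run_evals returns one dataframe per evaluator.
[hallucination_df] = run_evals(
    dataframe=queries_df,
    evaluators=[HallucinationEvaluator(OpenAIModel(model_name="gpt-4"))],
    provide_explanation=True,
    concurrency=20,
)
```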

### Returns

10 changes: 10 additions & 0 deletions docs/api/evaluation-models.md
@@ -46,6 +46,8 @@ class OpenAIModel:
"""How many completions to generate for each prompt."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to OpenAI completion API. Default is 600 seconds."""
max_retries: int = 20
@@ -78,6 +80,14 @@ model = OpenAIModel(
)
```

{% hint style="info" %}
Note that the `model_name` param is actually the `engine` of your deployment. You may get a `DeploymentNotFound` error if this parameter is not correct. You can find your engine param in the Azure OpenAI playground.
{% endhint %}

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/azure_openai_engine.png" alt=""><figcaption><p>How to find the model param in Azure</p></figcaption></figure>
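A sketch of the Azure configuration this hint describes (endpoint, API version, and deployment name are placeholders):

```python
from phoenix.experimental.evals import OpenAIModel

model = OpenAIModel(
    # model_name must be the Azure *deployment* (engine) name,
    # not the underlying model family name.
    model_name="my-gpt-4-deployment",
    azure_endpoint="https://example-resource.openai.azure.com/",
    api_version="2023-07-01-preview",
)
```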

Azure OpenAI supports specific options:

```python
2 changes: 1 addition & 1 deletion docs/concepts/embeddings-analysis.md
@@ -38,7 +38,7 @@ When two datasets are used to initialize phoenix, the clusters are automatically

### UMAP Point-Cloud

Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](embeddings.md) in a visually understandable way.\
Phoenix projects the embeddings you provided into lower dimensional space (3 dimensions) using a dimension reduction algorithm called [UMAP](https://github.com/lmcinnes/umap) (stands for Uniform Manifold Approximation and Projection). This lets us understand how your [embeddings have encoded semantic meaning](broken-reference) in a visually understandable way.\
\
In addition to the point-cloud, another dimension we have at our disposal is color (and in some cases shape). Out of the box, Phoenix lets you assign colors to the UMAP point-cloud by dimension (features, tags, predictions, actuals), performance (correctness, which distinguishes true positives and true negatives from the incorrect predictions), and dataset (to highlight areas of drift). This helps you explore your point-cloud from different perspectives depending on what you are looking for.
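For context, a minimal sketch of launching Phoenix with two datasets so the dataset coloring described above is available (the dataframes and `schema` are assumed to be defined as in the quickstarts):

```python
import phoenix as px

# prod_df and train_df are assumed to share the column layout that
# `schema` describes.
prod_ds = px.Dataset(dataframe=prod_df, schema=schema, name="production")
train_ds = px.Dataset(dataframe=train_df, schema=schema, name="training")

# With a primary and a reference dataset, the UMAP point-cloud can be
# colored by dataset to highlight areas of drift.
px.launch_app(primary=prod_ds, reference=train_ds)
```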
