
Merge pull request #1779 from Arize-ai/docs
docs: sync to main
mikeldking authored Nov 17, 2023
2 parents 13b9ab0 + de3817d commit f6e27a1
Showing 35 changed files with 495 additions and 79 deletions.
Binary file added docs/.gitbook/assets/GPT-3.5 Ref Link (1).png
Binary file added docs/.gitbook/assets/GPT-3.5 Ref Link.png
Binary file added docs/.gitbook/assets/GPT-4 Ref Evals (1).png
Binary file added docs/.gitbook/assets/GPT-4 Ref Evals (2).png
Binary file added docs/.gitbook/assets/GPT-4 Ref Evals (3).png
Binary file added docs/.gitbook/assets/GPT-4 Ref Evals.png
Binary file added docs/.gitbook/assets/GPT-4 Turbo Ref link.png
Binary file added docs/.gitbook/assets/GPT-4 Turbo.png
Binary file added docs/.gitbook/assets/GPT-4-Turbo_halluc.png
Binary file added docs/.gitbook/assets/chunks_concat (1).png
Binary file added docs/.gitbook/assets/chunks_concat.png
Binary file added docs/.gitbook/assets/gpt-4-turbo-code.png
Binary file added docs/.gitbook/assets/gpt-4-turbo-toxicity.png
6 changes: 3 additions & 3 deletions docs/README.md
@@ -11,18 +11,18 @@ coverY: 0

Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting.

The toolset is designed to ingest [inference data](quickstart/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets, as well as [LLM traces](quickstart/llm-traces/). It allows AI Engineers and Data Scientists to quickly visualize their data, evaluate performance, track down issues, surface insights, and easily export data to improve.
The toolset is designed to ingest [inference data](quickstart/phoenix-inferences/inferences.md) for [LLMs](concepts/llm-observability.md), CV, NLP, and tabular datasets, as well as [LLM traces](quickstart/llm-traces/). It allows AI Engineers and Data Scientists to quickly visualize their data, evaluate performance, track down issues, surface insights, and easily export data to improve.

## Quickstarts

Running Phoenix for the first time? Select a quickstart below. 

<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces/">llm-traces</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>
<table data-card-size="large" data-view="cards"><thead><tr><th align="center"></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td align="center"><strong>LLM Traces</strong></td><td><a href="quickstart/llm-traces/">llm-traces</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.51.45 PM.png">Screenshot 2023-09-27 at 1.51.45 PM.png</a></td></tr><tr><td align="center"><strong>Inferences</strong></td><td><a href="quickstart/phoenix-inferences/inferences.md">inferences.md</a></td><td><a href=".gitbook/assets/Screenshot 2023-09-27 at 1.53.06 PM.png">Screenshot 2023-09-27 at 1.53.06 PM.png</a></td></tr></tbody></table>

Don't know which one to choose? Phoenix has two main data ingestion methods:

1. [LLM Traces:](quickstart/llm-traces/) Phoenix is used on top of trace data generated by LlamaIndex and LangChain. The general use case is to troubleshoot LLM applications with agentic workflows.&#x20;
2. [Inferences](quickstart/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python, such as LLM applications built in Python workflows, and CV, NLP, and tabular models.
2. [Inferences](quickstart/phoenix-inferences/inferences.md): Phoenix is used to troubleshoot models whose datasets can be expressed as DataFrames in Python, such as LLM applications built in Python workflows, and CV, NLP, and tabular models.

### **Phoenix Functionality**&#x20;

6 changes: 4 additions & 2 deletions docs/SUMMARY.md
@@ -5,9 +5,10 @@

## 🔑 Quickstart

* [LLM Traces - OpenAI, LangChain & LlamaIndex](quickstart/llm-traces/README.md)
* [Phoenix Traces for LLM applications - OpenAI, LangChain & LlamaIndex](quickstart/llm-traces/README.md)
* [AutoGen Support](quickstart/llm-traces/autogen-support.md)
* [Inferences - Image/NLP/LLM](quickstart/inferences.md)
* [Phoenix Inferences](quickstart/phoenix-inferences/README.md)
* [Schemas and Datasets](quickstart/phoenix-inferences/inferences.md)

## 💡 Concepts

@@ -26,6 +27,7 @@
* [Toxicity](llm-evals/running-pre-tested-evals/toxicity.md)
* [Code Generation Eval](llm-evals/running-pre-tested-evals/code-generation-eval.md)
* [Summarization Eval](llm-evals/running-pre-tested-evals/summarization-eval.md)
* [Reference Link Evals](llm-evals/running-pre-tested-evals/reference-link-evals.md)
* [Building Your Own Evals](llm-evals/building-your-own-evals.md)
* [Benchmarking Retrieval (RAG)](llm-evals/benchmarking-retrieval-rag.md)

76 changes: 38 additions & 38 deletions docs/api/evals.md
@@ -1,7 +1,7 @@
---
description: >-
Evals are LLM-powered functions that you can use to evaluate the output of
your LLM or generative application
Evals are LLM-powered functions that you can use to evaluate the output of
your LLM or generative application
---

# Evals
@@ -23,13 +23,13 @@

Class used to store and format prompt templates.

### Parameters

- **text** (str): The raw prompt text used as a template.
- **delimiters** (List\[str]): List of characters used to locate the variables within the prompt template `text`. Defaults to `["{", "}"]`.
* **text** (str): The raw prompt text used as a template.
* **delimiters** (List\[str]): List of characters used to locate the variables within the prompt template `text`. Defaults to `["{", "}"]`.

### Attributes

- **text** (str): The raw prompt text used as a template.
- **variables** (List\[str]): The names of the variables that, once their values are substituted into the template, create the prompt text. These variable names are automatically detected from the template `text` using the `delimiters` passed when initializing the class (see Usage section below).
* **text** (str): The raw prompt text used as a template.
* **variables** (List\[str]): The names of the variables that, once their values are substituted into the template, create the prompt text. These variable names are automatically detected from the template `text` using the `delimiters` passed when initializing the class (see Usage section below).

### Usage

@@ -72,7 +72,7 @@ print(prompt_template.format(value_dict))

Note that once you initialize the `PromptTemplate` class, you don't need to worry about delimiters anymore; substitution is handled for you.
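
For instance, a minimal sketch using custom delimiters (the template text here is made up for illustration):

```python
from phoenix.experimental.evals import PromptTemplate

# Variables are detected between the custom delimiters when the class is initialized.
prompt_template = PromptTemplate(
    text="Classify the sentiment of this text: <input>",
    delimiters=["<", ">"],
)
print(prompt_template.variables)  # ['input']

# format() substitutes values; no further delimiter handling is needed.
print(prompt_template.format({"input": "Phoenix makes LLM debugging easier."}))
```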

## phoenix.experimental.evals.llm_classify
## phoenix.experimental.evals.llm\_classify

```python
def llm_classify(...)
```

@@ -91,20 +91,20 @@

Classifies each input row of the `dataframe` using an LLM. Returns a `pandas.DataFrame` where the first column, named `label`, contains the classification labels.

### Parameters

- **dataframe (pandas.DataFrame)**: A pandas dataframe in which each row represents a record to be classified. All template variable names must appear as column names in the dataframe (extra columns unrelated to the template are permitted).
- **template (ClassificationTemplate, or str):** The prompt template as either an instance of PromptTemplate or a string. If the latter, the variable names should be surrounded by curly braces so that a call to `.format` can be made to substitute variable values.
- **model (BaseEvalModel):** An LLM model class instance
- **rails (List\[str]):** A list of strings representing the possible output classes of the model's predictions.
- **system_instruction (Optional\[str]):** An optional system message for models that support it
- **verbose (bool, optional):** If `True`, prints detailed info to stdout such as model invocation parameters and details about retries and snapping to rails. Default `False`.
- **use_function_calling_if_available (bool, default=True):** If `True`, use function calling (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
- **provide_explanation (bool, default=False):** If `True`, provides an explanation for each classification label. A column named `explanation` is added to the output dataframe. Note that this will default to using function calling if available. If the model supplied does not support function calling, `llm_classify` will need a prompt template that prompts for an explanation. For phoenix's pre-tested eval templates, the template is swapped out for a [chain-of-thought](https://www.promptingguide.ai/techniques/cot) based template that prompts for an explanation.
* **dataframe (pandas.DataFrame)**: A pandas dataframe in which each row represents a record to be classified. All template variable names must appear as column names in the dataframe (extra columns unrelated to the template are permitted).
* **template (ClassificationTemplate, or str):** The prompt template as either an instance of PromptTemplate or a string. If the latter, the variable names should be surrounded by curly braces so that a call to `.format` can be made to substitute variable values.
* **model (BaseEvalModel):** An LLM model class instance
* **rails (List\[str]):** A list of strings representing the possible output classes of the model's predictions.
* **system\_instruction (Optional\[str]):** An optional system message for models that support it
* **verbose (bool, optional):** If `True`, prints detailed info to stdout such as model invocation parameters and details about retries and snapping to rails. Default `False`.
* **use\_function\_calling\_if\_available (bool, default=True):** If `True`, use function calling (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.
* **provide\_explanation (bool, default=False):** If `True`, provides an explanation for each classification label. A column named `explanation` is added to the output dataframe. Note that this will default to using function calling if available. If the model supplied does not support function calling, `llm_classify` will need a prompt template that prompts for an explanation. For phoenix's pre-tested eval templates, the template is swapped out for a [chain-of-thought](https://www.promptingguide.ai/techniques/cot) based template that prompts for an explanation.

### Returns

- **pandas.DataFrame:** A dataframe where the `label` column (at column position 0) contains the classification labels. If `provide_explanation=True`, then an additional column named `explanation` is added to contain the explanation for each label. The dataframe has the same length and index as the input dataframe. The classification label values are from the entries in the rails argument or "NOT_PARSABLE" if the model's output could not be parsed.
* **pandas.DataFrame:** A dataframe where the `label` column (at column position 0) contains the classification labels. If `provide_explanation=True`, then an additional column named `explanation` is added to contain the explanation for each label. The dataframe has the same length and index as the input dataframe. The classification label values are from the entries in the rails argument or "NOT\_PARSABLE" if the model's output could not be parsed.
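
For illustration, a minimal sketch of an `llm_classify` call (the template, dataframe, and rails here are made up; `OpenAIModel` is assumed to be available from `phoenix.experimental.evals`):

```python
import pandas as pd

from phoenix.experimental.evals import OpenAIModel, llm_classify

# Hypothetical template: every {variable} must match a dataframe column name.
template = (
    "You are comparing a response to a question.\n"
    "Question: {query}\n"
    "Response: {response}\n"
    "Answer with a single word: 'relevant' or 'irrelevant'."
)

df = pd.DataFrame(
    {
        "query": ["What is Phoenix?"],
        "response": ["Phoenix is an open-source observability library."],
    }
)

model = OpenAIModel(model_name="gpt-4")  # assumes OPENAI_API_KEY is set

# The result has a `label` column; provide_explanation=True adds an `explanation` column.
evals_df = llm_classify(
    dataframe=df,
    template=template,
    model=model,
    rails=["relevant", "irrelevant"],
    provide_explanation=True,
)
```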

## phoenix.experimental.run_relevance_eval
## phoenix.experimental.run\_relevance\_eval

```python
def run_relevance_eval(...)
```

@@ -122,25 +122,25 @@

Given a pandas dataframe containing queries and retrieved documents, classifies whether each retrieved document is relevant to its corresponding query.

### Parameters

- **dataframe (pd.DataFrame):** A pandas dataframe containing queries and retrieved documents. If both query_column_name and reference_column_name are present in the input dataframe, those columns are used as inputs and should appear in the following format:
- The entries of the query column must be strings.
- The entries of the documents column must be lists of strings. Each list may contain an arbitrary number of document texts retrieved for the corresponding query.
- If the input dataframe is lacking either query_column_name or reference_column_name but has query and retrieved document columns in OpenInference trace format named "attributes.input.value" and "attributes.retrieval.documents", respectively, then those columns are used as inputs and should appear in the following format:
- The entries of the query column must be strings.
- The entries of the document column must be lists of OpenInference document objects, each object being a dictionary that stores the document text under the key "document.content".
- **model (BaseEvalModel):** The model used for evaluation.
- **template (Union\[PromptTemplate, str], optional):** The template used for evaluation.
- **rails (List\[str], optional):** A list of strings representing the possible output classes of the model's predictions.
- **query_column_name (str, optional):** The name of the query column in the dataframe, which should also be a template variable.
- **reference_column_name (str, optional):** The name of the document column in the dataframe, which should also be a template variable.
- **system_instruction (Optional\[str], optional):** An optional system message.
* **dataframe (pd.DataFrame):** A pandas dataframe containing queries and retrieved documents. If both query\_column\_name and reference\_column\_name are present in the input dataframe, those columns are used as inputs and should appear in the following format:
* The entries of the query column must be strings.
* The entries of the documents column must be lists of strings. Each list may contain an arbitrary number of document texts retrieved for the corresponding query.
* If the input dataframe is lacking either query\_column\_name or reference\_column\_name but has query and retrieved document columns in OpenInference trace format named "attributes.input.value" and "attributes.retrieval.documents", respectively, then those columns are used as inputs and should appear in the following format:
* The entries of the query column must be strings.
* The entries of the document column must be lists of OpenInference document objects, each object being a dictionary that stores the document text under the key "document.content".
* **model (BaseEvalModel):** The model used for evaluation.
* **template (Union\[PromptTemplate, str], optional):** The template used for evaluation.
* **rails (List\[str], optional):** A list of strings representing the possible output classes of the model's predictions.
* **query\_column\_name (str, optional):** The name of the query column in the dataframe, which should also be a template variable.
* **reference\_column\_name (str, optional):** The name of the document column in the dataframe, which should also be a template variable.
* **system\_instruction (Optional\[str], optional):** An optional system message.

### Returns

- **evaluations (List\[List\[str]]):** A list of relevant and not relevant classifications. The "shape" of the list should mirror the "shape" of the retrieved documents column, in the sense that it has the same length as the input dataframe and each sub-list has the same length as the corresponding list in the retrieved documents column. The values in the sub-lists are either entries from the rails argument or "NOT_PARSABLE" in the case where the LLM output could not be parsed.
* **evaluations (List\[List\[str]]):** A list of relevant and not relevant classifications. The "shape" of the list should mirror the "shape" of the retrieved documents column, in the sense that it has the same length as the input dataframe and each sub-list has the same length as the corresponding list in the retrieved documents column. The values in the sub-lists are either entries from the rails argument or "NOT\_PARSABLE" in the case where the LLM output could not be parsed.
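
As a rough sketch (column names and data are hypothetical; `OpenAIModel` is assumed as above, and the default template and rails are used):

```python
import pandas as pd

from phoenix.experimental.evals import OpenAIModel, run_relevance_eval

# Each row pairs one query with the list of documents retrieved for it.
df = pd.DataFrame(
    {
        "query": ["What is Phoenix?"],
        "reference": [
            [
                "Phoenix is an open-source observability library.",
                "The capital of France is Paris.",
            ]
        ],
    }
)

model = OpenAIModel(model_name="gpt-4")

# Returns one sub-list of labels per row, one label per retrieved document.
relevance_classifications = run_relevance_eval(
    dataframe=df,
    model=model,
    query_column_name="query",
    reference_column_name="reference",
)
```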

## phoenix.experimental.evals.llm_generate
## phoenix.experimental.evals.llm\_generate

```python
def llm_generate(...)
```

@@ -156,15 +156,15 @@

Generates text for each record in a dataframe using an LLM and a prompt template. This function is useful if you want to produce an output for every row of a dataframe from a templated prompt.

### Parameters

- **dataframe (pandas.DataFrame)**: A pandas dataframe in which each row represents a record to be used as an input to the template. All template variable names must appear as column names in the dataframe (extra columns unrelated to the template are permitted).
- **template (Union\[PromptTemplate, str])**: The prompt template as either an instance of PromptTemplate or a string. If the latter, the variable names should be surrounded by curly braces so that a call to `format` can be made to substitute variable values.
- **model (BaseEvalModel)**: An LLM model class.
- **system_instruction (Optional\[str], optional):** An optional system message.
- **output_parser (Callable[[str], Dict[str, Any]], optional):** An optional function that takes each generated response and parses it to a dictionary. The keys of the dictionary should correspond to the column names of the output dataframe. If None, the output dataframe will have a single column named "output".
* **dataframe (pandas.DataFrame)**: A pandas dataframe in which each row represents a record to be used as an input to the template. All template variable names must appear as column names in the dataframe (extra columns unrelated to the template are permitted).
* **template (Union\[PromptTemplate, str])**: The prompt template as either an instance of PromptTemplate or a string. If the latter, the variable names should be surrounded by curly braces so that a call to `format` can be made to substitute variable values.
* **model (BaseEvalModel)**: An LLM model class.
* **system\_instruction (Optional\[str], optional):** An optional system message.
* **output\_parser (Callable\[\[str], Dict\[str, Any]], optional):** An optional function that takes each generated response and parses it to a dictionary. The keys of the dictionary should correspond to the column names of the output dataframe. If None, the output dataframe will have a single column named "output".

### Returns

- **generations_dataframe (pandas.DataFrame)**: A dataframe where each row represents the generated output.
* **generations\_dataframe (pandas.DataFrame)**: A dataframe where each row represents the generated output.

### Usage
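
A minimal sketch (the template and dataframe are made up; `OpenAIModel` is assumed as above):

```python
import pandas as pd

from phoenix.experimental.evals import OpenAIModel, llm_generate

# Hypothetical template: {topic} must match a dataframe column name.
template = "Write a one-sentence summary of the topic: {topic}"

df = pd.DataFrame({"topic": ["LLM observability", "retrieval-augmented generation"]})

model = OpenAIModel(model_name="gpt-4")

# Without an output_parser, the result is a dataframe with a single "output" column.
generations_df = llm_generate(dataframe=df, template=template, model=model)
```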

2 changes: 1 addition & 1 deletion docs/how-to/define-your-schema/README.md
@@ -7,7 +7,7 @@ description: How to create Phoenix datasets and schemas for common data formats
This guide shows you how to define a Phoenix dataset using your own data.

{% hint style="info" %}
* For a conceptual overview of the Phoenix API, including a high-level introduction to the notion of datasets and schemas, see [Phoenix Basics](../../quickstart/inferences.md#schemas).
* For a conceptual overview of the Phoenix API, including a high-level introduction to the notion of datasets and schemas, see [Phoenix Basics](../../quickstart/phoenix-inferences/inferences.md#schemas).
* For a comprehensive description of `phoenix.Dataset` and `phoenix.Schema`, see the [API reference](../../api/dataset-and-schema.md).
{% endhint %}

2 changes: 1 addition & 1 deletion docs/how-to/manage-the-app.md
@@ -9,7 +9,7 @@ description: >-
## Define Your Dataset(s)

{% hint style="info" %}
For a conceptual overview of datasets, including an explanation of when to use a single dataset vs. primary and reference datasets, see [Phoenix Basics](../quickstart/inferences.md#datasets).
For a conceptual overview of datasets, including an explanation of when to use a single dataset vs. primary and reference datasets, see [Phoenix Basics](../quickstart/phoenix-inferences/inferences.md#datasets).
{% endhint %}

To define a dataset, you must load your data into a pandas dataframe and [create a matching schema](define-your-schema/). If you have a dataframe `prim_df` and a matching `prim_schema`, you can define a dataset named "primary" with
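a call along these lines (a sketch assuming the standard `phoenix.Dataset` constructor; the toy dataframe and schema below stand in for your own `prim_df` and `prim_schema`):

```python
import pandas as pd
import phoenix as px

# Toy stand-ins for illustration only.
prim_df = pd.DataFrame({"prediction_id": ["a", "b"], "feature_1": [0.1, 0.9]})
prim_schema = px.Schema(
    prediction_id_column_name="prediction_id",
    feature_column_names=["feature_1"],
)

prim_ds = px.Dataset(dataframe=prim_df, schema=prim_schema, name="primary")
```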