From ad7ae87f9406143fd68af2c40041cdfab56ee995 Mon Sep 17 00:00:00 2001
From: Xander Song <axiomofjoy@gmail.com>
Date: Tue, 4 Apr 2023 16:26:25 -1000
Subject: [PATCH] docs: update quickstart notebook (#502)

* docs: update quickstart notebook

* add -q flag to pip install
---
 tutorials/quickstart.ipynb | 207 +++++++++++++++++++++++++++++++++----
 1 file changed, 186 insertions(+), 21 deletions(-)
diff --git a/tutorials/quickstart.ipynb b/tutorials/quickstart.ipynb
index 624fbec024..754759dbcb 100644
--- a/tutorials/quickstart.ipynb
+++ b/tutorials/quickstart.ipynb
@@ -1,22 +1,94 @@
 {
  "cells": [
   {
-   "attachments": {},
    "cell_type": "markdown",
+   "metadata": {
+    "id": "_X9GuXoSXleA"
+   },
+   "source": [
+    "<center>\n",
+    "    <p style=\"text-align:center\">\n",
+    "        <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-assets/phoenix/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
+    "        <br>\n",
+    "        <a href=\"https://docs.arize.com/phoenix/\">Docs</a>\n",
+    "        |\n",
+    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
+    "        |\n",
+    "        <a href=\"https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q\">Community</a>\n",
+    "    </p>\n",
+    "</center>\n",
+    "<h1 align=\"center\">Phoenix Quickstart</h1>\n",
+    "\n",
+    "In this quickstart, you will:\n",
+    "\n",
+    "- Download curated datasets of embeddings and predictions and load them into pandas DataFrames\n",
+    "- Define a schema to describe the format of your data\n",
+    "- Launch Phoenix and explore the app\n",
+    "\n",
+    "Let's get started!\n",
+    "\n",
+    "## 1. Install Dependencies and Import Libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
    "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -q arize-phoenix"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "QvPo5LKZjpfs"
+   },
+   "outputs": [],
+   "source": [
+    "from dataclasses import replace\n",
+    "import pandas as pd\n",
+    "import phoenix as px"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "OFeF5_Bysd2f"
+   },
    "source": [
-    "# <center>Quickstart Guide</center>\n",
-    "## <center>Gain insights into your model via Phoenix</center>\n",
+    "## 2. Download the Data\n",
     "\n",
-    "Phoenix first and foremost is an application that can run alongside your notebook environment. It takes in up to two sets of data and surfaces up drift, performance, and data quality insights.\n"
+    "Download the curated training and production datasets and load each into a pandas DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_df = pd.read_parquet(\n",
+    "    \"https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet\"\n",
+    ")\n",
+    "prod_df = pd.read_parquet(\n",
+    "    \"https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet\"\n",
+    ")"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 📚 Install `arize-phoenix` "
+    "## 3. Launch Phoenix\n",
+    "\n",
+    "### a) Define Your Schema\n",
+    "To launch Phoenix with your data, you first need to define a schema that tells Phoenix which columns of your DataFrames correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.\n",
+    "\n",
+    "The trickiest part is defining embedding features. In this case, the embedding feature has two pieces of information: the embedding vector itself, contained in the \"image_vector\" column, and the link to the image, contained in the \"url\" column.\n",
+    "\n",
+    "Define a schema for your training data."
    ]
   },
   {
@@ -25,7 +97,24 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import phoenix as px"
+    "train_schema = px.Schema(\n",
+    "    timestamp_column_name=\"prediction_ts\",\n",
+    "    prediction_label_column_name=\"predicted_action\",\n",
+    "    actual_label_column_name=\"actual_action\",\n",
+    "    embedding_feature_column_names={\n",
+    "        \"image_embedding\": px.EmbeddingColumnNames(\n",
+    "            vector_column_name=\"image_vector\",\n",
+    "            link_to_data_column_name=\"url\",\n",
+    "        ),\n",
+    "    },\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The schema for your production data is the same, except that it has no actual label column, so you can derive it from the training schema with `replace`."
    ]
   },
   {
@@ -34,17 +123,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "?px"
+    "prod_schema = replace(train_schema, actual_label_column_name=None)"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Using a built-in dataset to view the application\n",
-    "\n",
-    "To get familiar with the application itself, the easiest way to get started is to use one of phoenix's example datasets."
+    "### b) Define Your Datasets\n",
+    "Next, define your primary and reference datasets. In this case, your reference dataset contains training data and your primary dataset contains production data."
    ]
   },
   {
@@ -53,18 +140,97 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import phoenix as px\n",
+    "prod_ds = px.Dataset(prod_df, prod_schema)\n",
+    "train_ds = px.Dataset(train_df, train_schema)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### c) Create a Phoenix Session"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "session = px.launch_app(prod_ds, train_ds)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### d) Launch the Phoenix UI\n",
     "\n",
-    "# Get the fixture datasets via a specific use case. Some valid values are \"fashion_mnist\", \"sentiment_classification_language_drift\", and \"credit_card_fraud\"\n",
-    "datasets = px.load_example(\"sentiment_classification_language_drift\")\n",
-    "session = px.launch_app(datasets.primary, datasets.reference)\n",
+    "You can open Phoenix by copying and pasting the output of `session.url` into a new browser tab."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session.url"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, you can open the Phoenix UI directly in your notebook:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
     "session.view()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Explore the App\n",
+    "\n",
+    "Click on \"image_embedding\" in the \"Embeddings\" section to visualize your embedding data. What insights can you uncover from this page?\n",
+    "\n",
+    "## 5. Close the App\n",
+    "\n",
+    "When you're done, don't forget to close the app."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "px.close_app()"
+   ]
   }
  ],
  "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "collapsed_sections": [
+    "QOudyT6lPBqp"
+   ],
+   "machine_shape": "hm",
+   "provenance": [],
+   "toc_visible": true
+  },
   "kernelspec": {
-   "display_name": "phoenix",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -78,10 +244,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.3"
-  },
-  "orig_nbformat": 4
+   "version": "3.8.15"
+  }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 1
 }