diff --git a/tutorials/image_classification_tutorial.ipynb b/tutorials/image_classification_tutorial.ipynb
new file mode 100644
index 0000000000..ee6fc80edd
--- /dev/null
+++ b/tutorials/image_classification_tutorial.ipynb
@@ -0,0 +1,426 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_X9GuXoSXleA"
+ },
+ "source": [
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ " Docs\n",
+ " |\n",
+ " GitHub\n",
+ " |\n",
+ " Community\n",
+ "
\n",
+ "\n",
+ "Active Learning for a Drifting Image Classification Model
\n",
+ "\n",
+ "Imagine you're in charge of maintaining a model that classifies the action of people in photographs. Your model initially performs well in production, but its performance gradually degrades over time.\n",
+ "\n",
+ "Phoenix helps you surface the reason for this regression by analyzing the embeddings representing each image. Your model was trained on crisp and high-resolution images, but as you'll discover, it's encountering blurred and noisy images in production that it can't correctly classify.\n",
+ "\n",
+ "In this tutorial, you will:\n",
+ "\n",
+ "- Download curated datasets of embeddings and predictions\n",
+ "- Define a schema to describe the format of your data\n",
+ "- Launch Phoenix to visually explore your embeddings\n",
+ "- Investigate problematic clusters\n",
+ "- Export problematic production data for labeling and fine-tuning\n",
+ "\n",
+ "Let's get started!\n",
+ "\n",
+ "## 1. Install Dependencies and Import Libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install arize-phoenix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "QvPo5LKZjpfs"
+ },
+ "outputs": [],
+ "source": [
+ "import uuid\n",
+ "from dataclasses import replace\n",
+ "from datetime import datetime\n",
+ "\n",
+ "from IPython.display import display, HTML\n",
+ "import pandas as pd\n",
+ "import phoenix as px"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OFeF5_Bysd2f"
+ },
+ "source": [
+ "## 2. Download and Inspect the Data\n",
+ "\n",
+ "Download the curated dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df = pd.read_parquet(\"https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet\")\n",
+ "prod_df = pd.read_parquet(\"https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "View the first few rows of the training DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The columns of the DataFrame are:\n",
+ "- **prediction_id:** a unique identifier for each data point\n",
+ "- **prediction_ts:** the Unix timestamps of your predictions\n",
+ "- **url:** a link to the image data\n",
+ "- **image_vector:** the embedding vectors representing each review\n",
+ "- **actual_action:** the ground truth for each image (sleeping, eating, running, etc.)\n",
+ "- **predicted_action:** the predicted class for the image\n",
+ "\n",
+ "View the first few rows of the production DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prod_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice that the production data is missing ground truth, i.e., has no \"actual_action\" column.\n",
+ "\n",
+ "Display a few images alongside their predicted and actual labels. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def display_examples(df):\n",
+ " \"\"\"\n",
+ " Displays each image alongside the actual and predicted classes.\n",
+ " \"\"\"\n",
+ " sample_df = df[[\"actual_action\", \"predicted_action\", \"url\"]].rename(columns={\"url\": \"image\"})\n",
+ " html = sample_df.to_html(escape=False, index=False, formatters={\"image\": lambda url: f''})\n",
+ " display(HTML(html))\n",
+ " \n",
+ "display_examples(train_df.head())"
+ ]
+ },
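+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Since the production data has no actual labels, a small variant of the helper (a convenience sketch for this tutorial, not part of Phoenix) displays production images alongside their predictions only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def display_prod_examples(df):\n",
+ "    \"\"\"\n",
+ "    Displays each production image alongside its predicted class only,\n",
+ "    since the production data has no ground truth.\n",
+ "    \"\"\"\n",
+ "    sample_df = df[[\"predicted_action\", \"url\"]].rename(columns={\"url\": \"image\"})\n",
+ "    html = sample_df.to_html(escape=False, index=False, formatters={\"image\": lambda url: f'<img src=\"{url}\">'})\n",
+ "    display(HTML(html))\n",
+ "\n",
+ "\n",
+ "display_prod_examples(prod_df.head())"
+ ]
+ },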
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0BIeGAemfziv"
+ },
+ "source": [
+ "## 3. Prepare the Data\n",
+ "\n",
+ "The original data is from April 2022. Update the timestamps to the current time."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "xzYoV-hemYsE"
+ },
+ "outputs": [],
+ "source": [
+ "latest_timestamp = max(prod_df['prediction_ts'])\n",
+ "current_timestamp = datetime.timestamp(datetime.now())\n",
+ "delta = current_timestamp - latest_timestamp\n",
+ "\n",
+ "train_df['prediction_ts'] = (train_df['prediction_ts'] + delta).astype(float)\n",
+ "prod_df['prediction_ts'] = (prod_df['prediction_ts'] + delta).astype(float)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Launch Phoenix\n",
+ "\n",
+ "### a) Define Your Schema\n",
+ "To launch Phoenix with your data, you first need to define a schema that tells Phoenix which columns of your DataFrames correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.\n",
+ "\n",
+ "The trickiest part is defining embedding features. In this case, each embedding feature has two pieces of information: the embedding vector itself contained in the \"image_vector\" column and the link to the image contained in the \"url\" column.\n",
+ "\n",
+ "Define a schema for your training data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_schema = px.Schema(\n",
+ " timestamp_column_name=\"prediction_ts\",\n",
+ " prediction_label_column_name=\"predicted_action\",\n",
+ " actual_label_column_name=\"actual_action\",\n",
+ " embedding_feature_column_names={\n",
+ " \"image_embedding\": px.EmbeddingColumnNames(\n",
+ " vector_column_name=\"image_vector\",\n",
+ " link_to_data_column_name=\"url\",\n",
+ " ),\n",
+ " },\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The schema for your production data is the same, except it does not have an actual label column."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prod_schema = replace(train_schema, actual_label_column_name=None)"
+ ]
+ },
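+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check, confirm that the production schema has no actual label column. (`dataclasses.replace` returns a copy of the schema with the given field overridden.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# The production schema should report no actual label column.\n",
+ "print(prod_schema.actual_label_column_name)  # expected output: None"
+ ]
+ },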
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### b) Define Your Datasets\n",
+ "Next, define your primary and reference datasets. In this case, your reference dataset contains training data and your primary dataset contains production data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prod_ds = px.Dataset(prod_df, prod_schema)\n",
+ "train_ds = px.Dataset(train_df, train_schema)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### c) Create a Phoenix Session"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "session = px.launch_app(prod_ds, train_ds)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### d) Launch the Phoenix UI\n",
+ "\n",
+ "You can open Phoenix by copying and pasting the output of `session.url` into a new browser tab."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "session.url"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Alternatively, you can open the Phoenix UI in your notebook with"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "session.view()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Find and Export Problematic Clusters\n",
+ "\n",
+ "### Steps\n",
+ "\n",
+ "1. Click on \"image_embedding\" in the \"Embeddings\" section.\n",
+ "1. In the Euclidean distance graph at the top of the page, select a point on the graph where the Euclidean distance is high.\n",
+ "1. Click on the top cluster in the panel on the left.\n",
+ "1. Use the panel at the bottom to examine the data points in this cluster.\n",
+ "1. Click on the \"Export\" button to save your cluster.\n",
+ "\n",
+ "### Questions:\n",
+ "\n",
+ "1. What does the Euclidean distance graph measure?\n",
+ "1. What do the points in the point cloud represent?\n",
+ "1. What do you notice about the cluster you selected?\n",
+ "1. What's gone wrong with your model in production?\n",
+ "\n",
+ "### Answers\n",
+ "\n",
+ "1. This graph measures the drift of your production data relative to your training data over time.\n",
+ "1. Each point in the point cloud corresponds to an image. Phoenix has taken the high-dimensional embeddings in your original DataFrame and has reduced the dimensionality so that you can view them in lower dimensions.\n",
+ "1. It consists almost entirely of production data, meaning that your model is seeing data in production the likes of which it never saw during training.\n",
+ "1. Your model was trained crisp and high-resolution images. In production, your model is encountering blurry and noisy images that it cannot correctly classify.\n",
+ "\n",
+ "## 6. Load and View Exported Data\n",
+ "\n",
+ "View your exported files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "session.exports"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Load your most recent exported data back into a DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "export_df = session.exports[0].dataframe\n",
+ "export_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Display a few examples from your export."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_examples(export_df.head())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Congrats! You've pinpointed the blurry or noisy images that are hurting your model's performance in production. As an actionable next step, you can label your exported production data and fine-tune your model to improve performance.\n",
+ "\n",
+ "## 7. Close the App\n",
+ "\n",
+ "When you're done, don't forget to close the app."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "px.close_app()"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "collapsed_sections": [
+ "QOudyT6lPBqp"
+ ],
+ "machine_shape": "hm",
+ "provenance": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.15"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}