From c68297f2870235ee7d247eab35ed5264579fc194 Mon Sep 17 00:00:00 2001
From: root <ronayak@hotmail.com>
Date: Mon, 1 Aug 2022 19:41:48 +0000
Subject: [PATCH 1/2] add filtering step

---
 ...mender-Systems-with-Merlin_filtering.ipynb | 2910 +++++++++++++++++
 ...RecSys-with-Merlin-Systems_filtering.ipynb |  754 +++++
 2 files changed, 3664 insertions(+)
 create mode 100644 examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin_filtering.ipynb
 create mode 100644 examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
diff --git a/examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin_filtering.ipynb b/examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin_filtering.ipynb
new file mode 100644
index 000000000..3a65bc80a
--- /dev/null
+++ b/examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin_filtering.ipynb
@@ -0,0 +1,2910 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "8c3403a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Copyright 2021 NVIDIA Corporation. All Rights Reserved.\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License.\n",
+    "# ================================"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad9b5cc0-2110-464e-9773-003ffe7d216c",
+   "metadata": {},
+   "source": [
+    "<img src=\"http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png\" style=\"width: 90px; float: right;\">\n",
+    "\n",
+    "## Building Intelligent Recommender Systems with Merlin\n",
+    "\n",
+    "This notebook was created using the latest stable [merlin-tensorflow-inference](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow-inference/tags) container.\n",
+    "\n",
+    "### Overview"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9657308-2e08-49b4-8924-eace75a4634c",
+   "metadata": {},
+   "source": [
+    "Recommender Systems (RecSys) are the engine of the modern internet and the catalyst for human decisions. Building a recommendation system is challenging because it requires multiple stages (data preprocessing, offline training, item retrieval, filtering, ranking, ordering, etc.) to work together seamlessly and efficiently. The biggest challenges for new practitioners are the lack of understanding of what RecSys look like in the real world, and the gap between examples of simple models and a production-ready end-to-end recommender system."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "405280b0-3d48-43b6-ab95-d29be7a43e9e",
+   "metadata": {},
+   "source": [
+    "The figure below represents a four-stage recommender system. This is a more complex process than training a single model and deploying it, and it is much closer to what happens in real-world production recommender systems."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27220153",
+   "metadata": {},
+   "source": [
+    "![fourstage](../images/fourstages.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b27ffed1-4b4b-4b6f-b933-31e9f6c1b4e1",
+   "metadata": {},
+   "source": [
+    "In this series of notebooks, we showcase how to deploy a four-stage recommender system on [Triton Inference Server](https://github.com/triton-inference-server/server) using the Merlin Systems library. Let's go over the concepts in the figure briefly.\n",
+    "- **Retrieval:** This step narrows millions of items down to thousands of candidates. We are going to train a Two-Tower item retrieval model to retrieve the relevant top-K candidate items.\n",
+    "- **Filtering:** This step excludes already-interacted or otherwise undesirable items from the candidate set, or applies business logic rules. In this series of examples, we prepare an `item_id_seen` feature so that items a user has already interacted with can be filtered out.\n",
+    "- **Scoring:** This is also known as ranking. Here the retrieved and filtered candidate items are scored. We are going to train a ranking model to use at the scoring step.\n",
+    "- **Ordering:** At this stage, we can order the final set of items that we want to recommend to the user. Here, we’re able to align the output of the model with business needs, constraints, or criteria.\n",
+    "\n",
+    "To learn more about the four-stage recommender systems, you can listen to Even Oldridge's [Moving Beyond Recommender Models talk](https://www.youtube.com/watch?v=5qjiY-kLwFY&list=PL65MqKWg6XcrdN4TJV0K1PdLhF_Uq-b43&index=7) at KDD'21 and read more [in this blog post](https://eugeneyan.com/writing/system-design-for-discovery/)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e26f3194-9f17-4fa7-8baa-14333f2a122a",
+   "metadata": {},
+   "source": [
+    "### Learning objectives\n",
+    "- Understanding four stages of recommender systems\n",
+    "- Training retrieval and ranking models with Merlin Models\n",
+    "- Setting up a feature store and approximate nearest neighbours (ANN) search libraries\n",
+    "- Deploying trained models to Triton Inference Server with Merlin Systems"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "58d8bd1f-fa29-4d4b-a320-c76538f2302f",
+   "metadata": {},
+   "source": [
+    "In addition to NVIDIA Merlin libraries and the Triton Inference Server client library, we use two external libraries in this series of examples:\n",
+    "\n",
+    "- [Feast](https://docs.feast.dev/): an end-to-end open source feature store library for machine learning\n",
+    "- [Faiss](https://github.com/facebookresearch/faiss): a library for efficient similarity search and clustering of dense vectors\n",
+    "\n",
+    "You can find more information about the `Feast` feature store and `Faiss` libraries in the next notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46b7f3bd",
+   "metadata": {},
+   "source": [
+    "### Import required libraries and functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c1586d8-e5a6-40c3-b6bb-61a3e62fa34c",
+   "metadata": {},
+   "source": [
+    "**Compatibility:**\n",
+    "\n",
+    "These notebooks are developed and tested using our latest `merlin-tensorflow:22.XX` container on [NVIDIA's docker registry](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=merlin)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "2cd8cc8d-5cc7-4a9f-91e5-3deec6f1fe74",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# for running this example on GPU, install the following libraries\n",
+    "# %pip install tensorflow \"feast<0.20\" faiss-gpu\n",
+    "\n",
+    "# for running this example on CPU, uncomment the following lines\n",
+    "# %pip install tensorflow-cpu \"feast<0.20\" faiss-cpu\n",
+    "# %pip uninstall cudf\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "08cdbfcc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2022-08-01 18:43:22.835975: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX\n",
+      "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
+      "2022-08-01 18:43:23.877821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16249 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:2d:00.0, compute capability: 7.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import cudf\n",
+    "import nvtabular as nvt\n",
+    "\n",
+    "from nvtabular.ops import *\n",
+    "from merlin.schema.tags import Tags\n",
+    "\n",
+    "import merlin.models.tf as mm\n",
+    "from merlin.io.dataset import Dataset\n",
+    "from merlin.datasets.ecommerce import transform_aliccp\n",
+    "import tensorflow as tf\n",
+    "\n",
+    "# for running this example on CPU, comment out the line below\n",
+    "os.environ[\"TF_GPU_ALLOCATOR\"] = \"cuda_malloc_async\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "028a1398-76a8-4998-97d8-34a806e130d3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# disable WARNING, INFO and DEBUG logging everywhere\n",
+    "import logging\n",
+    "\n",
+    "logging.disable(logging.WARNING)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "baad8ae3",
+   "metadata": {},
+   "source": [
+    "In this example notebook, we generate synthetic train and validation datasets that mimic the real [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1) dataset to build our recommender system models.\n",
+    "\n",
+    "First, we define our input path and feature repo path."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "81ddb370",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "DATA_FOLDER = os.environ.get(\"DATA_FOLDER\", \"/workspace/data/\")\n",
+    "# set up the base dir for feature store\n",
+    "BASE_DIR = os.environ.get(\n",
+    "    \"BASE_DIR\", \"/Merlin/examples/Building-and-deploying-multi-stage-RecSys/\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a746a3f-1845-4af3-8a37-1b34aa1bb81b",
+   "metadata": {},
+   "source": [
+    "Then, we use the `generate_data` utility function to generate the synthetic dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "b44b3378-7297-4946-a271-742a9239bc3e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from merlin.datasets.synthetic import generate_data\n",
+    "\n",
+    "NUM_ROWS = os.environ.get(\"NUM_ROWS\", 10000)\n",
+    "train_raw, valid_raw = generate_data(\"aliccp-raw\", int(NUM_ROWS), set_sizes=(0.7, 0.3))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "16bae6de-0345-4963-8f73-3ae234e54040",
+   "metadata": {},
+   "source": [
+    "If you would like to use the real Ali-CCP dataset, you can use the [get_aliccp()](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/datasets/ecommerce/aliccp/dataset.py) function instead. This function takes the raw csv files and generates parquet files that can be fed directly to the NVTabular workflow below."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aba2a130-b50f-401b-a73e-2dcf0006b31f",
+   "metadata": {},
+   "source": [
+    "Encode the `item_id` column and keep raw copies of the `user_id` and `item_id` columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "1071aed6-a87d-4f7f-a36e-49a8c7e2c48a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output_path = os.path.join(DATA_FOLDER, \"processed\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "2ad41231-4a2e-42d1-9f2d-6d6ce69de820",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cols = train_raw.to_ddf().columns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "1f21685f-6f72-4427-ad36-e68993d2b8a0",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Index(['user_id', 'user_shops', 'user_profile', 'user_group', 'user_gender',\n",
+       "       'user_age', 'user_consumption_1', 'user_consumption_2',\n",
+       "       'user_is_occupied', 'user_geography', 'user_intentions', 'user_brands',\n",
+       "       'user_categories', 'item_id', 'item_category', 'item_shop',\n",
+       "       'item_brand', 'item_intention', 'user_item_categories',\n",
+       "       'user_item_shops', 'user_item_brands', 'user_item_intentions',\n",
+       "       'position', 'click', 'conversion'],\n",
+       "      dtype='object')"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cols"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "36340fc3-a10e-47d3-b6b4-d951d2066871",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.\n",
+      "  warnings.warn(\n"
+     ]
+    }
+   ],
+   "source": [
+    "item_id = [\"item_id\"] >> Categorify(dtype=\"int32\", out_path='./categories_processed') >> TagAsItemID()\n",
+    "user_id_raw = [\"user_id\"] >> Rename(postfix='_raw') >> TagAsUserFeatures()\n",
+    "item_id_raw = [\"item_id\"] >> Rename(postfix='_raw') >> TagAsItemFeatures()\n",
+    "\n",
+    "outputs = item_id + user_id_raw + item_id_raw + list(cols)\n",
+    "workflow = nvt.Workflow(outputs)\n",
+    "\n",
+    "workflow.fit(train_raw)\n",
+    "\n",
+    "workflow.transform(train_raw).to_parquet(output_path=os.path.join(output_path, \"train\"))\n",
+    "\n",
+    "workflow.transform(valid_raw).to_parquet(output_path=os.path.join(output_path, \"valid\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4dd2264c-8b4d-49fd-b49e-79a037e89c7b",
+   "metadata": {},
+   "source": [
+    "Read the processed parquet files and add each user's previously interacted items as a list column (`item_id_seen`) to the train and validation datasets, so that we can use it at the filtering stage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "fec5a7b9-7a39-4500-b52b-b613f8a3faa3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_raw = cudf.read_parquet(os.path.join(output_path, \"train\", \"*.parquet\"))\n",
+    "valid_raw = cudf.read_parquet(os.path.join(output_path, \"valid\", \"*.parquet\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "eee928e9-82d8-4045-877e-127261451356",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_raw_df_gr = train_raw.groupby('user_id')['item_id'].agg(list).reset_index()\n",
+    "train_raw_df_gr = train_raw_df_gr.rename(columns={\"item_id\": \"item_id_seen\"})\n",
+    "train_raw = train_raw.merge(train_raw_df_gr, on=['user_id'], how='left')\n",
+    "valid_raw = valid_raw.merge(train_raw_df_gr, on=['user_id'], how='left')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "b33c14ea-b0c3-4e1a-b3d8-ce131c7913b0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "add_tags = nvt.ColumnSelector(['item_id_seen']) >> TagAsUserFeatures()\n",
+    "workflow = nvt.Workflow(add_tags + list(cols) + ['item_id_raw', 'user_id_raw'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "ccc38eed-29d0-42f0-91e2-4ce4b3dfe082",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_dataset = workflow.fit_transform(Dataset(train_raw))\n",
+    "valid_dataset = workflow.transform(Dataset(valid_raw))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "a2d37e3e-70ed-4299-ba9c-3ac13261f703",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_dataset_orig = Dataset(os.path.join(output_path, \"train\", \"*.parquet\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "9f7ec3cb-128c-467f-9b71-b265c9ee832d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_dataset.schema = train_dataset_orig.schema + workflow.output_schema\n",
+    "valid_dataset.schema = train_dataset_orig.schema + workflow.output_schema"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "b9e01923-57b8-4b31-89d3-786f138b6d70",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>name</th>\n",
+       "      <th>tags</th>\n",
+       "      <th>dtype</th>\n",
+       "      <th>is_list</th>\n",
+       "      <th>is_ragged</th>\n",
+       "      <th>properties.num_buckets</th>\n",
+       "      <th>properties.freq_threshold</th>\n",
+       "      <th>properties.max_size</th>\n",
+       "      <th>properties.start_index</th>\n",
+       "      <th>properties.cat_path</th>\n",
+       "      <th>properties.embedding_sizes.cardinality</th>\n",
+       "      <th>properties.embedding_sizes.dimension</th>\n",
+       "      <th>properties.domain.min</th>\n",
+       "      <th>properties.domain.max</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>item_id_seen</td>\n",
+       "      <td>(Tags.USER)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>True</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>user_id</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL, Tags.USER_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_id.parquet</td>\n",
+       "      <td>294736.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>294736.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>user_shops</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_shops.parquet</td>\n",
+       "      <td>116741.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>116741.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>user_profile</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_profile.parquet</td>\n",
+       "      <td>98.0</td>\n",
+       "      <td>21.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>98.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>user_group</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_group.parquet</td>\n",
+       "      <td>14.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>14.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>user_gender</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_gender.parquet</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>3.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>user_age</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_age.parquet</td>\n",
+       "      <td>8.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>8.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>user_consumption_1</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_consumption_1.parquet</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>4.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>user_consumption_2</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_consumption_2.parquet</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>4.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>user_is_occupied</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_is_occupied.parquet</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>3.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>user_geography</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_geography.parquet</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>5.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>user_intentions</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_intentions.parquet</td>\n",
+       "      <td>33786.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>33786.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>user_brands</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_brands.parquet</td>\n",
+       "      <td>58015.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>58015.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>user_categories</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_categories.parquet</td>\n",
+       "      <td>6086.0</td>\n",
+       "      <td>211.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>6086.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>item_id</td>\n",
+       "      <td>(Tags.CATEGORICAL, Tags.ITEM, Tags.ITEM_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_processed/categories/unique.item_...</td>\n",
+       "      <td>240.0</td>\n",
+       "      <td>34.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>240.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>item_category</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.item_category.parquet</td>\n",
+       "      <td>8581.0</td>\n",
+       "      <td>255.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>8581.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>item_shop</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.item_shop.parquet</td>\n",
+       "      <td>604498.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>604498.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17</th>\n",
+       "      <td>item_brand</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.item_brand.parquet</td>\n",
+       "      <td>208179.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>208179.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>18</th>\n",
+       "      <td>item_intention</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.item_intention.parquet</td>\n",
+       "      <td>96258.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>96258.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>19</th>\n",
+       "      <td>user_item_categories</td>\n",
+       "      <td>(Tags.CATEGORICAL, user_item)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_item_categories.parquet</td>\n",
+       "      <td>7735.0</td>\n",
+       "      <td>241.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>7735.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>20</th>\n",
+       "      <td>user_item_shops</td>\n",
+       "      <td>(Tags.CATEGORICAL, user_item)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_item_shops.parquet</td>\n",
+       "      <td>384343.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>384343.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>21</th>\n",
+       "      <td>user_item_brands</td>\n",
+       "      <td>(Tags.CATEGORICAL, user_item)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_item_brands.parquet</td>\n",
+       "      <td>142632.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>142632.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>22</th>\n",
+       "      <td>user_item_intentions</td>\n",
+       "      <td>(Tags.CATEGORICAL, user_item)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_item_intentions.parquet</td>\n",
+       "      <td>74317.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>74317.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>23</th>\n",
+       "      <td>position</td>\n",
+       "      <td>(Tags.CATEGORICAL, Tags.CONTEXT)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.position.parquet</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>4.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>24</th>\n",
+       "      <td>click</td>\n",
+       "      <td>()</td>\n",
+       "      <td>int64</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>2.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25</th>\n",
+       "      <td>conversion</td>\n",
+       "      <td>()</td>\n",
+       "      <td>int64</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>2.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>26</th>\n",
+       "      <td>item_id_raw</td>\n",
+       "      <td>(Tags.CATEGORICAL, Tags.ITEM, Tags.ITEM_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.item_id.parquet</td>\n",
+       "      <td>3078306.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>3078306.0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>27</th>\n",
+       "      <td>user_id_raw</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL, Tags.USER_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>.//categories/unique.user_id.parquet</td>\n",
+       "      <td>294736.0</td>\n",
+       "      <td>512.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>294736.0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "[{'name': 'item_id_seen', 'tags': {<Tags.USER: 'user'>}, 'properties': {}, 'dtype': dtype('int32'), 'is_list': True, 'is_ragged': True}, {'name': 'user_id', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.USER_ID: 'user_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_id.parquet', 'embedding_sizes': {'cardinality': 294736.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 294736}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_shops', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_shops.parquet', 'embedding_sizes': {'cardinality': 116741.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 116741}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_profile', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_profile.parquet', 'embedding_sizes': {'cardinality': 98.0, 'dimension': 21.0}, 'domain': {'min': 0, 'max': 98}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_group', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_group.parquet', 'embedding_sizes': {'cardinality': 14.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 14}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_gender', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': 
'.//categories/unique.user_gender.parquet', 'embedding_sizes': {'cardinality': 3.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 3}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_age', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_age.parquet', 'embedding_sizes': {'cardinality': 8.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 8}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_consumption_1', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_consumption_1.parquet', 'embedding_sizes': {'cardinality': 4.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 4}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_consumption_2', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_consumption_2.parquet', 'embedding_sizes': {'cardinality': 4.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 4}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_is_occupied', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_is_occupied.parquet', 'embedding_sizes': {'cardinality': 3.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 3}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_geography', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 
'start_index': 0.0, 'cat_path': './/categories/unique.user_geography.parquet', 'embedding_sizes': {'cardinality': 5.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 5}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_intentions', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_intentions.parquet', 'embedding_sizes': {'cardinality': 33786.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 33786}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_brands', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_brands.parquet', 'embedding_sizes': {'cardinality': 58015.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 58015}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_categories', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_categories.parquet', 'embedding_sizes': {'cardinality': 6086.0, 'dimension': 211.0}, 'domain': {'min': 0, 'max': 6086}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_id', 'tags': {<Tags.CATEGORICAL: 'categorical'>, <Tags.ITEM: 'item'>, <Tags.ITEM_ID: 'item_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_processed/categories/unique.item_id.parquet', 'embedding_sizes': {'cardinality': 240.0, 'dimension': 34.0}, 'domain': {'min': 0, 'max': 240}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_category', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 
'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.item_category.parquet', 'embedding_sizes': {'cardinality': 8581.0, 'dimension': 255.0}, 'domain': {'min': 0, 'max': 8581}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_shop', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.item_shop.parquet', 'embedding_sizes': {'cardinality': 604498.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 604498}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_brand', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.item_brand.parquet', 'embedding_sizes': {'cardinality': 208179.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 208179}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_intention', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.item_intention.parquet', 'embedding_sizes': {'cardinality': 96258.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 96258}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_item_categories', 'tags': {<Tags.CATEGORICAL: 'categorical'>, 'user_item'}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_item_categories.parquet', 'embedding_sizes': {'cardinality': 7735.0, 'dimension': 241.0}, 'domain': {'min': 0, 'max': 7735}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 
'user_item_shops', 'tags': {<Tags.CATEGORICAL: 'categorical'>, 'user_item'}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_item_shops.parquet', 'embedding_sizes': {'cardinality': 384343.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 384343}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_item_brands', 'tags': {<Tags.CATEGORICAL: 'categorical'>, 'user_item'}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_item_brands.parquet', 'embedding_sizes': {'cardinality': 142632.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 142632}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_item_intentions', 'tags': {<Tags.CATEGORICAL: 'categorical'>, 'user_item'}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_item_intentions.parquet', 'embedding_sizes': {'cardinality': 74317.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 74317}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'position', 'tags': {<Tags.CATEGORICAL: 'categorical'>, <Tags.CONTEXT: 'context'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.position.parquet', 'embedding_sizes': {'cardinality': 4.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 4}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'click', 'tags': set(), 'properties': {'domain': {'min': 0, 'max': 2}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'conversion', 'tags': set(), 'properties': {'domain': {'min': 0, 'max': 2}}, 'dtype': dtype('int64'), 'is_list': False, 'is_ragged': False}, {'name': 'item_id_raw', 'tags': {<Tags.CATEGORICAL: 'categorical'>, 
<Tags.ITEM: 'item'>, <Tags.ITEM_ID: 'item_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.item_id.parquet', 'embedding_sizes': {'cardinality': 3078306.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 3078306}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_id_raw', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.USER_ID: 'user_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './/categories/unique.user_id.parquet', 'embedding_sizes': {'cardinality': 294736.0, 'dimension': 512.0}, 'domain': {'min': 0, 'max': 294736}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}]"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_dataset.schema"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "028948bc-7e89-453e-896c-3dcacd42e858",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<dask_cudf.Series | 4 tasks | 1 npartitions>"
+      ]
+     },
+     "execution_count": 18,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_dataset.to_ddf()['item_id_raw']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "09c87748-af61-42b8-8574-1afe3d71118f",
+   "metadata": {},
+   "source": [
+    "### Training a Retrieval Model with a Two-Tower Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e644fcba-7b0b-44c0-97fd-80f4fcb01191",
+   "metadata": {},
+   "source": [
+    "We start with the offline candidate retrieval stage, where we train a Two-Tower model for item retrieval. To learn more about the Two-Tower model, you can visit [05-Retrieval-Model.ipynb](https://github.com/NVIDIA-Merlin/models/blob/main/examples/05-Retrieval-Model.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf9bca46-a6b6-4a73-afd8-fe2869c60748",
+   "metadata": {},
+   "source": [
+    "#### Feature Engineering with NVTabular"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "da2b09cc-09fb-4814-a1cb-7e6168d9eb4b",
+   "metadata": {},
+   "source": [
+    "We process our raw categorical features by encoding them with the `Categorify()` operator, and we tag the features with `user` or `item` tags in the schema file. To learn more about [NVTabular](https://github.com/NVIDIA-Merlin/NVTabular) and the schema object, visit this example [notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb) in the Merlin Models repo."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "df72a793-194b-44f4-80c3-aaa368a9a01e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output_path = os.path.join(DATA_FOLDER, \"processed/retrieval\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ffd7e2ac-a251-49d0-943b-e9272c852ba6",
+   "metadata": {},
+   "source": [
+    "We use the `Filter()` operator to keep only the positive interaction rows in the dataset, that is, the rows where `click == 1`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "7e085a6d-74ad-4c24-8e7c-4e449c15f471",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 138 µs, sys: 40 µs, total: 178 µs\n",
+      "Wall time: 184 µs\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "\n",
+    "user_id = [\"user_id\"] >> Categorify(dtype=\"int32\", out_path='./categories_tt') >> TagAsUserID()\n",
+    "\n",
+    "item_features = (\n",
+    "    [\"item_category\", \"item_shop\", \"item_brand\"] >> Categorify(dtype=\"int32\", out_path='./categories_tt') >> TagAsItemFeatures()\n",
+    ")\n",
+    "\n",
+    "user_features = (\n",
+    "    [\n",
+    "        \"user_shops\",\n",
+    "        \"user_profile\",\n",
+    "        \"user_group\",\n",
+    "        \"user_gender\",\n",
+    "        \"user_age\",\n",
+    "        \"user_consumption_2\",\n",
+    "        \"user_is_occupied\",\n",
+    "        \"user_geography\",\n",
+    "        \"user_intentions\",\n",
+    "        \"user_brands\",\n",
+    "        \"user_categories\",\n",
+    "    ] >> Categorify(dtype=\"int32\", out_path='./categories_tt') >> TagAsUserFeatures()\n",
+    ")\n",
+    "\n",
+    "inputs = user_id + item_features + user_features +  ['item_id', 'item_id_seen', 'user_id_raw', 'item_id_raw', \"click\"]\n",
+    "\n",
+    "outputs = inputs >> Filter(f=lambda df: df[\"click\"] == 1)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba4b8db6-2c51-47ce-8f0c-fad7782911a8",
+   "metadata": {},
+   "source": [
+    "Let's call the `transform_aliccp` utility function to perform the `fit` and `transform` steps on the raw dataset, applying the operators defined in the NVTabular workflow above, and to save our workflow model. After fit and transform, the processed parquet files are saved to `output_path`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "55b4844c-673d-4e3b-9d71-5c14593df763",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "transform_aliccp(\n",
+    "    (train_dataset, valid_dataset),\n",
+    "    output_path,\n",
+    "    nvt_workflow=outputs,\n",
+    "    workflow_name=\"workflow_retrieval\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cc4721ae-7228-4d3f-9586-dcdfefecc19f",
+   "metadata": {},
+   "source": [
+    "NVTabular exported the schema file of our processed dataset, `schema.pbtxt`, which is a protobuf text file. To learn more about the schema object and the schema file, you can explore the [02-Merlin-Models-and-NVTabular-integration.ipynb](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb) notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "71063653-2f39-4b54-8399-145d6f281d4d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_tt = Dataset(os.path.join(output_path, \"train\", \"*.parquet\"))\n",
+    "valid_tt = Dataset(os.path.join(output_path, \"valid\", \"*.parquet\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "7ef26d86-712d-4651-ba6c-8e0cac042d00",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "schema = train_tt.schema\n",
+    "schema = schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['user_id_raw', 'item_id_raw', 'item_id_seen'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "9baa0946-7723-4943-80f5-40c9c0e6acfd",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>name</th>\n",
+       "      <th>tags</th>\n",
+       "      <th>dtype</th>\n",
+       "      <th>is_list</th>\n",
+       "      <th>is_ragged</th>\n",
+       "      <th>properties.num_buckets</th>\n",
+       "      <th>properties.freq_threshold</th>\n",
+       "      <th>properties.max_size</th>\n",
+       "      <th>properties.start_index</th>\n",
+       "      <th>properties.cat_path</th>\n",
+       "      <th>properties.embedding_sizes.cardinality</th>\n",
+       "      <th>properties.embedding_sizes.dimension</th>\n",
+       "      <th>properties.domain.min</th>\n",
+       "      <th>properties.domain.max</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>user_id</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL, Tags.USER_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_id.parquet</td>\n",
+       "      <td>256.0</td>\n",
+       "      <td>36.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>256</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>item_category</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.item_categor...</td>\n",
+       "      <td>240.0</td>\n",
+       "      <td>34.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>240</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>item_shop</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.item_shop.pa...</td>\n",
+       "      <td>240.0</td>\n",
+       "      <td>34.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>240</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>item_brand</td>\n",
+       "      <td>(Tags.ITEM, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.item_brand.p...</td>\n",
+       "      <td>240.0</td>\n",
+       "      <td>34.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>240</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>user_shops</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_shops.p...</td>\n",
+       "      <td>256.0</td>\n",
+       "      <td>36.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>256</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>user_profile</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_profile...</td>\n",
+       "      <td>38.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>38</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>user_group</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_group.p...</td>\n",
+       "      <td>7.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>7</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>user_gender</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_gender....</td>\n",
+       "      <td>2.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>user_age</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_age.par...</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>user_consumption_2</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_consump...</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>user_is_occupied</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_is_occu...</td>\n",
+       "      <td>2.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>user_geography</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_geograp...</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>16.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>user_intentions</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_intenti...</td>\n",
+       "      <td>256.0</td>\n",
+       "      <td>36.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>256</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>user_brands</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_brands....</td>\n",
+       "      <td>256.0</td>\n",
+       "      <td>36.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>256</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>user_categories</td>\n",
+       "      <td>(Tags.USER, Tags.CATEGORICAL)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_tt/categories/unique.user_categor...</td>\n",
+       "      <td>256.0</td>\n",
+       "      <td>36.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>256</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>item_id</td>\n",
+       "      <td>(Tags.CATEGORICAL, Tags.ITEM, Tags.ITEM_ID)</td>\n",
+       "      <td>int32</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>None</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>0.0</td>\n",
+       "      <td>./categories_processed/categories/unique.item_...</td>\n",
+       "      <td>240.0</td>\n",
+       "      <td>34.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>240</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "[{'name': 'user_id', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.USER_ID: 'user_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_id.parquet', 'embedding_sizes': {'cardinality': 256.0, 'dimension': 36.0}, 'domain': {'min': 0, 'max': 256}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_category', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.item_category.parquet', 'embedding_sizes': {'cardinality': 240.0, 'dimension': 34.0}, 'domain': {'min': 0, 'max': 240}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_shop', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.item_shop.parquet', 'embedding_sizes': {'cardinality': 240.0, 'dimension': 34.0}, 'domain': {'min': 0, 'max': 240}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_brand', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.item_brand.parquet', 'embedding_sizes': {'cardinality': 240.0, 'dimension': 34.0}, 'domain': {'min': 0, 'max': 240}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_shops', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_shops.parquet', 'embedding_sizes': {'cardinality': 256.0, 'dimension': 
36.0}, 'domain': {'min': 0, 'max': 256}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_profile', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_profile.parquet', 'embedding_sizes': {'cardinality': 38.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 38}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_group', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_group.parquet', 'embedding_sizes': {'cardinality': 7.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 7}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_gender', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_gender.parquet', 'embedding_sizes': {'cardinality': 2.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 2}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_age', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_age.parquet', 'embedding_sizes': {'cardinality': 4.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 4}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_consumption_2', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': 
'./categories_tt/categories/unique.user_consumption_2.parquet', 'embedding_sizes': {'cardinality': 3.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 3}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_is_occupied', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_is_occupied.parquet', 'embedding_sizes': {'cardinality': 2.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 2}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_geography', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_geography.parquet', 'embedding_sizes': {'cardinality': 3.0, 'dimension': 16.0}, 'domain': {'min': 0, 'max': 3}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_intentions', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_intentions.parquet', 'embedding_sizes': {'cardinality': 256.0, 'dimension': 36.0}, 'domain': {'min': 0, 'max': 256}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_brands', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_brands.parquet', 'embedding_sizes': {'cardinality': 256.0, 'dimension': 36.0}, 'domain': {'min': 0, 'max': 256}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'user_categories', 'tags': {<Tags.USER: 'user'>, <Tags.CATEGORICAL: 'categorical'>}, 
'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_tt/categories/unique.user_categories.parquet', 'embedding_sizes': {'cardinality': 256.0, 'dimension': 36.0}, 'domain': {'min': 0, 'max': 256}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}, {'name': 'item_id', 'tags': {<Tags.CATEGORICAL: 'categorical'>, <Tags.ITEM: 'item'>, <Tags.ITEM_ID: 'item_id'>}, 'properties': {'num_buckets': None, 'freq_threshold': 0.0, 'max_size': 0.0, 'start_index': 0.0, 'cat_path': './categories_processed/categories/unique.item_id.parquet', 'embedding_sizes': {'cardinality': 240.0, 'dimension': 34.0}, 'domain': {'min': 0, 'max': 240}}, 'dtype': dtype('int32'), 'is_list': False, 'is_ragged': False}]"
+      ]
+     },
+     "execution_count": 24,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "schema"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "9312511a-f368-42f2-93d2-eb95aebbf46c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2022-08-01 18:43:27.197904: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.\n"
+     ]
+    }
+   ],
+   "source": [
+    "model_tt = mm.TwoTowerModel(\n",
+    "    schema,\n",
+    "    query_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True),\n",
+    "    samplers=[mm.InBatchSampler()],\n",
+    "    embedding_options=mm.EmbeddingOptions(infer_embedding_sizes=True),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "4d47cb8b-e06a-4932-9a19-fb244ef43152",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1/1 [==============================] - 9s 9s/step - loss: 8.1273 - recall_at_10: 0.0151 - ndcg_at_10: 0.0087 - regularization_loss: 0.0000e+00 - val_loss: 7.3207 - val_recall_at_10: 0.0221 - val_ndcg_at_10: 0.0136 - val_regularization_loss: 0.0000e+00\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "<keras.callbacks.History at 0x7f87ffd4a280>"
+      ]
+     },
+     "execution_count": 26,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model_tt.compile(\n",
+    "    optimizer=\"adam\",\n",
+    "    run_eagerly=False,\n",
+    "    loss=\"categorical_crossentropy\",\n",
+    "    metrics=[mm.RecallAt(10), mm.NDCGAt(10)],\n",
+    ")\n",
+    "model_tt.fit(train_tt, validation_data=valid_tt, batch_size=1024 * 8, epochs=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "80d83007-f9e8-408f-9f65-a0e9e19cb586",
+   "metadata": {},
+   "source": [
+    "### Exporting query (user) model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "22af58a9-5525-454a-bf25-a9df0462aa53",
+   "metadata": {},
+   "source": [
+    "We export the query tower to use it later during the model deployment stage with Merlin Systems."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "d2370f13-ff9a-4ee0-ba1e-451c7bec0f8a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "query_tower = model_tt.retrieval_block.query_block()\n",
+    "query_tower.save(os.path.join(BASE_DIR, \"query_tower\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e16401d4",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "### Training a Ranking Model with DLRM"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b72e8a2a-fc4a-43ab-934c-6d941c56aad2",
+   "metadata": {},
+   "source": [
+    "Now we will move on to training an offline ranking model. This ranking model will be used to score our retrieved items."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5243f652-141f-4151-b05a-6d36396e719f",
+   "metadata": {},
+   "source": [
+    "#### Feature Engineering with NVTabular"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ccc14bbf-b813-4306-a9a5-ccb1ccc56b5e",
+   "metadata": {},
+   "source": [
+    "Define the output path."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "6a4b2ad0-c873-4a4a-8466-d21b5d181c74",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output_path = os.path.join(DATA_FOLDER, \"processed/ranking\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "7a6bc984-f1b7-4e2f-97ca-612be0d8e390",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 1.01 s, sys: 67.8 ms, total: 1.07 s\n",
+      "Wall time: 1.08 s\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "\n",
+    "user_id = [\"user_id\"] >> Categorify(dtype=\"int32\") >> TagAsUserID()\n",
+    "\n",
+    "item_features = (\n",
+    "    [\"item_category\", \"item_shop\", \"item_brand\"] >> Categorify(dtype=\"int32\") >> TagAsItemFeatures()\n",
+    ")\n",
+    "\n",
+    "user_features = (\n",
+    "    [\n",
+    "        \"user_shops\",\n",
+    "        \"user_profile\",\n",
+    "        \"user_group\",\n",
+    "        \"user_gender\",\n",
+    "        \"user_age\",\n",
+    "        \"user_consumption_2\",\n",
+    "        \"user_is_occupied\",\n",
+    "        \"user_geography\",\n",
+    "        \"user_intentions\",\n",
+    "        \"user_brands\",\n",
+    "        \"user_categories\",\n",
+    "    ] >> Categorify(dtype=\"int32\") >> TagAsUserFeatures()\n",
+    ")\n",
+    "\n",
+    "targets = [\"click\"] >> AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, \"target\"])\n",
+    "\n",
+    "outputs = user_id + item_features + user_features + ['item_id', 'item_id_seen', 'user_id_raw', 'item_id_raw'] + targets\n",
+    "\n",
+    "\n",
+    "transform_aliccp(\n",
+    "    (train_dataset, valid_dataset), output_path, nvt_workflow=outputs, workflow_name=\"workflow_ranking\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4f2b234",
+   "metadata": {},
+   "source": [
+    "We use the `schema` object to define our model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "cb870461-6ac2-49b2-ba6a-2da6ecb57f1d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# define train and valid dataset objects\n",
+    "train = Dataset(os.path.join(output_path, \"train\", \"*.parquet\"), part_size=\"500MB\")\n",
+    "valid = Dataset(os.path.join(output_path, \"valid\", \"*.parquet\"), part_size=\"500MB\")\n",
+    "\n",
+    "# define schema object\n",
+    "schema = train.schema.without(['user_id_raw', 'item_id_raw', 'item_id_seen'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "30e4ebc2",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'click'"
+      ]
+     },
+     "execution_count": 31,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "target_column = schema.select_by_tag(Tags.TARGET).column_names[0]\n",
+    "target_column"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f68e26b",
+   "metadata": {},
+   "source": [
+    "The Deep Learning Recommendation Model [(DLRM)](https://arxiv.org/abs/1906.00091) is a popular neural network architecture originally proposed by Facebook in 2019. It was introduced as a personalization deep learning model that uses embeddings to process sparse features that represent categorical data and a multilayer perceptron (MLP) to process dense features, then interacts these features explicitly using the statistical techniques proposed [here](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5694074). To learn more about the DLRM architecture, please visit the `Exploring-different-models` [notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/04-Exporting-ranking-models.ipynb) in the Merlin Models GitHub repo."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "e4325080",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = mm.DLRMModel(\n",
+    "    schema,\n",
+    "    embedding_dim=64,\n",
+    "    bottom_block=mm.MLPBlock([128, 64]),\n",
+    "    top_block=mm.MLPBlock([128, 64, 32]),\n",
+    "    prediction_tasks=mm.BinaryClassificationTask(target_column),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "bfe2aa9e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1/1 [==============================] - 4s 4s/step - loss: 0.6932 - auc: 0.5002 - regularization_loss: 0.0000e+00 - val_loss: 0.6935 - val_auc: 0.4930 - val_regularization_loss: 0.0000e+00\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "<keras.callbacks.History at 0x7f87fccda760>"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model.compile(optimizer=\"adam\", run_eagerly=False, metrics=[tf.keras.metrics.AUC()])\n",
+    "model.fit(train, validation_data=valid, batch_size=16 * 1024)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "498c4d49-7a59-4260-87b9-b86b66f2c67f",
+   "metadata": {},
+   "source": [
+    "Let's save our DLRM model so that we can load it back at the deployment stage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "00447c12-ea80-4d98-ab47-cc1a982a6958",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.save(os.path.join(BASE_DIR, \"dlrm\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d64a3f3f-81d8-489c-835f-c62f76df22d5",
+   "metadata": {},
+   "source": [
+    "In the following cells we are going to export the required user and item feature files, and save the query (user) tower model and item embeddings to disk. If you want to read more about exporting retrieval models, please visit the [05-Retrieval-Model.ipynb](https://github.com/NVIDIA-Merlin/models/blob/main/examples/05-Retrieval-Model.ipynb) notebook in the Merlin Models library repo."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5da1f434-f5a1-4478-b588-7e7ec17e6a88",
+   "metadata": {},
+   "source": [
+    "### Set up a feature store with Feast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99a4e939-d3cf-44f0-9012-d2af3264ee25",
+   "metadata": {},
+   "source": [
+    "Before we move on to the next step, we need to create a Feast feature repository. We will create the feature repo in the current working directory, which is `BASE_DIR` for us."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "2e7e96d2-9cd2-40d1-b356-8cd76b57bb4a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "Creating a new Feast repository in \u001b[1m\u001b[32m/Merlin/examples/Building-and-deploying-multi-stage-RecSys/feature_repo\u001b[0m.\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "!rm -rf $BASE_DIR/feature_repo\n",
+    "!cd $BASE_DIR && feast init feature_repo"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e630e53-8336-487a-9ceb-133b1538acfb",
+   "metadata": {},
+   "source": [
+    "You should see a message like <i>Creating a new Feast repository in ... </i> printed out above. Now, navigate to the `feature_repo` folder and remove the demo parquet file created by default, along with the `example.py` file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "26ba2521-ed1b-4c2b-afdd-26b4a5a9c008",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feature_repo_path = os.path.join(BASE_DIR, \"feature_repo\")\n",
+    "if os.path.exists(f\"{feature_repo_path}/example.py\"):\n",
+    "    os.remove(f\"{feature_repo_path}/example.py\")\n",
+    "if os.path.exists(f\"{feature_repo_path}/data/driver_stats.parquet\"):\n",
+    "    os.remove(f\"{feature_repo_path}/data/driver_stats.parquet\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "78315676-eb6c-405a-b1fd-3174ea328406",
+   "metadata": {},
+   "source": [
+    "### Exporting user and item features"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "ea0b369c-2f01-42e3-9f3c-74c3ff4a6d64",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from merlin.models.utils.dataset import unique_rows_by_features\n",
+    "\n",
+    "user_features = (\n",
+    "    unique_rows_by_features(train, Tags.USER, Tags.USER_ID)\n",
+    "    .compute()\n",
+    "    .reset_index(drop=True)\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "6b0949f9-e67a-414f-9d74-65f138e820a8",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>user_id</th>\n",
+       "      <th>user_shops</th>\n",
+       "      <th>user_profile</th>\n",
+       "      <th>user_group</th>\n",
+       "      <th>user_gender</th>\n",
+       "      <th>user_age</th>\n",
+       "      <th>user_consumption_2</th>\n",
+       "      <th>user_is_occupied</th>\n",
+       "      <th>user_geography</th>\n",
+       "      <th>user_intentions</th>\n",
+       "      <th>user_brands</th>\n",
+       "      <th>user_categories</th>\n",
+       "      <th>item_id_seen</th>\n",
+       "      <th>user_id_raw</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>[7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...</td>\n",
+       "      <td>7</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>[5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...</td>\n",
+       "      <td>10</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>[36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...</td>\n",
+       "      <td>8</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>[175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...</td>\n",
+       "      <td>9</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>[7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...</td>\n",
+       "      <td>6</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   user_id  user_shops  user_profile  user_group  user_gender  user_age  \\\n",
+       "0        1           1             1           1            1         1   \n",
+       "1        2           2             1           1            1         1   \n",
+       "2        3           3             1           1            1         1   \n",
+       "3        4           4             1           1            1         1   \n",
+       "4        5           5             1           1            1         1   \n",
+       "\n",
+       "   user_consumption_2  user_is_occupied  user_geography  user_intentions  \\\n",
+       "0                   1                 1               1                1   \n",
+       "1                   1                 1               1                2   \n",
+       "2                   1                 1               1                3   \n",
+       "3                   1                 1               1                4   \n",
+       "4                   1                 1               1                5   \n",
+       "\n",
+       "   user_brands  user_categories  \\\n",
+       "0            1                1   \n",
+       "1            2                2   \n",
+       "2            3                3   \n",
+       "3            4                4   \n",
+       "4            5                5   \n",
+       "\n",
+       "                                        item_id_seen  user_id_raw  \n",
+       "0  [7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...            7  \n",
+       "1  [5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...           10  \n",
+       "2  [36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...            8  \n",
+       "3  [175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...            9  \n",
+       "4  [7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...            6  "
+      ]
+     },
+     "execution_count": 38,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "user_features.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a46bd8c-1337-4c74-a85b-25348a897d90",
+   "metadata": {},
+   "source": [
+    "We will artificially add `datetime` and `created` timestamp columns to our `user_features` dataframe. This is required by Feast to track the user-item features and their creation time and to determine which version to use when we query Feast."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "d30bd2f8-8a78-4df7-9bc4-42bd741c5b99",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from datetime import datetime\n",
+    "\n",
+    "user_features[\"datetime\"] = datetime.now()\n",
+    "user_features[\"datetime\"] = user_features[\"datetime\"].astype(\"datetime64[ns]\")\n",
+    "user_features[\"created\"] = datetime.now()\n",
+    "user_features[\"created\"] = user_features[\"created\"].astype(\"datetime64[ns]\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "d4998cd1-9dcd-4911-8f23-372e197b41e9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>user_id</th>\n",
+       "      <th>user_shops</th>\n",
+       "      <th>user_profile</th>\n",
+       "      <th>user_group</th>\n",
+       "      <th>user_gender</th>\n",
+       "      <th>user_age</th>\n",
+       "      <th>user_consumption_2</th>\n",
+       "      <th>user_is_occupied</th>\n",
+       "      <th>user_geography</th>\n",
+       "      <th>user_intentions</th>\n",
+       "      <th>user_brands</th>\n",
+       "      <th>user_categories</th>\n",
+       "      <th>item_id_seen</th>\n",
+       "      <th>user_id_raw</th>\n",
+       "      <th>datetime</th>\n",
+       "      <th>created</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>[7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...</td>\n",
+       "      <td>7</td>\n",
+       "      <td>2022-08-01 18:48:08.630208</td>\n",
+       "      <td>2022-08-01 18:48:08.631751</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>[5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...</td>\n",
+       "      <td>10</td>\n",
+       "      <td>2022-08-01 18:48:08.630208</td>\n",
+       "      <td>2022-08-01 18:48:08.631751</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>[36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...</td>\n",
+       "      <td>8</td>\n",
+       "      <td>2022-08-01 18:48:08.630208</td>\n",
+       "      <td>2022-08-01 18:48:08.631751</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>[175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...</td>\n",
+       "      <td>9</td>\n",
+       "      <td>2022-08-01 18:48:08.630208</td>\n",
+       "      <td>2022-08-01 18:48:08.631751</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>[7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2022-08-01 18:48:08.630208</td>\n",
+       "      <td>2022-08-01 18:48:08.631751</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   user_id  user_shops  user_profile  user_group  user_gender  user_age  \\\n",
+       "0        1           1             1           1            1         1   \n",
+       "1        2           2             1           1            1         1   \n",
+       "2        3           3             1           1            1         1   \n",
+       "3        4           4             1           1            1         1   \n",
+       "4        5           5             1           1            1         1   \n",
+       "\n",
+       "   user_consumption_2  user_is_occupied  user_geography  user_intentions  \\\n",
+       "0                   1                 1               1                1   \n",
+       "1                   1                 1               1                2   \n",
+       "2                   1                 1               1                3   \n",
+       "3                   1                 1               1                4   \n",
+       "4                   1                 1               1                5   \n",
+       "\n",
+       "   user_brands  user_categories  \\\n",
+       "0            1                1   \n",
+       "1            2                2   \n",
+       "2            3                3   \n",
+       "3            4                4   \n",
+       "4            5                5   \n",
+       "\n",
+       "                                        item_id_seen  user_id_raw  \\\n",
+       "0  [7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...            7   \n",
+       "1  [5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...           10   \n",
+       "2  [36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...            8   \n",
+       "3  [175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...            9   \n",
+       "4  [7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...            6   \n",
+       "\n",
+       "                    datetime                    created  \n",
+       "0 2022-08-01 18:48:08.630208 2022-08-01 18:48:08.631751  \n",
+       "1 2022-08-01 18:48:08.630208 2022-08-01 18:48:08.631751  \n",
+       "2 2022-08-01 18:48:08.630208 2022-08-01 18:48:08.631751  \n",
+       "3 2022-08-01 18:48:08.630208 2022-08-01 18:48:08.631751  \n",
+       "4 2022-08-01 18:48:08.630208 2022-08-01 18:48:08.631751  "
+      ]
+     },
+     "execution_count": 40,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "user_features.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "32cebf98-e7dc-406b-af4c-18f9ec616b44",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "255"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "user_features['user_id'].max()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "2981b3ed-6156-49f0-aa14-326a3853a58a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "user_features.to_parquet(\n",
+    "    os.path.join(BASE_DIR, \"feature_repo/data\", \"user_features.parquet\")\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "0a33a668-8e2a-4546-8f54-0060d405ba91",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "item_features = (\n",
+    "    unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID)\n",
+    "    .compute()\n",
+    "    .reset_index(drop=True)\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "97189581-473c-4928-8be7-ec31b86d69ee",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(239, 5)"
+      ]
+     },
+     "execution_count": 44,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "item_features.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "68a694d6-926f-4b0f-8edc-8cc7ac85ade7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "item_features[\"datetime\"] = datetime.now()\n",
+    "item_features[\"datetime\"] = item_features[\"datetime\"].astype(\"datetime64[ns]\")\n",
+    "item_features[\"created\"] = datetime.now()\n",
+    "item_features[\"created\"] = item_features[\"created\"].astype(\"datetime64[ns]\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "6c03fa22-b112-4243-bbe1-1cd7260cb85b",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>item_category</th>\n",
+       "      <th>item_shop</th>\n",
+       "      <th>item_brand</th>\n",
+       "      <th>item_id</th>\n",
+       "      <th>item_id_raw</th>\n",
+       "      <th>datetime</th>\n",
+       "      <th>created</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>7</td>\n",
+       "      <td>2022-08-01 18:49:30.289331</td>\n",
+       "      <td>2022-08-01 18:49:30.292435</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>8</td>\n",
+       "      <td>2022-08-01 18:49:30.289331</td>\n",
+       "      <td>2022-08-01 18:49:30.292435</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2022-08-01 18:49:30.289331</td>\n",
+       "      <td>2022-08-01 18:49:30.292435</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>9</td>\n",
+       "      <td>2022-08-01 18:49:30.289331</td>\n",
+       "      <td>2022-08-01 18:49:30.292435</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>10</td>\n",
+       "      <td>2022-08-01 18:49:30.289331</td>\n",
+       "      <td>2022-08-01 18:49:30.292435</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   item_category  item_shop  item_brand  item_id  item_id_raw  \\\n",
+       "0              1          1           1        1            7   \n",
+       "1              2          2           2        2            8   \n",
+       "2              3          3           3        3            6   \n",
+       "3              4          4           4        4            9   \n",
+       "4              5          5           5        5           10   \n",
+       "\n",
+       "                    datetime                    created  \n",
+       "0 2022-08-01 18:49:30.289331 2022-08-01 18:49:30.292435  \n",
+       "1 2022-08-01 18:49:30.289331 2022-08-01 18:49:30.292435  \n",
+       "2 2022-08-01 18:49:30.289331 2022-08-01 18:49:30.292435  \n",
+       "3 2022-08-01 18:49:30.289331 2022-08-01 18:49:30.292435  \n",
+       "4 2022-08-01 18:49:30.289331 2022-08-01 18:49:30.292435  "
+      ]
+     },
+     "execution_count": 46,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "item_features.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "6fbead3a-60c4-483e-a704-2d91179ffcd2",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "239"
+      ]
+     },
+     "execution_count": 47,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "item_features.item_id.max()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "c312884b-a1f8-4e08-8068-696e06a9bf46",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# save to disk\n",
+    "item_features.to_parquet(\n",
+    "    os.path.join(BASE_DIR, \"feature_repo/data\", \"item_features.parquet\")\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff30ceab-b264-4509-9c5b-5a10425e143b",
+   "metadata": {},
+   "source": [
+    "### Extract and save Item embeddings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "id": "00f1fe65-882e-4962-bb16-19a130fda215",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "item_embs = model_tt.item_embeddings(\n",
+    "    Dataset(item_features, schema=schema), batch_size=1024\n",
+    ")\n",
+    "item_embs_df = item_embs.compute(scheduler=\"synchronous\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "cf8b82ea-6cce-4dab-ad17-114b5e7eabd4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# select only item_id together with embedding columns\n",
+    "item_embeddings = item_embs_df.drop(\n",
+    "    columns=[\"item_category\", \"item_shop\", \"item_brand\"]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "e02f0957-6665-400a-80c0-60b307466caf",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>item_id</th>\n",
+       "      <th>0</th>\n",
+       "      <th>1</th>\n",
+       "      <th>2</th>\n",
+       "      <th>3</th>\n",
+       "      <th>4</th>\n",
+       "      <th>5</th>\n",
+       "      <th>6</th>\n",
+       "      <th>7</th>\n",
+       "      <th>8</th>\n",
+       "      <th>...</th>\n",
+       "      <th>54</th>\n",
+       "      <th>55</th>\n",
+       "      <th>56</th>\n",
+       "      <th>57</th>\n",
+       "      <th>58</th>\n",
+       "      <th>59</th>\n",
+       "      <th>60</th>\n",
+       "      <th>61</th>\n",
+       "      <th>62</th>\n",
+       "      <th>63</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>-0.033485</td>\n",
+       "      <td>-0.046890</td>\n",
+       "      <td>-0.031819</td>\n",
+       "      <td>0.030568</td>\n",
+       "      <td>0.009458</td>\n",
+       "      <td>-0.049156</td>\n",
+       "      <td>-0.005019</td>\n",
+       "      <td>0.051071</td>\n",
+       "      <td>0.081326</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.014835</td>\n",
+       "      <td>-0.013426</td>\n",
+       "      <td>-0.012331</td>\n",
+       "      <td>0.042307</td>\n",
+       "      <td>-0.024871</td>\n",
+       "      <td>0.000757</td>\n",
+       "      <td>0.032160</td>\n",
+       "      <td>0.014033</td>\n",
+       "      <td>-0.041780</td>\n",
+       "      <td>0.020292</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>-0.036122</td>\n",
+       "      <td>0.002882</td>\n",
+       "      <td>-0.031006</td>\n",
+       "      <td>-0.012018</td>\n",
+       "      <td>0.036453</td>\n",
+       "      <td>-0.004707</td>\n",
+       "      <td>0.015386</td>\n",
+       "      <td>0.042837</td>\n",
+       "      <td>0.025634</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.033783</td>\n",
+       "      <td>0.000369</td>\n",
+       "      <td>0.027628</td>\n",
+       "      <td>-0.002053</td>\n",
+       "      <td>-0.028099</td>\n",
+       "      <td>-0.015240</td>\n",
+       "      <td>-0.012000</td>\n",
+       "      <td>0.004758</td>\n",
+       "      <td>0.006306</td>\n",
+       "      <td>0.030888</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>-0.024533</td>\n",
+       "      <td>0.013923</td>\n",
+       "      <td>0.000636</td>\n",
+       "      <td>0.003143</td>\n",
+       "      <td>0.053155</td>\n",
+       "      <td>0.035068</td>\n",
+       "      <td>0.003644</td>\n",
+       "      <td>0.022994</td>\n",
+       "      <td>0.021832</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.017099</td>\n",
+       "      <td>-0.018011</td>\n",
+       "      <td>0.041238</td>\n",
+       "      <td>0.005636</td>\n",
+       "      <td>-0.015556</td>\n",
+       "      <td>0.005061</td>\n",
+       "      <td>0.011217</td>\n",
+       "      <td>-0.005633</td>\n",
+       "      <td>-0.009141</td>\n",
+       "      <td>0.001630</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>4</td>\n",
+       "      <td>0.004179</td>\n",
+       "      <td>0.005348</td>\n",
+       "      <td>-0.043896</td>\n",
+       "      <td>0.009208</td>\n",
+       "      <td>0.022689</td>\n",
+       "      <td>0.011464</td>\n",
+       "      <td>-0.011334</td>\n",
+       "      <td>0.022437</td>\n",
+       "      <td>0.052387</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.013803</td>\n",
+       "      <td>-0.010651</td>\n",
+       "      <td>-0.001198</td>\n",
+       "      <td>0.025812</td>\n",
+       "      <td>-0.038623</td>\n",
+       "      <td>0.010491</td>\n",
+       "      <td>-0.000509</td>\n",
+       "      <td>-0.011071</td>\n",
+       "      <td>-0.012894</td>\n",
+       "      <td>0.017563</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>5</td>\n",
+       "      <td>-0.058851</td>\n",
+       "      <td>-0.035628</td>\n",
+       "      <td>-0.014662</td>\n",
+       "      <td>-0.004050</td>\n",
+       "      <td>-0.007094</td>\n",
+       "      <td>0.001360</td>\n",
+       "      <td>-0.037586</td>\n",
+       "      <td>0.041380</td>\n",
+       "      <td>0.044340</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.029362</td>\n",
+       "      <td>-0.005236</td>\n",
+       "      <td>-0.000825</td>\n",
+       "      <td>0.020010</td>\n",
+       "      <td>-0.042688</td>\n",
+       "      <td>0.021482</td>\n",
+       "      <td>0.041595</td>\n",
+       "      <td>0.004966</td>\n",
+       "      <td>-0.026901</td>\n",
+       "      <td>0.009236</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>5 rows × 65 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   item_id         0         1         2         3         4         5  \\\n",
+       "0        1 -0.033485 -0.046890 -0.031819  0.030568  0.009458 -0.049156   \n",
+       "1        2 -0.036122  0.002882 -0.031006 -0.012018  0.036453 -0.004707   \n",
+       "2        3 -0.024533  0.013923  0.000636  0.003143  0.053155  0.035068   \n",
+       "3        4  0.004179  0.005348 -0.043896  0.009208  0.022689  0.011464   \n",
+       "4        5 -0.058851 -0.035628 -0.014662 -0.004050 -0.007094  0.001360   \n",
+       "\n",
+       "          6         7         8  ...        54        55        56        57  \\\n",
+       "0 -0.005019  0.051071  0.081326  ... -0.014835 -0.013426 -0.012331  0.042307   \n",
+       "1  0.015386  0.042837  0.025634  ... -0.033783  0.000369  0.027628 -0.002053   \n",
+       "2  0.003644  0.022994  0.021832  ... -0.017099 -0.018011  0.041238  0.005636   \n",
+       "3 -0.011334  0.022437  0.052387  ... -0.013803 -0.010651 -0.001198  0.025812   \n",
+       "4 -0.037586  0.041380  0.044340  ... -0.029362 -0.005236 -0.000825  0.020010   \n",
+       "\n",
+       "         58        59        60        61        62        63  \n",
+       "0 -0.024871  0.000757  0.032160  0.014033 -0.041780  0.020292  \n",
+       "1 -0.028099 -0.015240 -0.012000  0.004758  0.006306  0.030888  \n",
+       "2 -0.015556  0.005061  0.011217 -0.005633 -0.009141  0.001630  \n",
+       "3 -0.038623  0.010491 -0.000509 -0.011071 -0.012894  0.017563  \n",
+       "4 -0.042688  0.021482  0.041595  0.004966 -0.026901  0.009236  \n",
+       "\n",
+       "[5 rows x 65 columns]"
+      ]
+     },
+     "execution_count": 51,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "item_embeddings.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "66d7271e-0ea6-4568-ac5a-04089735f542",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# save to disk\n",
+    "item_embeddings.to_parquet(os.path.join(BASE_DIR, \"item_embeddings.parquet\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dadae279-913c-487b-ad55-4b4d6c110dc1",
+   "metadata": {},
+   "source": [
+    "### Create feature definitions "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1f70939f-8063-4422-b29b-6668acb1cfb7",
+   "metadata": {},
+   "source": [
+    "Now we will create our user and item feature definitions in the `user_features.py` and `item_features.py` files and save these files in the `feature_repo` directory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "id": "4ee27d67-e35a-42c5-8025-ed73f35c8e13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "file = open(os.path.join(BASE_DIR, \"feature_repo/\", \"user_features.py\"), \"w\")\n",
+    "file.write(\n",
+    "    \"\"\"\n",
+    "from google.protobuf.duration_pb2 import Duration\n",
+    "import datetime\n",
+    "from feast import Entity, Feature, FeatureView, ValueType\n",
+    "from feast.infra.offline_stores.file_source import FileSource\n",
+    "\n",
+    "user_features = FileSource(\n",
+    "    path=\"{}\",\n",
+    "    event_timestamp_column=\"datetime\",\n",
+    "    created_timestamp_column=\"created\",\n",
+    ")\n",
+    "\n",
+    "user = Entity(name=\"user_id\", value_type=ValueType.INT32, description=\"user id\",)\n",
+    "user_raw = Entity(name=\"user_id_raw\", value_type=ValueType.INT32, description=\"raw user id\",)\n",
+    "\n",
+    "user_features_view = FeatureView(\n",
+    "    name=\"user_features\",\n",
+    "    entities=[\"user_id_raw\"],\n",
+    "    ttl=Duration(seconds=86400 * 7),\n",
+    "    features=[\n",
+    "        Feature(name=\"user_shops\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_profile\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_group\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_gender\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_age\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_consumption_2\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_is_occupied\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_geography\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_intentions\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_brands\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_categories\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"user_id_raw\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"item_id_seen\", dtype=ValueType.INT32_LIST),\n",
+    "    ],\n",
+    "    online=True,\n",
+    "    input=user_features,\n",
+    "    tags=dict(),\n",
+    ")\n",
+    "\"\"\".format(\n",
+    "        os.path.join(BASE_DIR, \"feature_repo/data/\", \"user_features.parquet\")\n",
+    "    )\n",
+    ")\n",
+    "file.close()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "48a5927c-840d-410c-8f5b-bebce4f79640",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(os.path.join(BASE_DIR, \"feature_repo/\", \"item_features.py\"), \"w\") as f:\n",
+    "    f.write(\n",
+    "        \"\"\"\n",
+    "from google.protobuf.duration_pb2 import Duration\n",
+    "import datetime\n",
+    "from feast import Entity, Feature, FeatureView, ValueType\n",
+    "from feast.infra.offline_stores.file_source import FileSource\n",
+    "\n",
+    "item_features = FileSource(\n",
+    "    path=\"{}\",\n",
+    "    event_timestamp_column=\"datetime\",\n",
+    "    created_timestamp_column=\"created\",\n",
+    ")\n",
+    "\n",
+    "item = Entity(name=\"item_id\", value_type=ValueType.INT32, description=\"item id\",)\n",
+    "item_raw = Entity(name=\"item_id_raw\", value_type=ValueType.INT32, description=\"raw item id\",)\n",
+    "\n",
+    "item_features_view = FeatureView(\n",
+    "    name=\"item_features\",\n",
+    "    entities=[\"item_id\"],\n",
+    "    ttl=Duration(seconds=86400 * 7),\n",
+    "    features=[\n",
+    "        Feature(name=\"item_category\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"item_shop\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"item_brand\", dtype=ValueType.INT32),\n",
+    "        Feature(name=\"item_id_raw\", dtype=ValueType.INT32),\n",
+    "    ],\n",
+    "    online=True,\n",
+    "    input=item_features,\n",
+    "    tags=dict(),\n",
+    ")\n",
+    "\"\"\".format(\n",
+    "            os.path.join(BASE_DIR, \"feature_repo/data/\", \"item_features.parquet\")\n",
+    "        )\n",
+    "    )\n",
+    "# The with-block above closes the file; no explicit close() is needed."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "660333b2-4f99-49c7-8cd3-f0aad5dbd66f",
+   "metadata": {},
+   "source": [
+    "Let's check out our Feast feature repository structure."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "57133c1e-18d9-4ccb-9704-cdebd271985e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# # install seedir\n",
+    "# !pip install seedir"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "id": "986d53ea-c946-4046-a390-6d3b8801d280",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "feature_repo/\n",
+      "├─__init__.py\n",
+      "├─data/\n",
+      "│ ├─item_features.parquet\n",
+      "│ └─user_features.parquet\n",
+      "├─feature_store.yaml\n",
+      "├─item_features.py\n",
+      "└─user_features.py\n"
+     ]
+    }
+   ],
+   "source": [
+    "import seedir as sd\n",
+    "\n",
+    "feature_repo_path = os.path.join(BASE_DIR, \"feature_repo\")\n",
+    "sd.seedir(\n",
+    "    feature_repo_path,\n",
+    "    style=\"lines\",\n",
+    "    itemlimit=10,\n",
+    "    depthlimit=3,\n",
+    "    exclude_folders=\".ipynb_checkpoints\",\n",
+    "    sort=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "80678ea1-a7fb-4016-9e6f-c905497f4142",
+   "metadata": {},
+   "source": [
+    "### Next Steps\n",
+    "We trained and exported our ranking and retrieval models and NVTabular workflows. In the next step, we will learn how to deploy our trained models to [Triton Inference Server (TIS)](https://github.com/triton-inference-server/server) with the Merlin Systems library.\n",
+    "\n",
+    "For the next step, move on to the `02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb` notebook to deploy our saved models as an ensemble to TIS and obtain prediction results for a given request."
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "2758ff992bb32b90e83258e2e763c5fcee80c4002721441c6c0d17c649a641dd"
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  },
+  "merlin": {
+   "containers": [
+    "nvcr.io/nvidia/merlin/merlin-tensorflow-inference:latest"
+   ]
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb b/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
new file mode 100644
index 000000000..455c8e25e
--- /dev/null
+++ b/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
@@ -0,0 +1,754 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "8c3403a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Copyright 2021 NVIDIA Corporation. All Rights Reserved.\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License.\n",
+    "# ================================"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "03166488-1651-4025-84ed-4e9e5db34933",
+   "metadata": {},
+   "source": [
+    "<img src=\"http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png\" style=\"width: 90px; float: right;\">\n",
+    "\n",
+    "## Deploying a Multi-Stage RecSys into Production with Merlin Systems and Triton Inference Server\n",
+    "\n",
+    "This notebook is created using the latest stable [merlin-tensorflow-inference](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow-inference/tags) container. \n",
+    "\n",
+    "By the time you reach this notebook, we expect that you have already executed the first notebook `01-Building-Recommender-Systems-with-Merlin.ipynb` and exported all the required files and models. \n",
+    "\n",
+    "We are going to generate recommended items for a given user query (user_id) by following the steps described in the figure below."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38d75184-cd24-4fe3-90f4-d76028626576",
+   "metadata": {},
+   "source": [
+    "![tritonensemble](../images/triton_ensemble.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "da9dadb5-6eec-4a1b-99f9-929523f5cc07",
+   "metadata": {},
+   "source": [
+    "The Merlin Systems library provides a set of operators for serving multi-stage recommender systems built with TensorFlow on [Triton Inference Server](https://github.com/triton-inference-server/server) (TIS) easily and efficiently. Below, we will go through these operators and demonstrate their usage in serving a multi-stage system on Triton."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "538677a3-acc6-48f6-acb6-d5bb5fe2e2d2",
+   "metadata": {},
+   "source": [
+    "### Import required libraries and functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a27e18d7-b3e4-481c-b69e-23193b212c56",
+   "metadata": {},
+   "source": [
+    "At this step, we assume you already installed the tensorflow-gpu (or -cpu), feast and faiss-gpu (or -cpu) libraries when running the first notebook `01-Building-Recommender-Systems-with-Merlin.ipynb`. \n",
+    "\n",
+    "In case you need to install them for running this example on GPU, execute the following script in a cell.\n",
+    "```\n",
+    "%pip install tensorflow \"feast<0.20\" faiss-gpu\n",
+    "```\n",
+    "or the following script in a cell for CPU.\n",
+    "```\n",
+    "%pip install tensorflow-cpu \"feast<0.20\" faiss-cpu\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "4db1b5f1-c8fa-4e03-8744-1197873c5bee",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "08/01/2022 07:37:55 PM INFO:Loading faiss with AVX2 support.\n",
+      "08/01/2022 07:37:55 PM INFO:Could not load library with AVX2 support due to:\n",
+      "ModuleNotFoundError(\"No module named 'faiss.swigfaiss_avx2'\")\n",
+      "08/01/2022 07:37:55 PM INFO:Loading faiss.\n",
+      "08/01/2022 07:37:55 PM INFO:Successfully loaded faiss.\n",
+      "08/01/2022 07:37:56 PM INFO:init\n",
+      "/usr/local/lib/python3.8/dist-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  DESCRIPTOR = _descriptor.FileDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:37: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _descriptor.FieldDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:30: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _INTEGERSTATISTICS = _descriptor.Descriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  DESCRIPTOR = _descriptor.FileDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:33: DeprecationWarning: Call to deprecated create function EnumValueDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _descriptor.EnumValueDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:27: DeprecationWarning: Call to deprecated create function EnumDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _DATATYPE = _descriptor.EnumDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:335: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _descriptor.FieldDescriptor(\n",
+      "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/model_config_pb2.py:328: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
+      "  _MODELRATELIMITER_RESOURCE = _descriptor.Descriptor(\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import feast\n",
+    "import faiss\n",
+    "import seedir as sd\n",
+    "from nvtabular import ColumnSchema, Schema\n",
+    "\n",
+    "from merlin.systems.dag.ensemble import Ensemble\n",
+    "from merlin.systems.dag.ops.session_filter import FilterCandidates\n",
+    "from merlin.systems.dag.ops.softmax_sampling import SoftmaxSampling\n",
+    "from merlin.systems.dag.ops.tensorflow import PredictTensorflow\n",
+    "from merlin.systems.dag.ops.unroll_features import UnrollFeatures\n",
+    "from merlin.systems.triton.utils import run_triton_server, run_ensemble_on_tritonserver"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "55ead20e-c573-462e-9aa2-c3494bf0129f",
+   "metadata": {},
+   "source": [
+    "### Register our features on feature store"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2ac115e-4794-4a69-a962-8481f6e86df3",
+   "metadata": {},
+   "source": [
+    "The Feast feature registry is a central catalog of all the feature definitions and their related metadata (read more [here](https://docs.feast.dev/getting-started/architecture-and-components/registry)). We defined our user and item features in the `user_features.py` and `item_features.py` files. With `FeatureView()`, users can register data sources in their organizations into Feast, and then use those data sources for both training and online inference. In the `user_features.py` and `item_features.py` files, we are telling Feast where to find user and item features.\n",
+    "\n",
+    "Before we move on to the next steps, we need to run the `feast apply` command as shown below. This registers our features: it applies the changes to create our feature registry and stores all entity and feature view definitions in a local SQLite online store called `online_store.db`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "66c02d67-df45-4869-8262-647cba77efcb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "BASE_DIR = os.environ.get(\"BASE_DIR\", \"/Merlin/examples/Building-and-deploying-multi-stage-RecSys/\")\n",
+    "\n",
+    "# define feature repo path\n",
+    "feast_repo_path = os.path.join(BASE_DIR, \"feature_repo/\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "e5fa545b-a979-4216-b176-ffd70d66e69d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "/Merlin/examples/Building-and-deploying-multi-stage-RecSys/feature_repo\n",
+      "/usr/local/lib/python3.8/dist-packages/feast/feature_view.py:100: DeprecationWarning: The argument 'input' is being deprecated. Please use 'batch_source' instead. Feast 0.13 and onwards will not support the argument 'input'.\n",
+      "  warnings.warn(\n",
+      "\u001b[1m\u001b[94mNo changes to registry\n",
+      "\u001b[1m\u001b[94mNo changes to infrastructure\n"
+     ]
+    }
+   ],
+   "source": [
+    "%cd $feast_repo_path\n",
+    "!feast apply"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c641fcd2-bd11-4569-80d4-2ae5e01a5cad",
+   "metadata": {},
+   "source": [
+    "### Loading features from the offline store into the online store\n",
+    "\n",
+    "After we run `apply` to register our features and create our local online store, we need to perform the [materialization](https://docs.feast.dev/how-to-guides/running-feast-in-production) operation. This keeps our online store up to date and gets it ready for prediction. To do that, we run a job that loads feature data from our feature view sources into our online store. As we add new features to our offline stores, we can continuously materialize them to keep our online store up to date by finding the latest feature values for each user.\n",
+    "\n",
+    "When you run the `feast materialize ..` command below, a message such as <i>Materializing 2 feature views from 1995-01-01 01:01:01+00:00 to 2025-01-01 01:01:01+00:00 into the sqlite online store</i> is printed out.\n",
+    "\n",
+    "Note that the materialization step takes some time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "52dacbbc-bdb6-4f7a-b202-3802050f0362",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Materializing \u001b[1m\u001b[32m2\u001b[0m feature views from \u001b[1m\u001b[32m1995-01-01 01:01:01+00:00\u001b[0m to \u001b[1m\u001b[32m2025-01-01 01:01:01+00:00\u001b[0m into the \u001b[1m\u001b[32msqlite\u001b[0m online store.\n",
+      "\n",
+      "\u001b[1m\u001b[32mitem_features\u001b[0m:\n",
+      "100%|███████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 4328.71it/s]\n",
+      "\u001b[1m\u001b[32muser_features\u001b[0m:\n",
+      "100%|███████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 1399.86it/s]\n"
+     ]
+    }
+   ],
+   "source": [
+    "!feast materialize 1995-01-01T01:01:01 2025-01-01T01:01:01"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8fcc26e6-f6f3-4e44-bf3c-3b8e66dc9fd6",
+   "metadata": {},
+   "source": [
+    "Now, let's check our `feature_repo` structure again after running the `apply` and `materialize` commands."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "9caba4e3-e6e0-4e2f-b51d-cd3456fd4a63",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "feature_repo/\n",
+      "├─__init__.py\n",
+      "├─data/\n",
+      "│ ├─item_features.parquet\n",
+      "│ ├─online_store.db\n",
+      "│ ├─registry.db\n",
+      "│ └─user_features.parquet\n",
+      "├─feature_store.yaml\n",
+      "├─item_features.py\n",
+      "└─user_features.py\n"
+     ]
+    }
+   ],
+   "source": [
+    "# set up the base dir for the feature store\n",
+    "feature_repo_path = os.path.join(BASE_DIR, 'feature_repo')\n",
+    "sd.seedir(feature_repo_path, style='lines', itemlimit=10, depthlimit=5, exclude_folders=['.ipynb_checkpoints', '__pycache__'], sort=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e768637c-0a4d-404b-8b58-7182fef0ab0e",
+   "metadata": {},
+   "source": [
+    "### Set up Faiss index, create feature store client and objects for the Triton ensemble"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "efada1e1-2556-4a26-b0ba-9cb96b3b151f",
+   "metadata": {},
+   "source": [
+    "Create a folder for the Faiss index."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "96b7adc1-623b-41df-b1f9-dd4086a15bc9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if not os.path.isdir(os.path.join(BASE_DIR + 'faiss_index')):\n",
+    "    os.makedirs(os.path.join(BASE_DIR + 'faiss_index'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2aa037c0-7dad-427c-98bb-3da413e8fd14",
+   "metadata": {},
+   "source": [
+    "Define the paths for the ranking model, the retrieval model, and the Faiss index."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "23ba59b5-08c3-44b5-86f2-e63dec6893af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "faiss_index_path = BASE_DIR + 'faiss_index' + \"/index.faiss\"\n",
+    "retrieval_model_path = BASE_DIR + \"query_tower/\"\n",
+    "ranking_model_path = BASE_DIR + \"dlrm/\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b996019-bd2a-44e0-b004-4f412b300d63",
+   "metadata": {},
+   "source": [
+    "The `QueryFaiss` operator creates an interface between a Faiss Approximate Nearest Neighbors (ANN) index and Triton Inference Server. For a given input query vector, we run an ANN search to find the ids of the top-k nearby nodes in the index.\n",
+    "\n",
+    "`setup_faiss` is a utility function that creates a Faiss index from an embedding vector using L2 distance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "0b6cc5bf-d07c-4963-a748-6e2b4827ee36",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING clustering 239 points to 32 centroids: please provide at least 1248 training points\n"
+     ]
+    }
+   ],
+   "source": [
+    "from merlin.systems.dag.ops.faiss import QueryFaiss, setup_faiss\n",
+    "\n",
+    "item_embeddings = np.ascontiguousarray(\n",
+    "    pd.read_parquet(BASE_DIR + \"item_embeddings.parquet\").to_numpy()\n",
+    ")\n",
+    "setup_faiss(item_embeddings, faiss_index_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46697177-512a-473e-8cca-9fe51d3daa03",
+   "metadata": {},
+   "source": [
+    "Create the feature store client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "3bc00e04-c70c-4882-9952-66f4dbb97bdc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feature_store = feast.FeatureStore(feast_repo_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c45df06-0cbe-4b52-ac1f-786e763895d7",
+   "metadata": {},
+   "source": [
+    "Fetch user features from the feature store with the `QueryFeast` operator. The `QueryFeast` operator ensures that our Feast feature store can communicate correctly with tritonserver when the ensemble performs Feast feature lookups."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "3decbe7b-03e3-4978-baac-03f6a0b078c9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.\n",
+      "Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n",
+      "  ValueType.FLOAT: (np.float, False, False),\n"
+     ]
+    }
+   ],
+   "source": [
+    "from merlin.systems.dag.ops.feast import QueryFeast\n",
+    "\n",
+    "user_features = [\"user_id_raw\"] >> QueryFeast.from_feature_view(\n",
+    "    store=feature_store,\n",
+    "    view=\"user_features\",\n",
+    "    column=\"user_id_raw\",\n",
+    "    include_id=False,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27e25be7-3ff0-49c2-a3fc-03ec4d615e77",
+   "metadata": {},
+   "source": [
+    "Retrieve the top-K candidate items that are relevant for a given user with the retrieval model. We use the `PredictTensorflow()` operator, which takes a TensorFlow model and packages it correctly for Triton Inference Server (TIS) to run with the TensorFlow backend."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "21139caa-3a51-42e6-b006-21a92c95f1bc",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<function tensorflow.python.dlpack.dlpack.from_dlpack(dlcapsule)>"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# prevent TF from claiming all GPU memory\n",
+    "from merlin.models.loader.tf_utils import configure_tensorflow\n",
+    "\n",
+    "configure_tensorflow()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "47c2d9b1-51dc-4549-977d-d7941ee6486c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2022-08-01 19:38:34.033476: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX\n",
+      "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
+      "2022-08-01 19:38:35.100917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16249 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:2d:00.0, compute capability: 7.0\n",
+      "08/01/2022 07:38:37 PM WARNING:No training configuration found in save file, so the model was *not* compiled. Compile it manually.\n"
+     ]
+    }
+   ],
+   "source": [
+    "topk_retrieval = 100\n",
+    "retrieval = (\n",
+    "    user_features\n",
+    "    >> PredictTensorflow(retrieval_model_path)\n",
+    "    >> QueryFaiss(faiss_index_path, topk=topk_retrieval)\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "ad1de211-23a2-4fa6-8c1c-61b626e0cb3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Filter out anything that was in the user's current session\n",
+    "filtering = retrieval[\"candidate_ids\"] >> FilterCandidates(\n",
+    "    filter_out=user_features[\"item_id_seen\"]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ce4429c-1fe1-4304-bcdf-badebe3b5485",
+   "metadata": {},
+   "source": [
+    "Fetch item features from the feature store for the candidate items that remain after the retrieval and filtering steps above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "b270f663-0ae1-4356-acd4-5f8c986abf4d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "item_features = filtering >> QueryFeast.from_feature_view(\n",
+    "    store=feature_store,\n",
+    "    view=\"item_features\",\n",
+    "    column=\"candidate_ids\",\n",
+    "    output_prefix=\"item\",\n",
+    "    include_id=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "304a4d09-db05-4666-b520-75dbbbc7ab17",
+   "metadata": {},
+   "source": [
+    "Merge the user features and item features to create the full set of combined features that were used in model training. We do this with the `UnrollFeatures` operator, which takes a target column and joins the \"unroll\" columns to the target. This helps when broadcasting a series of user features to a set of items."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "eb0ef434-03a5-4a36-afb9-e19a43243c64",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "user_features_to_unroll = [\n",
+    "    \"user_id\",\n",
+    "    \"user_shops\",\n",
+    "    \"user_profile\",\n",
+    "    \"user_group\",\n",
+    "    \"user_gender\",\n",
+    "    \"user_age\",\n",
+    "    \"user_consumption_2\",\n",
+    "    \"user_is_occupied\",\n",
+    "    \"user_geography\",\n",
+    "    \"user_intentions\",\n",
+    "    \"user_brands\",\n",
+    "    \"user_categories\",\n",
+    "]\n",
+    "\n",
+    "combined_features = item_features >> UnrollFeatures(\n",
+    "    \"item_id\", user_features[user_features_to_unroll]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7fb0ce66-6b6c-43be-885e-a5435c3bbd9e",
+   "metadata": {},
+   "source": [
+    "Rank the combined features using the trained ranking model, which is a DLRM model in this example. We pass the path of the ranking model to the `PredictTensorflow()` operator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "ce31723e-af4d-4827-bb60-3a9fafcd9da6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ranking = combined_features >> PredictTensorflow(ranking_model_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7f86fa47-de61-4007-ab55-9076e12ce963",
+   "metadata": {},
+   "source": [
+    "For the ordering we use the `SoftmaxSampling()` operator. Given the input ids and predictions, this operator sorts all inputs in descending order, introduces some randomization into the ordering by sampling items from the softmax of the predicted relevance scores, and finally returns the top-k ordered items."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "7f65598b-e3e7-4238-a73e-19d00c3deb26",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "top_k = 10\n",
+    "ordering = combined_features[\"item_id_raw\"] >> SoftmaxSampling(\n",
+    "    relevance_col=ranking[\"click/binary_classification_task\"], topk=top_k, temperature=20.0\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4e2e389-d884-44a1-8e32-4916a0eb43cf",
+   "metadata": {},
+   "source": [
+    "### Export Graph as Ensemble\n",
+    "The last step is to create the ensemble artifacts that TIS can consume. To make these artifacts, we use the `Ensemble` class. This class represents an entire ensemble consisting of multiple models that run sequentially in TIS, initiated by an inference request. It is responsible for interpreting the graph and exporting the correct files for TIS.\n",
+    "\n",
+    "When we create an `Ensemble` object, we pass in the graph and a schema representing the starting input of the graph. After we create the ensemble object, we export the graph by supplying an export path to the `ensemble.export()` function. This returns an ensemble config, which represents the entire inference pipeline, and a list of node-specific configs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50bc2e4f-5e58-4ad4-8ae5-d79ad286978f",
+   "metadata": {},
+   "source": [
+    "Define the request schema, then create the folder to export the models and config files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "4cf8420c-e4a2-4eaf-a3ca-c075bc908d1b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "request_schema = Schema(\n",
+    "    [\n",
+    "        ColumnSchema(\"user_id_raw\", dtype=np.int32),\n",
+    "    ]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "b28c452f-543c-45a4-9995-130ca6919669",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if not os.path.isdir(os.path.join(BASE_DIR + 'poc_ensemble')):\n",
+    "    os.makedirs(os.path.join(BASE_DIR + 'poc_ensemble'))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "6c64d686-aed5-42f8-b517-482b4237c69f",
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "ValueError",
+     "evalue": "Missing columns ['item_id_seen'] found in operatorSubsetColumns during compute_input_schema.",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
+      "Input \u001b[0;32mIn [21]\u001b[0m, in \u001b[0;36m<cell line: 4>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[38;5;66;03m# define the path where all the models and config files exported to\u001b[39;00m\n\u001b[1;32m      2\u001b[0m export_path \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(BASE_DIR \u001b[38;5;241m+\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mpoc_ensemble\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m----> 4\u001b[0m ensemble \u001b[38;5;241m=\u001b[39m \u001b[43mEnsemble\u001b[49m\u001b[43m(\u001b[49m\u001b[43mordering\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrequest_schema\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m      5\u001b[0m ens_config, node_configs \u001b[38;5;241m=\u001b[39m ensemble\u001b[38;5;241m.\u001b[39mexport(export_path)\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ensemble.py:51\u001b[0m, in \u001b[0;36mEnsemble.__init__\u001b[0;34m(self, ops, schema, name, label_columns)\u001b[0m\n\u001b[1;32m     37\u001b[0m \u001b[38;5;124;03m\"\"\"_summary_\u001b[39;00m\n\u001b[1;32m     38\u001b[0m \n\u001b[1;32m     39\u001b[0m \u001b[38;5;124;03mParameters\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     48\u001b[0m \u001b[38;5;124;03m    List of strings representing label columns, by default None\u001b[39;00m\n\u001b[1;32m     49\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m     50\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgraph \u001b[38;5;241m=\u001b[39m Graph(ops)\n\u001b[0;32m---> 51\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgraph\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mconstruct_schema\u001b[49m\u001b[43m(\u001b[49m\u001b[43mschema\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     52\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mname \u001b[38;5;241m=\u001b[39m name\n\u001b[1;32m     53\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlabel_columns \u001b[38;5;241m=\u001b[39m label_columns \u001b[38;5;129;01mor\u001b[39;00m []\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/dag/graph.py:73\u001b[0m, in \u001b[0;36mGraph.construct_schema\u001b[0;34m(self, root_schema, preserve_dtypes)\u001b[0m\n\u001b[1;32m     70\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mconstruct_schema\u001b[39m(\u001b[38;5;28mself\u001b[39m, root_schema: Schema, preserve_dtypes\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mGraph\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[1;32m     71\u001b[0m     nodes \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(postorder_iter_nodes(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_node))\n\u001b[0;32m---> 73\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_compute_node_schemas\u001b[49m\u001b[43m(\u001b[49m\u001b[43mroot_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnodes\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpreserve_dtypes\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     74\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_node_schemas(root_schema, nodes, preserve_dtypes)\n\u001b[1;32m     76\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/dag/graph.py:80\u001b[0m, in \u001b[0;36mGraph._compute_node_schemas\u001b[0;34m(self, root_schema, nodes, preserve_dtypes)\u001b[0m\n\u001b[1;32m     78\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_compute_node_schemas\u001b[39m(\u001b[38;5;28mself\u001b[39m, root_schema, nodes, preserve_dtypes\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m):\n\u001b[1;32m     79\u001b[0m     \u001b[38;5;28;01mfor\u001b[39;00m node \u001b[38;5;129;01min\u001b[39;00m nodes:\n\u001b[0;32m---> 80\u001b[0m         \u001b[43mnode\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompute_schemas\u001b[49m\u001b[43m(\u001b[49m\u001b[43mroot_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpreserve_dtypes\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpreserve_dtypes\u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/dag/node.py:179\u001b[0m, in \u001b[0;36mNode.compute_schemas\u001b[0;34m(self, root_schema, preserve_dtypes)\u001b[0m\n\u001b[1;32m    176\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mselector \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparents[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;241m.\u001b[39mselector \u001b[38;5;129;01mand\u001b[39;00m (\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparents[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;241m.\u001b[39mselector\u001b[38;5;241m.\u001b[39mnames):\n\u001b[1;32m    177\u001b[0m         \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mselector \u001b[38;5;241m=\u001b[39m parents_selector\n\u001b[0;32m--> 179\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minput_schema \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mop\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompute_input_schema\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    180\u001b[0m \u001b[43m    \u001b[49m\u001b[43mroot_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparents_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdeps_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mselector\u001b[49m\n\u001b[1;32m    181\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    183\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mselector \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mop\u001b[38;5;241m.\u001b[39mcompute_selector(\n\u001b[1;32m    184\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minput_schema, 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mselector, parents_selector, dependencies_selector\n\u001b[1;32m    185\u001b[0m )\n\u001b[1;32m    187\u001b[0m prev_output_schema \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_schema \u001b[38;5;28;01mif\u001b[39;00m preserve_dtypes \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/dag/base_operator.py:79\u001b[0m, in \u001b[0;36mBaseOperator.compute_input_schema\u001b[0;34m(self, root_schema, parents_schema, deps_schema, selector)\u001b[0m\n\u001b[1;32m     55\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mcompute_input_schema\u001b[39m(\n\u001b[1;32m     56\u001b[0m     \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m     57\u001b[0m     root_schema: Schema,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     60\u001b[0m     selector: ColumnSelector,\n\u001b[1;32m     61\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Schema:\n\u001b[1;32m     62\u001b[0m     \u001b[38;5;124;03m\"\"\"Given the schemas coming from upstream sources and a column selector for the\u001b[39;00m\n\u001b[1;32m     63\u001b[0m \u001b[38;5;124;03m    input columns, returns a set of schemas for the input columns this operator will use\u001b[39;00m\n\u001b[1;32m     64\u001b[0m \u001b[38;5;124;03m    Parameters\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     77\u001b[0m \u001b[38;5;124;03m        The schemas of the columns used by this operator\u001b[39;00m\n\u001b[1;32m     78\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[0;32m---> 79\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_validate_matching_cols\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m     80\u001b[0m \u001b[43m        \u001b[49m\u001b[43mparents_schema\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mdeps_schema\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mselector\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompute_input_schema\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;18;43m__name__\u001b[39;49m\n\u001b[1;32m     81\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m  
   83\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m parents_schema \u001b[38;5;241m+\u001b[39m deps_schema\n",
+      "File \u001b[0;32m/usr/local/lib/python3.8/dist-packages/merlin/dag/base_operator.py:199\u001b[0m, in \u001b[0;36mBaseOperator._validate_matching_cols\u001b[0;34m(self, schema, selector, method_name)\u001b[0m\n\u001b[1;32m    197\u001b[0m missing_cols \u001b[38;5;241m=\u001b[39m [name \u001b[38;5;28;01mfor\u001b[39;00m name \u001b[38;5;129;01min\u001b[39;00m selector\u001b[38;5;241m.\u001b[39mnames \u001b[38;5;28;01mif\u001b[39;00m name \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m schema\u001b[38;5;241m.\u001b[39mcolumn_names]\n\u001b[1;32m    198\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m missing_cols:\n\u001b[0;32m--> 199\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m    200\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMissing columns \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmissing_cols\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m found in operator\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m    201\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m during \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmethod_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m    202\u001b[0m     )\n",
+      "\u001b[0;31mValueError\u001b[0m: Missing columns ['item_id_seen'] found in operatorSubsetColumns during compute_input_schema."
+     ]
+    }
+   ],
+   "source": [
+    "# define the path where all the models and config files are exported to\n",
+    "export_path = os.path.join(BASE_DIR + 'poc_ensemble')\n",
+    "\n",
+    "ensemble = Ensemble(ordering, request_schema)\n",
+    "ens_config, node_configs = ensemble.export(export_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "276eedd8-5dc0-4ad0-8725-c8da60fea693",
+   "metadata": {},
+   "source": [
+    "Let's check our export_path structure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe7962cc-f26d-4a4a-b5a3-d214e0f37456",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "### Starting Triton Server"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8c07c620-7d6c-4275-87fe-e5b94335bdb9",
+   "metadata": {},
+   "source": [
+    "It is time to deploy all the models as an ensemble model to Triton Inference Server ([TIS](https://github.com/triton-inference-server)). After we export the ensemble, we are ready to start TIS. You can start Triton server by running the following command in your terminal:\n",
+    "\n",
+    "```\n",
+    "tritonserver --model-repository=/ensemble_export_path/ --backend-config=tensorflow,version=2\n",
+    "```\n",
+    "\n",
+    "For the `--model-repository` argument, specify the same path as the `export_path` that you specified previously in the `ensemble.export` method. This command will launch the server and load all the models to the server. Once all the models are loaded successfully, you should see `READY` status printed out in the terminal for each loaded model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c0a798f-6abf-4cbb-87f8-f60a6e757092",
+   "metadata": {},
+   "source": [
+    "### Retrieving Recommendations from Triton"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e3fe264-e4a4-4dab-9b04-f83fb696d7d1",
+   "metadata": {},
+   "source": [
+    "Once our models are successfully loaded into TIS, we can send a request to TIS and get a response for our query with the `send_triton_request` utility function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "e95f1d85-9cbc-423b-9de1-91d1e421e5e4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from merlin.systems.triton.utils import send_triton_request\n",
+    "from merlin.core.dispatch import make_df\n",
+    "\n",
+    "# create a request to be sent to TIS\n",
+    "request = make_df({\"user_id_raw\": [1]})\n",
+    "request[\"user_id_raw\"] = request[\"user_id_raw\"].astype(np.int32)\n",
+    "\n",
+    "outputs = ensemble.graph.output_schema.column_names"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "74ec62f2-5935-45c6-8058-e1cdade6f80f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = send_triton_request(request, outputs)\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b4605dbe-5f97-4b31-8ee4-ce7c1cb69d97",
+   "metadata": {},
+   "source": [
+    "Note that these item ids are encoded values, not the raw original values. We will eventually add reverse dictionary lookup functionality to map these encoded item ids to their original raw ids with one line of code. If you want to do it now, you can map these ids to their original values using the `unique.item_id.parquet` file stored in the `categories` folder.\n",
+    "\n",
+    "That's it! You finished deploying a multi-stage recommender system on Triton Inference Server using the Merlin framework."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  },
+  "merlin": {
+   "containers": [
+    "nvcr.io/nvidia/merlin/merlin-tensorflow-inference:latest"
+   ]
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4 Mon Sep 17 00:00:00 2001
From: root <ronayak@hotmail.com>
Date: Mon, 1 Aug 2022 20:22:26 +0000
Subject: [PATCH 2/2] fix unrolled feats

---
 ...RecSys-with-Merlin-Systems_filtering.ipynb | 52 +++++++------------
 1 file changed, 18 insertions(+), 34 deletions(-)

diff --git a/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb b/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
index 455c8e25e..5f937da0a 100644
--- a/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
+++ b/examples/Building-and-deploying-multi-stage-RecSys/02-Deploying-multi-stage-RecSys-with-Merlin-Systems_filtering.ipynb
@@ -90,12 +90,12 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "08/01/2022 07:37:55 PM INFO:Loading faiss with AVX2 support.\n",
-      "08/01/2022 07:37:55 PM INFO:Could not load library with AVX2 support due to:\n",
+      "08/01/2022 08:15:13 PM INFO:Loading faiss with AVX2 support.\n",
+      "08/01/2022 08:15:13 PM INFO:Could not load library with AVX2 support due to:\n",
       "ModuleNotFoundError(\"No module named 'faiss.swigfaiss_avx2'\")\n",
-      "08/01/2022 07:37:55 PM INFO:Loading faiss.\n",
-      "08/01/2022 07:37:55 PM INFO:Successfully loaded faiss.\n",
-      "08/01/2022 07:37:56 PM INFO:init\n",
+      "08/01/2022 08:15:13 PM INFO:Loading faiss.\n",
+      "08/01/2022 08:15:13 PM INFO:Successfully loaded faiss.\n",
+      "08/01/2022 08:15:14 PM INFO:init\n",
       "/usr/local/lib/python3.8/dist-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
       "  DESCRIPTOR = _descriptor.FileDescriptor(\n",
       "/usr/local/lib/python3.8/dist-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:37: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
@@ -165,7 +165,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
    "id": "e5fa545b-a979-4216-b176-ffd70d66e69d",
    "metadata": {},
    "outputs": [
@@ -202,7 +202,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 6,
    "id": "52dacbbc-bdb6-4f7a-b202-3802050f0362",
    "metadata": {},
    "outputs": [
@@ -213,9 +213,9 @@
       "Materializing \u001b[1m\u001b[32m2\u001b[0m feature views from \u001b[1m\u001b[32m1995-01-01 01:01:01+00:00\u001b[0m to \u001b[1m\u001b[32m2025-01-01 01:01:01+00:00\u001b[0m into the \u001b[1m\u001b[32msqlite\u001b[0m online store.\n",
       "\n",
       "\u001b[1m\u001b[32mitem_features\u001b[0m:\n",
-      "100%|███████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 4328.71it/s]\n",
+      "100%|███████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 4324.36it/s]\n",
       "\u001b[1m\u001b[32muser_features\u001b[0m:\n",
-      "100%|███████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 1399.86it/s]\n"
+      "100%|███████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 1404.11it/s]\n"
      ]
     }
    ],
@@ -233,31 +233,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 7,
    "id": "9caba4e3-e6e0-4e2f-b51d-cd3456fd4a63",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "feature_repo/\n",
-      "├─__init__.py\n",
-      "├─data/\n",
-      "│ ├─item_features.parquet\n",
-      "│ ├─online_store.db\n",
-      "│ ├─registry.db\n",
-      "│ └─user_features.parquet\n",
-      "├─feature_store.yaml\n",
-      "├─item_features.py\n",
-      "└─user_features.py\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "# set up the base dir to for feature store\n",
-    "feature_repo_path = os.path.join(BASE_DIR, 'feature_repo')\n",
-    "sd.seedir(feature_repo_path, style='lines', itemlimit=10, depthlimit=5, exclude_folders=['.ipynb_checkpoints', '__pycache__'], sort=True)"
+    "# # set up the base dir for the feature store\n",
+    "# feature_repo_path = os.path.join(BASE_DIR, 'feature_repo')\n",
+    "# sd.seedir(feature_repo_path, style='lines', itemlimit=10, depthlimit=5, exclude_folders=['.ipynb_checkpoints', '__pycache__'], sort=True)"
    ]
   },
   {
@@ -435,10 +418,10 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "2022-08-01 19:38:34.033476: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX\n",
+      "2022-08-01 20:01:10.368513: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX\n",
       "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
-      "2022-08-01 19:38:35.100917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16249 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:2d:00.0, compute capability: 7.0\n",
-      "08/01/2022 07:38:37 PM WARNING:No training configuration found in save file, so the model was *not* compiled. Compile it manually.\n"
+      "2022-08-01 20:01:11.486337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16249 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:2d:00.0, compute capability: 7.0\n",
+      "08/01/2022 08:01:13 PM WARNING:No training configuration found in save file, so the model was *not* compiled. Compile it manually.\n"
      ]
     }
    ],
@@ -516,6 +499,7 @@
     "    \"user_intentions\",\n",
     "    \"user_brands\",\n",
     "    \"user_categories\",\n",
+    "    \"item_id_seen\"\n",
     "]\n",
     "\n",
     "combined_features = item_features >> UnrollFeatures(\n",

	name	tags	dtype	is_list	is_ragged	properties.num_buckets	properties.freq_threshold	properties.max_size	properties.start_index	properties.cat_path	properties.embedding_sizes.cardinality	properties.embedding_sizes.dimension	properties.domain.min	properties.domain.max
0	item_id_seen	(Tags.USER)	int32	True	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	user_id	(Tags.USER, Tags.CATEGORICAL, Tags.USER_ID)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_id.parquet	294736.0	512.0	0.0	294736.0
2	user_shops	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_shops.parquet	116741.0	512.0	0.0	116741.0
3	user_profile	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_profile.parquet	98.0	21.0	0.0	98.0
4	user_group	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_group.parquet	14.0	16.0	0.0	14.0
5	user_gender	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_gender.parquet	3.0	16.0	0.0	3.0
6	user_age	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_age.parquet	8.0	16.0	0.0	8.0
7	user_consumption_1	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_consumption_1.parquet	4.0	16.0	0.0	4.0
8	user_consumption_2	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_consumption_2.parquet	4.0	16.0	0.0	4.0
9	user_is_occupied	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_is_occupied.parquet	3.0	16.0	0.0	3.0
10	user_geography	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_geography.parquet	5.0	16.0	0.0	5.0
11	user_intentions	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_intentions.parquet	33786.0	512.0	0.0	33786.0
12	user_brands	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_brands.parquet	58015.0	512.0	0.0	58015.0
13	user_categories	(Tags.USER, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_categories.parquet	6086.0	211.0	0.0	6086.0
14	item_id	(Tags.CATEGORICAL, Tags.ITEM, Tags.ITEM_ID)	int32	False	False	NaN	0.0	0.0	0.0	./categories_processed/categories/unique.item_...	240.0	34.0	0.0	240.0
15	item_category	(Tags.ITEM, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.item_category.parquet	8581.0	255.0	0.0	8581.0
16	item_shop	(Tags.ITEM, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.item_shop.parquet	604498.0	512.0	0.0	604498.0
17	item_brand	(Tags.ITEM, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.item_brand.parquet	208179.0	512.0	0.0	208179.0
18	item_intention	(Tags.ITEM, Tags.CATEGORICAL)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.item_intention.parquet	96258.0	512.0	0.0	96258.0
19	user_item_categories	(Tags.CATEGORICAL, user_item)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_item_categories.parquet	7735.0	241.0	0.0	7735.0
20	user_item_shops	(Tags.CATEGORICAL, user_item)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_item_shops.parquet	384343.0	512.0	0.0	384343.0
21	user_item_brands	(Tags.CATEGORICAL, user_item)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_item_brands.parquet	142632.0	512.0	0.0	142632.0
22	user_item_intentions	(Tags.CATEGORICAL, user_item)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_item_intentions.parquet	74317.0	512.0	0.0	74317.0
23	position	(Tags.CATEGORICAL, Tags.CONTEXT)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.position.parquet	4.0	16.0	0.0	4.0
24	click	()	int64	False	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0	2.0
25	conversion	()	int64	False	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0	2.0
26	item_id_raw	(Tags.CATEGORICAL, Tags.ITEM, Tags.ITEM_ID)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.item_id.parquet	3078306.0	512.0	0.0	3078306.0
27	user_id_raw	(Tags.USER, Tags.CATEGORICAL, Tags.USER_ID)	int32	False	False	NaN	0.0	0.0	0.0	.//categories/unique.user_id.parquet	294736.0	512.0	0.0	294736.0
	user_id	user_shops	user_profile	user_group	user_gender	user_age	user_consumption_2	user_is_occupied	user_geography	user_intentions	user_brands	user_categories	item_id_seen	user_id_raw
0	1	1	1	1	1	1	1	1	1	1	1	1	[7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...	7
1	2	2	1	1	1	1	1	1	1	2	2	2	[5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...	10
2	3	3	1	1	1	1	1	1	1	3	3	3	[36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...	8
3	4	4	1	1	1	1	1	1	1	4	4	4	[175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...	9
4	5	5	1	1	1	1	1	1	1	5	5	5	[7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...	6
	user_id	user_shops	user_profile	user_group	user_gender	user_age	user_consumption_2	user_is_occupied	user_geography	user_intentions	user_brands	user_categories	item_id_seen	user_id_raw	datetime	created
0	1	1	1	1	1	1	1	1	1	1	1	1	[7, 84, 21, 17, 68, 51, 29, 28, 3, 18, 9, 3, 1...	7	2022-08-01 18:48:08.630208	2022-08-01 18:48:08.631751
1	2	2	1	1	1	1	1	1	1	2	2	2	[5, 60, 10, 8, 55, 88, 13, 23, 28, 1, 8, 46, 6...	10	2022-08-01 18:48:08.630208	2022-08-01 18:48:08.631751
2	3	3	1	1	1	1	1	1	1	3	3	3	[36, 4, 3, 18, 31, 36, 5, 61, 4, 6, 31, 16, 26...	8	2022-08-01 18:48:08.630208	2022-08-01 18:48:08.631751
3	4	4	1	1	1	1	1	1	1	4	4	4	[175, 7, 95, 71, 12, 6, 52, 7, 2, 34, 14, 9, 1...	9	2022-08-01 18:48:08.630208	2022-08-01 18:48:08.631751
4	5	5	1	1	1	1	1	1	1	5	5	5	[7, 16, 32, 28, 7, 3, 37, 3, 133, 47, 7, 9, 23...	6	2022-08-01 18:48:08.630208	2022-08-01 18:48:08.631751
	item_category	item_shop	item_brand	item_id	item_id_raw	datetime	created
0	1	1	1	1	7	2022-08-01 18:49:30.289331	2022-08-01 18:49:30.292435
1	2	2	2	2	8	2022-08-01 18:49:30.289331	2022-08-01 18:49:30.292435
2	3	3	3	3	6	2022-08-01 18:49:30.289331	2022-08-01 18:49:30.292435
3	4	4	4	4	9	2022-08-01 18:49:30.289331	2022-08-01 18:49:30.292435
4	5	5	5	5	10	2022-08-01 18:49:30.289331	2022-08-01 18:49:30.292435
	item_id	0	1	2	3	4	5	6	7	8	...	54	55	56	57	58	59	60	61	62	63
0	1	-0.033485	-0.046890	-0.031819	0.030568	0.009458	-0.049156	-0.005019	0.051071	0.081326	...	-0.014835	-0.013426	-0.012331	0.042307	-0.024871	0.000757	0.032160	0.014033	-0.041780	0.020292
1	2	-0.036122	0.002882	-0.031006	-0.012018	0.036453	-0.004707	0.015386	0.042837	0.025634	...	-0.033783	0.000369	0.027628	-0.002053	-0.028099	-0.015240	-0.012000	0.004758	0.006306	0.030888
2	3	-0.024533	0.013923	0.000636	0.003143	0.053155	0.035068	0.003644	0.022994	0.021832	...	-0.017099	-0.018011	0.041238	0.005636	-0.015556	0.005061	0.011217	-0.005633	-0.009141	0.001630
3	4	0.004179	0.005348	-0.043896	0.009208	0.022689	0.011464	-0.011334	0.022437	0.052387	...	-0.013803	-0.010651	-0.001198	0.025812	-0.038623	0.010491	-0.000509	-0.011071	-0.012894	0.017563
4	5	-0.058851	-0.035628	-0.014662	-0.004050	-0.007094	0.001360	-0.037586	0.041380	0.044340	...	-0.029362	-0.005236	-0.000825	0.020010	-0.042688	0.021482	0.041595	0.004966	-0.026901	0.009236