diff --git a/examples/getting_started/olive-awq-ft-llama.ipynb b/examples/getting_started/olive-awq-ft-llama.ipynb
new file mode 100644
index 000000000..29afe3ea1
--- /dev/null
+++ b/examples/getting_started/olive-awq-ft-llama.ipynb
@@ -0,0 +1,290 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "tv6vx7wooDfk"
+   },
+   "source": [
+    "# ✨ Quantize & Fine-tune an SLM with Olive\n",
+    "\n",
+    "> ⚠️ **This notebook quantizes a Small Language Model (SLM) using the AWQ algorithm, which requires an Nvidia A10 or A100 GPU.**\n",
+    "\n",
+    "In this notebook, you will:\n",
+    "\n",
+    "1. Quantize the Llama-3.2-1B-Instruct model using the [AWQ algorithm](https://ar5iv.labs.arxiv.org/html/2306.00978).\n",
+    "1. Fine-tune the quantized model to classify English phrases into Surprise/Joy/Fear/Sadness.\n",
+    "1. Optimize the fine-tuned model for ONNX Runtime.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🐍 Install Python dependencies\n",
+    "\n",
+    "The following cells create a pip requirements file and then install the libraries."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile requirements.txt\n",
+    "\n",
+    "olive-ai==0.7.1\n",
+    "transformers==4.44.2\n",
+    "autoawq==0.2.6\n",
+    "optimum==1.23.1\n",
+    "peft==0.13.2\n",
+    "accelerate>=0.30.0\n",
+    "scipy==1.14.1\n",
+    "onnxruntime-genai==0.5.0\n",
+    "torchvision==0.18.1\n",
+    "tabulate==0.9.0"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "ZtY3VYxCoDfm"
+   },
+   "outputs": [],
+   "source": [
+    "%%capture\n",
+    "\n",
+    "%pip install -r requirements.txt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🤗 Log in to Hugging Face\n",
+    "\n",
+    "In this notebook, you'll be fine-tuning [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), which is *gated* on Hugging Face, so you will need to request access to the model. Once you have access, log in to Hugging Face with a [user access token](https://huggingface.co/docs/hub/security-tokens) so that Olive can download the model. Replace `USER_ACCESS_TOKEN` in the cell below with your token."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!huggingface-cli login --token USER_ACCESS_TOKEN"
+   ]
+  },
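+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🖥️ Check for a CUDA GPU (optional)\n",
+    "\n",
+    "Since AWQ quantization requires an Nvidia GPU, the next cell is a quick sanity check before you start. It is a minimal sketch that assumes PyTorch is importable (it is pulled in as a dependency of the packages installed above)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "# AWQ quantization runs on CUDA, so verify a suitable GPU is visible.\n",
+    "if torch.cuda.is_available():\n",
+    "    print(f\"CUDA device: {torch.cuda.get_device_name(0)}\")\n",
+    "else:\n",
+    "    print(\"No CUDA device found. AWQ quantization requires an Nvidia GPU (e.g. A10/A100).\")"
+   ]
+  },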
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🗜️ Quantize the model using AWQ\n",
+    "\n",
+    "First, you'll quantize the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model using the [AWQ algorithm](https://ar5iv.labs.arxiv.org/html/2306.00978). Olive also supports other quantization algorithms, such as GPTQ, HQQ, and RTN.\n",
+    "\n",
+    "You can choose a different Hugging Face model to quantize; just update the `--model_name_or_path` argument.\n",
+    "\n",
+    "> ⏳ **It takes ~6 minutes to complete the AWQ quantization**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!olive quantize \\\n",
+    "    --model_name_or_path \"meta-llama/Llama-3.2-1B-Instruct\" \\\n",
+    "    --trust_remote_code \\\n",
+    "    --algorithm awq \\\n",
+    "    --output_path models/llama/awq \\\n",
+    "    --log_level 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "nxJCT5wioDfp"
+   },
+   "source": [
+    "## 🏃 Train the model\n",
+    "\n",
+    "Fine-tuning a language model helps when you need very specific outputs. In this example, you'll fine-tune the **AWQ-quantized variant** of Llama-3.2-1B-Instruct from the previous cell to respond to an English phrase with a single-word answer that classifies the phrase as one of surprise/fear/joy/sadness. Here is a sample of the data used for fine-tuning:\n",
+    "\n",
+    "```jsonl\n",
+    "{\"phrase\": \"The sudden thunderstorm caught me off guard.\", \"tone\": \"surprise\"}\n",
+    "{\"phrase\": \"The creaking door at night is quite spooky.\", \"tone\": \"fear\"}\n",
+    "{\"phrase\": \"Celebrating my birthday with friends is always fun.\", \"tone\": \"joy\"}\n",
+    "{\"phrase\": \"Saying goodbye to my pet was heart-wrenching.\", \"tone\": \"sadness\"}\n",
+    "```\n",
+    "\n",
+    "Fine-tuning *after* quantization provides an opportunity to recover some of the accuracy lost during quantization and enhance model quality. For more details on quantization and fine-tuning, read [Is it better to quantize before or after finetuning?](https://onnxruntime.ai/blogs/olive-quant-ft).\n",
+    "\n",
+    "In the following `olive finetune` command, the `--data_name` argument points to the Hugging Face dataset [xxyyzzz/phrase_classification](https://huggingface.co/datasets/xxyyzzz/phrase_classification). You can also provide your own data from local disk using the `--data_files` argument.\n",
+    "\n",
+    "> ⏳ **It takes ~6 minutes to complete the fine-tuning**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "8t36pRF2oDfq"
+   },
+   "outputs": [],
+   "source": [
+    "!olive finetune \\\n",
+    "    --method lora \\\n",
+    "    --model_name_or_path models/llama/awq \\\n",
+    "    --trust_remote_code \\\n",
+    "    --data_name xxyyzzz/phrase_classification \\\n",
+    "    --text_template \"<|start_header_id|>user<|end_header_id|>\\n{phrase}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n{tone}<|eot_id|>\" \\\n",
+    "    --max_steps 300 \\\n",
+    "    --output_path models/llama/ft \\\n",
+    "    --log_level 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "7woNXLDF0bhh"
+   },
+   "source": [
+    "## 🪄 Automatic model optimization with Olive\n",
+    "\n",
+    "Next, you'll execute Olive's automatic optimizer using the `auto-opt` CLI command, which will:\n",
+    "\n",
+    "1. Capture the fine-tuned model into an ONNX graph and convert the weights into the ONNX format.\n",
+    "1. Optimize the ONNX graph (for example, by fusing nodes).\n",
+    "1. Extract the fine-tuned LoRA weights and place them in a separate file.\n",
+    "\n",
+    "> ⏳ **It takes ~2 minutes for the automatic optimization to complete**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "M-prKBy20U5m"
+   },
+   "outputs": [],
+   "source": [
+    "!olive auto-opt \\\n",
+    "    --model_name_or_path models/llama/ft/model \\\n",
+    "    --adapter_path models/llama/ft/adapter \\\n",
+    "    --device cpu \\\n",
+    "    --provider CPUExecutionProvider \\\n",
+    "    --use_ort_genai \\\n",
+    "    --output_path models/llama/onnx-ao \\\n",
+    "    --log_level 1"
+   ]
+  },
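+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🔎 Inspect the optimized output (optional)\n",
+    "\n",
+    "Before running inference, you can list what `auto-opt` wrote to disk. The base ONNX model and the extracted LoRA adapter weights should appear as separate files. This is a minimal sketch that assumes the output landed under `models/llama/onnx-ao` (the `--output_path` used above); the exact file names may vary between Olive versions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "\n",
+    "# List the files produced by auto-opt, with sizes in MB.\n",
+    "# Adjust output_dir if you changed --output_path above.\n",
+    "output_dir = Path(\"models/llama/onnx-ao\")\n",
+    "for f in sorted(output_dir.rglob(\"*\")):\n",
+    "    if f.is_file():\n",
+    "        print(f\"{f.relative_to(output_dir)} ({f.stat().st_size / 1e6:.1f} MB)\")"
+   ]
+  },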
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "8Uwm432loDfr"
+   },
+   "source": [
+    "## 🧠 Inference\n",
+    "\n",
+    "The code below creates a test app that consumes the model in a simple console chat interface. You will be prompted to enter an English phrase (for example: \"Cricket is a wonderful game\") and the app will output a chat completion using:\n",
+    "\n",
+    "1. The base model only (no adapter). You should notice that the model gives a verbose response.\n",
+    "1. The base model **plus adapter**. You should notice that the model responds with a single-word classification.\n",
+    "\n",
+    "In the code, you'll notice that ONNX Runtime allows you to hot-swap adapters for different tasks, which is often referred to as *multi-LoRA* serving.\n",
+    "\n",
+    "While the inference code uses the ONNX Runtime Python API, language bindings are also available for [Java, C#, and C++](https://github.com/microsoft/onnxruntime-genai/tree/main/examples).\n",
+    "\n",
+    "To exit the chat interface, enter `exit` or press `Ctrl+C`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "puMdoAxjoDfr"
+   },
+   "outputs": [],
+   "source": [
+    "import onnxruntime_genai as og\n",
+    "\n",
+    "model_path = \"models/llama/onnx-ao/model\"\n",
+    "\n",
+    "model = og.Model(model_path)\n",
+    "adapters = og.Adapters(model)\n",
+    "adapters.load(f'{model_path}/adapter_weights.onnx_adapter', \"classifier\")\n",
+    "tokenizer = og.Tokenizer(model)\n",
+    "tokenizer_stream = tokenizer.create_stream()\n",
+    "\n",
+    "# Keep asking for input prompts in a loop\n",
+    "while True:\n",
+    "    phrase = input(\"Phrase: \")\n",
+    "    if phrase.strip().lower() == \"exit\":\n",
+    "        break\n",
+    "    prompt = f\"<|start_header_id|>user<|end_header_id|>\\n{phrase}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\"\n",
+    "    input_tokens = tokenizer.encode(prompt)\n",
+    "\n",
+    "    # First, run without the adapter\n",
+    "    params = og.GeneratorParams(model)\n",
+    "    params.set_search_options(past_present_share_buffer=False)\n",
+    "    params.input_ids = input_tokens\n",
+    "    generator = og.Generator(model, params)\n",
+    "\n",
+    "    print()\n",
+    "    print(\"Output from Base Model (notice verbosity): \", end='', flush=True)\n",
+    "\n",
+    "    while not generator.is_done():\n",
+    "        generator.compute_logits()\n",
+    "        generator.generate_next_token()\n",
+    "\n",
+    "        new_token = generator.get_next_tokens()[0]\n",
+    "        print(tokenizer_stream.decode(new_token), end='', flush=True)\n",
+    "    print()\n",
+    "    print()\n",
+    "\n",
+    "    # Delete the generator to free the captured graph for the next generator, if graph capture is enabled\n",
+    "    del generator\n",
+    "\n",
+    "    # Now run with the adapter\n",
+    "    generator = og.Generator(model, params)\n",
+    "    # Set the adapter as active for this response\n",
+    "    generator.set_active_adapter(adapters, \"classifier\")\n",
+    "\n",
+    "    print()\n",
+    "    print(\"Output from Base Model + Adapter (notice single word response): \", end='', flush=True)\n",
+    "\n",
+    "    while not generator.is_done():\n",
+    "        generator.compute_logits()\n",
+    "        generator.generate_next_token()\n",
+    "\n",
+    "        new_token = generator.get_next_tokens()[0]\n",
+    "        print(tokenizer_stream.decode(new_token), end='', flush=True)\n",
+    "    print()\n",
+    "    print()\n",
+    "    del generator"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "A100",
+   "provenance": [],
+   "toc_visible": true
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}