diff --git a/doc/source/examples/notebooks.rst b/doc/source/examples/notebooks.rst
index 77696ab96a..d52c4c1771 100644
--- a/doc/source/examples/notebooks.rst
+++ b/doc/source/examples/notebooks.rst
@@ -37,6 +37,7 @@ Python Language Wrapper Examples
    TFserving MNIST
    Statsmodels Holt-Winter's time-series model
    Runtime Metrics & Tags
+   Triton GPT2 Example
 
 Specialised Framework Examples
 ------------------------------
diff --git a/doc/source/examples/triton_gpt2_example.nblink b/doc/source/examples/triton_gpt2_example.nblink
new file mode 100644
index 0000000000..6d83f6551b
--- /dev/null
+++ b/doc/source/examples/triton_gpt2_example.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../examples/triton_gpt2/README.ipynb"
+}
diff --git a/examples/triton_gpt2/README.ipynb b/examples/triton_gpt2/README.ipynb
new file mode 100644
index 0000000000..f0437429cb
--- /dev/null
+++ b/examples/triton_gpt2/README.ipynb
@@ -0,0 +1,418 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "liked-toronto",
+   "metadata": {},
+   "source": [
+    "# Pretrained GPT2 Model Deployment Example\n",
+    "\n",
+    "In this notebook, we run a text-generation example using a GPT2 model exported from HuggingFace and deployed with Seldon's pre-packaged Triton server. The example also covers converting the model to ONNX format.\n",
+    "Next-token prediction below is implemented with the greedy approach.\n",
+    "\n",
+    "More info: https://huggingface.co/transformers/model_doc/gpt2.html?highlight=gpt2\n",
+    "\n",
+    "## Steps:\n",
+    "1. Download the pretrained GPT2 model from HuggingFace\n",
+    "2. Convert the model to ONNX\n",
+    "3. Store it in a MinIO bucket\n",
+    "4. Set up Seldon Core in your Kubernetes cluster\n",
+    "5. Deploy the ONNX model with Seldon's pre-packaged Triton server\n",
+    "6. Interact with the model and run a greedy-decoding example (generate a sentence completion)\n",
+    "7. Clean up\n",
+    "\n",
+    "## Basic requirements\n",
+    "* Helm v3.0.0+\n",
+    "* A Kubernetes cluster running v1.13 or above (minikube / docker-for-windows work well if given enough RAM)\n",
+    "* kubectl v1.14+\n",
+    "* Python 3.6+"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "korean-reporter",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile requirements.txt\n",
+    "transformers==4.5.1\n",
+    "torch==1.8.1\n",
+    "tokenizers<0.11,>=0.10.1\n",
+    "tensorflow==2.4.1\n",
+    "tf2onnx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "assigned-diesel",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install --trusted-host=pypi.python.org --trusted-host=pypi.org --trusted-host=files.pythonhosted.org -r requirements.txt\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "completed-evaluation",
+   "metadata": {},
+   "source": [
+    "### Export the HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "iraqi-million",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import TFGPT2LMHeadModel, GPT2Tokenizer\n",
+    "\n",
+    "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n",
+    "model = TFGPT2LMHeadModel.from_pretrained(\"gpt2\", from_pt=True, pad_token_id=tokenizer.eos_token_id)\n",
+    "model.save_pretrained(\"./tfgpt2model\", saved_model=True)"
+   ]
+  },
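+  {
+   "cell_type": "markdown",
+   "id": "optional-sanity",
+   "metadata": {},
+   "source": [
+    "(Optional) A quick local sanity check of the model we just saved, and a minimal sketch of the greedy approach used against the deployed model later on: take the logits for the last input position and pick the argmax token. This reuses the `model` and `tokenizer` from the cell above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "local-greedy",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tensorflow as tf\n",
+    "\n",
+    "input_ids = tokenizer.encode(\"I enjoy working in Seldon\", return_tensors=\"tf\")\n",
+    "# logits shape: (batch, sequence_length, vocab_size)\n",
+    "logits = model(input_ids).logits\n",
+    "# greedy step: highest-scoring token at the last position\n",
+    "next_token = tf.argmax(logits[0, -1, :]).numpy()\n",
+    "print(tokenizer.decode([next_token]))"
+   ]
+  },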
+  {
+   "cell_type": "markdown",
+   "id": "further-tribute",
+   "metadata": {},
+   "source": [
+    "### Convert the TensorFlow saved model to ONNX"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "irish-mountain",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python -m tf2onnx.convert --saved-model ./tfgpt2model/saved_model/1 --opset 11 --output model.onnx"
+   ]
+  },
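+  {
+   "cell_type": "markdown",
+   "id": "verify-export",
+   "metadata": {},
+   "source": [
+    "(Optional) Verify that the exported graph is well-formed and inspect its input/output tensor names; these are the names the inference payload must reference later. A minimal check, assuming the `onnx` Python package is available (it is installed as a dependency of `tf2onnx`):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "checked-graph",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import onnx\n",
+    "\n",
+    "# load the exported graph and run ONNX's structural validation\n",
+    "onnx_model = onnx.load(\"model.onnx\")\n",
+    "onnx.checker.check_model(onnx_model)\n",
+    "\n",
+    "# tensor names (e.g. input_ids:0) must match the request payload used below\n",
+    "print([i.name for i in onnx_model.graph.input])\n",
+    "print([o.name for o in onnx_model.graph.output])"
+   ]
+  },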
+  {
+   "cell_type": "markdown",
+   "id": "sunset-pantyhose",
+   "metadata": {},
+   "source": [
+    "### Copy your model to a local MinIO\n",
+    "#### Setup MinIO\n",
+    "Use the provided [notebook](https://docs.seldon.io/projects/seldon-core/en/latest/examples/minio_setup.html) to install MinIO in your cluster and configure the `mc` CLI tool. Instructions are also available [online](https://docs.min.io/docs/minio-client-quickstart-guide.html).\n",
+    "\n",
+    "Note: you can use your preferred remote storage service instead (Google Cloud Storage, AWS S3, etc.).\n",
+    "\n",
+    "#### Create a bucket and store your model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "lasting-performance",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mc mb minio-seldon/onnx-gpt2 -p\n",
+    "!mc cp ./model.onnx minio-seldon/onnx-gpt2/gpt2/1/"
+   ]
+  },
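+  {
+   "cell_type": "markdown",
+   "id": "listed-bucket",
+   "metadata": {},
+   "source": [
+    "(Optional) Triton expects the `<model-name>/<version>/` layout used above. To confirm the upload, list the bucket recursively (this assumes the `minio-seldon` alias configured in the MinIO setup notebook):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "recursive-listing",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mc ls -r minio-seldon/onnx-gpt2/"
+   ]
+  },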
+  {
+   "cell_type": "markdown",
+   "id": "convinced-syracuse",
+   "metadata": {},
+   "source": [
+    "### Run Seldon in your Kubernetes cluster\n",
+    "\n",
+    "Follow the [Seldon-Core Setup notebook](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html) to set up a cluster with Ambassador Ingress or Istio and install Seldon Core."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "backed-outreach",
+   "metadata": {},
+   "source": [
+    "### Deploy your model with Seldon's pre-packaged Triton server"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "declared-crown",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Overwriting secret.yaml\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%writefile secret.yaml\n",
+    "\n",
+    "apiVersion: v1\n",
+    "kind: Secret\n",
+    "metadata:\n",
+    "  name: seldon-init-container-secret\n",
+    "type: Opaque\n",
+    "stringData:\n",
+    "  AWS_ACCESS_KEY_ID: minioadmin\n",
+    "  AWS_SECRET_ACCESS_KEY: minioadmin\n",
+    "  AWS_ENDPOINT_URL: http://minio.minio-system.svc.cluster.local:9000\n",
+    "  USE_SSL: \"false\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "beneficial-anime",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Overwriting gpt2-deploy.yaml\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%writefile gpt2-deploy.yaml\n",
+    "apiVersion: machinelearning.seldon.io/v1alpha2\n",
+    "kind: SeldonDeployment\n",
+    "metadata:\n",
+    "  name: gpt2\n",
+    "spec:\n",
+    "  predictors:\n",
+    "  - graph:\n",
+    "      implementation: TRITON_SERVER\n",
+    "      logger:\n",
+    "        mode: all\n",
+    "      modelUri: s3://onnx-gpt2\n",
+    "      envSecretRefName: seldon-init-container-secret\n",
+    "      name: gpt2\n",
+    "      type: MODEL\n",
+    "    name: default\n",
+    "    replicas: 1\n",
+    "  protocol: kfserving"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "subjective-involvement",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "secret/seldon-init-container-secret configured\n",
+      "seldondeployment.machinelearning.seldon.io/gpt2 configured\n"
+     ]
+    }
+   ],
+   "source": [
+    "!kubectl apply -f secret.yaml\n",
+    "!kubectl apply -f gpt2-deploy.yaml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "demanding-thesaurus",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "deployment \"gpt2-default-0-gpt2\" successfully rolled out\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2 -o jsonpath='{.items[0].metadata.name}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "digital-supervisor",
+   "metadata": {},
+   "source": [
+    "#### Interact with the model: get the model metadata (a \"test\" request to make sure the model is available and loaded correctly)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "married-roller",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* Trying 127.0.0.1:80...\r\n",
+      "* TCP_NODELAY set\r\n",
+      "* Connected to localhost (127.0.0.1) port 80 (#0)\r\n",
+      "> GET /seldon/seldon/gpt2/v2/models/gpt2 HTTP/1.1\r",
+      "\r\n",
+      "> Host: localhost\r",
+      "\r\n",
+      "> User-Agent: curl/7.68.0\r",
+      "\r\n",
+      "> Accept: */*\r",
+      "\r\n",
+      "> \r",
+      "\r\n",
+      "* Mark bundle as not supporting multiuse\r\n",
+      "< HTTP/1.1 200 OK\r",
+      "\r\n",
+      "< access-control-allow-headers: Accept, Accept-Encoding, Authorization, Content-Length, Content-Type, X-CSRF-Token\r",
+      "\r\n",
+      "< access-control-allow-methods: GET,OPTIONS\r",
+      "\r\n",
+      "< access-control-allow-origin: *\r",
+      "\r\n",
+      "< content-type: application/json\r",
+      "\r\n",
+      "< seldon-puid: 7e24a20b-3130-4f50-a86b-bda5a9c4c917\r",
+      "\r\n",
+      "< x-content-type-options: nosniff\r",
+      "\r\n",
+      "< date: Fri, 16 Apr 2021 15:19:28 GMT\r",
+      "\r\n",
+      "< content-length: 336\r",
+      "\r\n",
+      "< x-envoy-upstream-service-time: 1\r",
+      "\r\n",
+      "< server: istio-envoy\r",
+      "\r\n",
+      "< \r",
+      "\r\n",
+      "* Connection #0 to host localhost left intact\r\n",
+      "{\"name\":\"gpt2\",\"versions\":[\"1\"],\"platform\":\"onnxruntime_onnx\",\"inputs\":[{\"name\":\"input_ids:0\",\"datatype\":\"INT32\",\"shape\":[-1,-1]},{\"name\":\"attention_mask:0\",\"datatype\":\"INT32\",\"shape\":[-1,-1]}],\"outputs\":[{\"name\":\"past_key_values\",\"datatype\":\"FP32\",\"shape\":[12,2,-1,12,-1,64]},{\"name\":\"logits\",\"datatype\":\"FP32\",\"shape\":[-1,-1,50257]}]}"
+     ]
+    }
+   ],
+   "source": [
+    "!curl -v http://localhost:80/seldon/seldon/gpt2/v2/models/gpt2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "anonymous-resource",
+   "metadata": {},
+   "source": [
+    "### Run a prediction test: generate a sentence completion with the GPT2 model (greedy approach)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "modified-termination",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Input: I enjoy working in Seldon\n",
+      "Output: I enjoy working in Seldon 's office , and I 'm glad to see that\n"
+     ]
+    }
+   ],
+   "source": [
+    "import requests\n",
+    "import numpy as np\n",
+    "from transformers import GPT2Tokenizer\n",
+    "\n",
+    "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n",
+    "input_text = 'I enjoy working in Seldon'\n",
+    "count = 0\n",
+    "max_gen_len = 10\n",
+    "gen_sentence = input_text\n",
+    "while count < max_gen_len:\n",
+    "    input_ids = tokenizer.encode(gen_sentence, return_tensors='tf')\n",
+    "    shape = input_ids.shape.as_list()\n",
+    "    payload = {\n",
+    "        \"inputs\": [\n",
+    "            {\"name\": \"input_ids:0\",\n",
+    "             \"datatype\": \"INT32\",\n",
+    "             \"shape\": shape,\n",
+    "             \"data\": input_ids.numpy().tolist()\n",
+    "             },\n",
+    "            {\"name\": \"attention_mask:0\",\n",
+    "             \"datatype\": \"INT32\",\n",
+    "             \"shape\": shape,\n",
+    "             \"data\": np.ones(shape, dtype=np.int32).tolist()\n",
+    "             }\n",
+    "        ]\n",
+    "    }\n",
+    "\n",
+    "    ret = requests.post('http://localhost:80/seldon/seldon/gpt2/v2/models/gpt2/infer', json=payload)\n",
+    "    # fail fast on HTTP errors instead of silently retrying forever\n",
+    "    ret.raise_for_status()\n",
+    "    res = ret.json()\n",
+    "\n",
+    "    # extract the logits output (index 1, as reported by the model metadata)\n",
+    "    logits = np.array(res[\"outputs\"][1][\"data\"])\n",
+    "    logits = logits.reshape(res[\"outputs\"][1][\"shape\"])\n",
+    "\n",
+    "    # greedy decoding: take the highest-probability token at the last position\n",
+    "    next_token = logits.argmax(axis=2)[0]\n",
+    "    next_token_str = tokenizer.decode(next_token[-1:], skip_special_tokens=True,\n",
+    "                                      clean_up_tokenization_spaces=True).strip()\n",
+    "    gen_sentence += ' ' + next_token_str\n",
+    "    count += 1\n",
+    "\n",
+    "print(f'Input: {input_text}\\nOutput: {gen_sentence}')"
+   ]
+  },
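+  {
+   "cell_type": "markdown",
+   "id": "compare-greedy",
+   "metadata": {},
+   "source": [
+    "(Optional) As a cross-check, HuggingFace's own greedy decoding (`do_sample=False`) should produce a similar continuation. This sketch assumes the `model` and `tokenizer` from the export step are still in memory; the result can differ slightly from the loop above, which re-tokenizes the decoded string on every step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "generated-locally",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# greedy generation with the in-memory TF model, for comparison with Triton\n",
+    "input_ids = tokenizer.encode(input_text, return_tensors='tf')\n",
+    "output_ids = model.generate(input_ids, max_length=input_ids.shape[1] + max_gen_len, do_sample=False)\n",
+    "print(tokenizer.decode(output_ids[0].numpy().tolist(), skip_special_tokens=True))"
+   ]
+  },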
+  {
+   "cell_type": "markdown",
+   "id": "patient-suite",
+   "metadata": {},
+   "source": [
+    "### Clean-up"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "pacific-collectible",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!kubectl delete -f gpt2-deploy.yaml\n",
+    "!kubectl delete -f secret.yaml"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}