TELCODOCS-2134 updating AI pattern #507

Merged
merged 22 commits on Jan 22, 2025
155 changes: 9 additions & 146 deletions content/patterns/rag-llm-gitops/GPU_provisioning.md
@@ -1,155 +1,18 @@
---
title: GPU provisioning
title: Customize GPU provisioning nodes
weight: 20
aliases: /rag-llm-gitops/gpuprovisioning/
---
# GPU provisioning
# Customizing GPU provisioning nodes

Use these instructions to add GPU nodes to an OpenShift cluster running in the AWS cloud. GPU nodes are tainted so that only pods that require a GPU are scheduled on them.
By default, GPU nodes use the `g5.2xlarge` instance type. If you need to change the instance type, for example to address performance requirements, carry out these steps:

More details can be found in following documents [Openshift AI](https://ai-on-openshift.io/odh-rhoai/nvidia-gpus/), [NVIDIA on OpenShift](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html)
1. In your local branch of the `rag-llm-gitops` Git repository, change to the `ansible/playbooks/templates` directory.

## Add machineset
2. Edit the `gpu-machine-sets.j2` file, changing the `instanceType` value to, for example, `g5.4xlarge`. Save and exit. (A sketch of the relevant template fields is shown after the push command below.)

The easiest way is to use an existing machineset manifest and update certain elements. Use a worker machineset manifest and modify some of the entries, keeping the other entries as-is (the naming conventions are provided as a reference only; use your own if required):
3. Push the changes to the origin remote repository by running the following command:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <clustername>-gpu-<AWSregion>
  ..............
spec:
  replicas: 1
  selector:
    matchLabels:
      ................
      machine.openshift.io/cluster-api-machineset: <clustername>-gpu-<AWSregion>
  template:
    metadata:
      labels:
        ........
        machine.openshift.io/cluster-api-machineset: <clustername>-gpu-<AWSregion>
    spec:
      ...................
      metadata:
        labels:
          node-role.kubernetes.io/odh-notebook: ''  # put your own label here if needed
      providerSpec:
        value:
          ........................
          instanceType: g5.2xlarge  # change the instance type if needed
          .............
      taints:
        - effect: NoSchedule
          key: odh-notebook  # use your own taint name, or skip altogether
          value: 'true'
```

Use the `kubectl` or `oc` command line to create the new machineset: `oc apply -f gpu_machineset.yaml`

Depending on the type of EC2 instance, creation of the new machines may take some time. Note that all GPU nodes automatically get the labels (`node-role.kubernetes.io/odh-notebook` in our case) and taints (`odh-notebook`) specified in the machineset.

## Install the Node Feature Discovery Operator

From OperatorHub, install the Node Feature Discovery Operator, accepting the defaults. Once the Operator has been installed, create a `NodeFeatureDiscovery` instance. Use the default entries unless something specific is needed. The Node Feature Discovery Operator adds labels to nodes based on the available hardware resources.
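For illustration only, a minimal `NodeFeatureDiscovery` instance might look like the following sketch. The name and namespace are assumptions, and the web console pre-populates the full `spec` with defaults, so treat this as orientation rather than a definitive manifest:

```yaml
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance        # assumed name
  namespace: openshift-nfd  # assumed namespace for the NFD Operator
spec: {}                    # accept the operator defaults; the console pre-fills this section
```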

## Install NVIDIA GPU Operator

The NVIDIA GPU Operator provisions daemonsets with GPU drivers to be used by workloads running on these nodes. Detailed instructions are available in the NVIDIA documentation: [NVIDIA on OpenShift](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html). The following are simplified steps for this specific setup:

- Install the NVIDIA GPU Operator from OperatorHub.
- Once the operator is ready, create a `ClusterPolicy` custom resource. Unless otherwise required, you can use the default settings, adding `tolerations` if the machineset in the first section was created with a taint. Failing to add `tolerations` prevents the drivers from being installed on the GPU-enabled nodes:

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  vgpuDeviceManager:
    enabled: true
  migManager:
    enabled: true
  operator:
    defaultRuntime: crio
    initContainer: {}
    runtimeClass: nvidia
    use_ocp_driver_toolkit: true
  dcgm:
    enabled: true
  gfd:
    enabled: true
  dcgmExporter:
    config:
      name: ''
    enabled: true
    serviceMonitor:
      enabled: true
  driver:
    certConfig:
      name: ''
    enabled: true
    kernelModuleConfig:
      name: ''
    licensingConfig:
      configMapName: ''
      nlsEnabled: false
    repoConfig:
      configMapName: ''
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: false
        enable: false
        force: false
        timeoutSeconds: 300
      maxParallelUpgrades: 1
      maxUnavailable: 25%
      podDeletion:
        deleteEmptyDir: false
        force: false
        timeoutSeconds: 300
      waitForCompletion:
        timeoutSeconds: 0
    virtualTopology:
      config: ''
  devicePlugin:
    config:
      default: ''
      name: ''
    enabled: true
  mig:
    strategy: single
  sandboxDevicePlugin:
    enabled: true
  validator:
    plugin:
      env:
        - name: WITH_WORKLOAD
          value: 'false'
  nodeStatusExporter:
    enabled: true
  daemonsets:
    rollingUpdate:
      maxUnavailable: '1'
    tolerations:
      - effect: NoSchedule
        key: odh-notebook
        value: 'true'
    updateStrategy: RollingUpdate
  sandboxWorkloads:
    defaultWorkload: container
    enabled: false
  gds:
    enabled: false
  vgpuManager:
    enabled: false
  vfioManager:
    enabled: true
  toolkit:
    enabled: true
    installDir: /usr/local/nvidia
```

Provisioning the NVIDIA daemonsets and compiling the drivers may take some time (5-10 minutes).
```sh
$ git push origin my-test-branch
```
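For orientation, the `instanceType` change in step 2 touches a single field of the template. The following is a minimal sketch that assumes the template follows the standard MachineSet layout; the actual `gpu-machine-sets.j2` file may wrap these fields in Jinja2 variables:

```yaml
# Sketch (assumed layout) of the relevant fields in ansible/playbooks/templates/gpu-machine-sets.j2
providerSpec:
  value:
    instanceType: g5.4xlarge  # changed from the default g5.2xlarge
```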
62 changes: 52 additions & 10 deletions content/patterns/rag-llm-gitops/_index.md
@@ -2,7 +2,7 @@
title: AI Generation with LLM and RAG
date: 2024-07-25
tier: tested
summary: The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
summary: The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift GitOps
@@ -19,7 +19,7 @@ links:
ci: ai
---

# Document Generation Demo with LLM and RAG
# Document generation demo with LLM and RAG

## Introduction

@@ -34,16 +34,9 @@ The application uses either the [EDB Postgres for Kubernetes operator](https://c
(default), or Redis, to store embeddings of Red Hat product documentation, running on Red Hat
OpenShift Container Platform to generate project proposals for specific Red Hat products.

## Pre-requisites

- Podman
- Red Hat OpenShift cluster running in AWS. Supported regions are us-west-2 and us-east-1.
- GPU Node to run Hugging Face Text Generation Inference server on Red Hat OpenShift cluster.
- Create a fork of the [rag-llm-gitops](https://github.com/validatedpatterns/rag-llm-gitops.git) git repository.

## Demo Description & Architecture

The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai). It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM.
The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai). It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM.
The application generates a project proposal for a Red Hat product.

### Key Features
@@ -55,6 +48,55 @@ The application generates a project proposal for a Red Hat product.
- Monitoring dashboard to provide key metrics such as ratings.
- GitOps setup to deploy e2e demo (frontend / vector database / served models).

#### RAG Demo Workflow

![Overview of workflow](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-sd.png)

_Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift._


#### RAG Data Ingestion

![ingestion](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-ingress-sd.png)

_Figure 4. Schematic diagram for Ingestion of data for RAG._


#### RAG Augmented Query


![query](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-query-sd.png)

_Figure 5. Schematic diagram for RAG demo augmented query._

In Figure 5, we can see the RAG augmented query. The Mistral-7B model is used for
language processing. LangChain is used to integrate the different tools of the LLM-based
application and to process the PDF files and web pages. A vector
database provider, such as EDB Postgres for Kubernetes (or Redis), is used to
store vectors. Hugging Face TGI is used to serve the Mistral-7B model. Gradio is
used for the user interface, and object storage is used to store the language model and other
datasets. Solution components are deployed as microservices in the Red Hat
OpenShift Container Platform cluster.

#### Download diagrams
View and download all of the diagrams above in our open source tooling site.

[Open Diagrams](https://www.redhat.com/architect/portfolio/tool/index.html?#gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/diagrams/rag-demo-vp.drawio)

![Diagram](/images/rag-llm-gitops/diagram-edb.png)

_Figure 6. Proposed demo architecture with OpenShift AI_

### Components deployed

- **Hugging Face Text Generation Inference Server:** The pattern deploys a Hugging Face TGIS server, which serves the `mistral-community/Mistral-7B-v0.2` model and requires a GPU node.
- **EDB Postgres for Kubernetes / Redis Server:** A vector database server is deployed to store the vector embeddings created from the Red Hat product documentation.
- **Populate VectorDb Job:** This job creates the embeddings and populates the vector database.
- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
- **Prometheus:** Deploys a Prometheus instance to store the various metrics from the LLM application and the TGIS server.
- **Grafana:** Deploys a Grafana instance to visualize the metrics.


![Overview](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/intro-marketectures/rag-demo-vp-marketing-slide.png)

_Figure 1. Overview of the validated pattern for RAG Demo with Red Hat OpenShift_
41 changes: 41 additions & 0 deletions content/patterns/rag-llm-gitops/customize-demo-app.md
@@ -0,0 +1,41 @@
---
title: Customize the demo application
weight: 11
aliases: /rag-llm-gitops/getting-started/
---

# Add an OpenAI provider

You can optionally add more providers. The application supports the following providers:

- Hugging Face
- OpenAI
- NVIDIA

## Procedure

1. Click the `Application box` icon in the header, and select `Retrieval-Augmented-Generation (RAG) LLM Demonstration UI`.

![Launch Application](/images/rag-llm-gitops/launch-application-main_menu.png)

- This launches the application.

![Application](/images/rag-llm-gitops/application.png)

2. Click the `Configuration` tab to add a new provider.

3. Click the `Add Provider` button.

![Addprovider](/images/rag-llm-gitops/provider-1.png)

4. Complete the details and click the `Add` button.

![Addprovider](/images/rag-llm-gitops/provider-2.png)

The provider is now available to select in the `Providers` dropdown under the `Chatbot` tab.

![Routes](/images/rag-llm-gitops/add_provider-3.png)

## Generate the proposal document using the OpenAI provider

Follow the instructions in the section "Generate the proposal document" in [Getting Started](/rag-llm-gitops/getting-started/) to generate the proposal document using the OpenAI provider.
32 changes: 32 additions & 0 deletions content/patterns/rag-llm-gitops/deploying-different-db.md
@@ -0,0 +1,32 @@
---
title: Deploying a different database
weight: 12
aliases: /rag-llm-gitops/deploy-different-db/
---

# Deploying a different database

This pattern supports two types of vector databases: EDB Postgres for Kubernetes and Redis. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Redis instead, change the `global.db.type` parameter to the `REDIS` value in `values-global.yaml` in your local branch.

```yaml
---
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible value for db.type = [REDIS, EDB]
  db:
    index: docs
    type: EDB
  # Add for model ID
  model:
    modelId: mistral-community/Mistral-7B-Instruct-v0.3
main:
  clusterGroupName: hub
  multiSourceConfig:
    enabled: true
```
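For example, to switch the pattern to Redis, only the `db.type` value in your local `values-global.yaml` changes. A minimal sketch of the affected section:

```yaml
global:
  db:
    index: docs
    type: REDIS  # changed from the default EDB
```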

