diff --git a/content/patterns/rag-llm-gitops/GPU_provisioning.md b/content/patterns/rag-llm-gitops/GPU_provisioning.md
index 2eabf13f0..1e3b3e41d 100644
--- a/content/patterns/rag-llm-gitops/GPU_provisioning.md
+++ b/content/patterns/rag-llm-gitops/GPU_provisioning.md
@@ -1,155 +1,18 @@
 ---
-title: GPU provisioning
+title: Customize GPU provisioning nodes
 weight: 20
 aliases: /rag-llm-gitops/gpuprovisioning/
 ---
 
-# GPU provisioning
+# Customizing GPU provisioning nodes
 
-Use the instructions to add nodes with GPU in OpenShift cluster running in AWS cloud. Nodes with GPU will be tainted to allow only pods that required GPU to be scheduled to these nodes
+By default, GPU nodes use the instance type `g5.2xlarge`. If you need to change the instance type, for example to address performance requirements, complete these steps:
 
-More details can be found in following documents [Openshift AI](https://ai-on-openshift.io/odh-rhoai/nvidia-gpus/), [NVIDIA on OpenShift](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html)
+1. In your local branch of the `rag-llm-gitops` git repository, change to the `ansible/playbooks/templates` directory.
 
-## Add machineset
+2. Edit the file `gpu-machine-sets.j2`, changing the `instanceType` value, for example to `g5.4xlarge`. Save and exit.
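+
+   After the edit, the relevant portion of the template should resemble the following sketch. The surrounding fields and any Jinja2 variables in your copy of `gpu-machine-sets.j2` may differ; only the `instanceType` value needs to change:
+
+   ```yaml
+   providerSpec:
+     value:
+       # EC2 instance type used for the new GPU machines
+       instanceType: g5.4xlarge
+   ```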
 
-The easiest way is to use existing machineset manifest and update certain elements. Use worker machineset manifest and modify some of the entries (naming conventions provided as reference only, use own if required.), keep other entries as is:
+3. Push the changes to the origin remote repository by running the following command:
 
-```yaml
-apiVersion: machine.openshift.io/v1beta1
-kind: MachineSet
-metadata:
-  name: -gpu-
-..............
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      ................
-      machine.openshift.io/cluster-api-machineset: -gpu-
-  template:
-    metadata:
-      labels:
-        ........
-        machine.openshift.io/cluster-api-machineset: -gpu-
-    spec:
-      ...................
-      metadata:
-        labels:
-          node-role.kubernetes.io/odh-notebook: '' <--- Put your label if needed
-      providerSpec:
-        value:
-          ........................
-          instanceType: g5.2xlarge <---- Change vm type if needed
-          .............
-      taints:
-        - effect: NoSchedule
-          key: odh-notebook <--- Use own taint name or skip all together
-          value: 'true'
-```
-
-Use `kubectl` or `oc` command line to create new machineset `oc apply -f gpu_machineset.yaml`
-
-Depending on type of EC2 instance creation of the new machines make take some time. Please note that all nodes with GPU will have labels(`node-role.kubernetes.io/odh-notebook`in our case) and taints (`odh-notebook `) that we have specified in machineset applied automatically
-
-## Install Node Feature Operator
-
-From OperatorHub install Node Feature Discovery Operator , accepting defaults . Once Operator has been installed , create `NodeFeatureDiscovery`instance . Use default entries unless you something specific is needed . Node Feature Discovery Operator will add labels to nodes based on available hardware resources
-
-## Install NVIDIA GPU Operator
-
-NVIDIA GPU Operator will provision daemonsets with drivers for the GPU to be used by workload running on these nodes . Detailed instructions are available in NVIDIA Documentation [NVIDIA on OpenShift](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html) . Following simplified steps for specific setup :
-
-- Install NVIDIA GPU Operator from OperatorHub
-- Once operator is ready create `ClusterPolicy` custom resource. Unless required you can use default settings with adding `tolerations` if machineset in first section has been created with taint. Failing to add `tolerations` will prevent drivers to be installed on GPU enabled node :
-
-```yaml
-apiVersion: nvidia.com/v1
-kind: ClusterPolicy
-metadata:
-  name: gpu-cluster-policy
-spec:
-  vgpuDeviceManager:
-    enabled: true
-  migManager:
-    enabled: true
-  operator:
-    defaultRuntime: crio
-    initContainer: {}
-    runtimeClass: nvidia
-    use_ocp_driver_toolkit: true
-  dcgm:
-    enabled: true
-  gfd:
-    enabled: true
-  dcgmExporter:
-    config:
-      name: ''
-    enabled: true
-    serviceMonitor:
-      enabled: true
-  driver:
-    certConfig:
-      name: ''
-    enabled: true
-    kernelModuleConfig:
-      name: ''
-    licensingConfig:
-      configMapName: ''
-      nlsEnabled: false
-    repoConfig:
-      configMapName: ''
-    upgradePolicy:
-      autoUpgrade: true
-      drain:
-        deleteEmptyDir: false
-        enable: false
-        force: false
-        timeoutSeconds: 300
-      maxParallelUpgrades: 1
-      maxUnavailable: 25%
-      podDeletion:
-        deleteEmptyDir: false
-        force: false
-        timeoutSeconds: 300
-      waitForCompletion:
-        timeoutSeconds: 0
-    virtualTopology:
-      config: ''
-  devicePlugin:
-    config:
-      default: ''
-      name: ''
-    enabled: true
-  mig:
-    strategy: single
-  sandboxDevicePlugin:
-    enabled: true
-  validator:
-    plugin:
-      env:
-        - name: WITH_WORKLOAD
-          value: 'false'
-  nodeStatusExporter:
-    enabled: true
-  daemonsets:
-    rollingUpdate:
-      maxUnavailable: '1'
-    tolerations:
-      - effect: NoSchedule
-        key: odh-notebook
-        value: 'true'
-    updateStrategy: RollingUpdate
-  sandboxWorkloads:
-    defaultWorkload: container
-    enabled: false
-  gds:
-    enabled: false
-  vgpuManager:
-    enabled: false
-  vfioManager:
-    enabled: true
-  toolkit:
-    enabled: true
-    installDir: /usr/local/nvidia
-```
-
-Provisioning NVIDIA daemonsets and compiling drivers may take some time (5-10 minutes)
+
+   ```sh
+   $ git push origin my-test-branch
+   ```
diff --git a/content/patterns/rag-llm-gitops/_index.md b/content/patterns/rag-llm-gitops/_index.md
index 62251c4d4..8c83051eb 100644
--- a/content/patterns/rag-llm-gitops/_index.md
+++ b/content/patterns/rag-llm-gitops/_index.md
@@ -2,7 +2,7 @@
 title: AI Generation with LLM and RAG
 date: 2024-07-25
 tier: tested
-summary: The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
+summary: The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
 rh_products:
 - Red Hat OpenShift Container Platform
 - Red Hat OpenShift GitOps
@@ -19,7 +19,7 @@ links:
 ci: ai
 ---
 
-# Document Generation Demo with LLM and RAG
+# Document generation demo with LLM and RAG
 
 ## Introduction
 
@@ -34,16 +34,9 @@ The application uses either the [EDB Postgres for Kubernetes operator](https://c
 (default), or Redis, to store embeddings of Red Hat product documentation, running on
 Red Hat OpenShift Container Platform to generate project proposals for
 specific Red Hat products.
 
-## Pre-requisites
-
-- Podman
-- Red Hat Openshift cluster running in AWS. Supported regions are us-west-2 and us-east-1.
-- GPU Node to run Hugging Face Text Generation Inference server on Red Hat OpenShift cluster.
-- Create a fork of the [rag-llm-gitops](https://github.com/validatedpatterns/rag-llm-gitops.git) git repository.
-
 ## Demo Description & Architecture
 
-The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai). It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM.
+The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai). It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM.
 The application generates a project proposal for a Red Hat product.
 
 ### Key Features
@@ -55,6 +48,55 @@ The application generates a project proposal for a Red Hat product.
 - Monitoring dashboard to provide key metrics such as ratings.
 - GitOps setup to deploy e2e demo (frontend / vector database / served models).
 
+#### RAG Demo Workflow
+
+![Overview of workflow](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-sd.png)
+
+_Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift._
+
+
+#### RAG Data Ingestion
+
+![ingestion](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-ingress-sd.png)
+
+_Figure 4. Schematic diagram for ingestion of data for RAG._
+
+
+#### RAG Augmented Query
+
+
+![query](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-query-sd.png)
+
+_Figure 5. Schematic diagram for RAG demo augmented query._
+
+Figure 5 shows the RAG augmented query. The Mistral-7B model is used for
+language processing. LangChain is used to integrate different tools of the LLM-based
+application together and to process the PDF files and web pages. A vector
+database provider, such as EDB Postgres for Kubernetes or Redis, is used to
+store vectors. HuggingFace TGI is used to serve the Mistral-7B model. Gradio is
+used for the user interface, and object storage is used to store the language model
+and other datasets. Solution components are deployed as microservices in the Red Hat
+OpenShift Container Platform cluster.
+
+#### Download diagrams
+
+View and download all of the diagrams above on our open source tooling site.
+
+[Open Diagrams](https://www.redhat.com/architect/portfolio/tool/index.html?#gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/diagrams/rag-demo-vp.drawio)
+
+![Diagram](/images/rag-llm-gitops/diagram-edb.png)
+
+_Figure 6. Proposed demo architecture with OpenShift AI_
+
+### Components deployed
+
+- **Hugging Face Text Generation Inference Server:** The pattern deploys a Hugging Face TGIS server. The server deploys the `mistral-community/Mistral-7B-v0.2` model and requires a GPU node.
+- **EDB Postgres for Kubernetes / Redis Server:** A vector database server is deployed to store vector embeddings created from Red Hat product documentation.
+- **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
+- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
+- **Prometheus:** Deploys a Prometheus instance to store the various metrics from the LLM application and the TGIS server.
+- **Grafana:** Deploys the Grafana application to visualize the metrics.
+
 
 ![Overview](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/intro-marketectures/rag-demo-vp-marketing-slide.png)
 
 _Figure 1. Overview of the validated pattern for RAG Demo with Red Hat OpenShift_
diff --git a/content/patterns/rag-llm-gitops/customize-demo-app.md b/content/patterns/rag-llm-gitops/customize-demo-app.md
new file mode 100644
index 000000000..86dee73fc
--- /dev/null
+++ b/content/patterns/rag-llm-gitops/customize-demo-app.md
@@ -0,0 +1,41 @@
+---
+title: Customize the demo application
+weight: 11
+aliases: /rag-llm-gitops/customize-demo-app/
+---
+
+# Add an OpenAI provider
+
+You can optionally add more providers. The application supports the following providers:
+
+- Hugging Face
+- OpenAI
+- NVIDIA
+
+## Procedure
+
+1. Click the `Application box` icon in the header, and select `Retrieval-Augmented-Generation (RAG) LLM Demonstration UI`.
+
+   ![Launch Application](/images/rag-llm-gitops/launch-application-main_menu.png)
+
+   The application launches.
+
+   ![Application](/images/rag-llm-gitops/application.png)
+
+2. Click the `Configuration` tab to add a new provider.
+
+3. Click the `Add Provider` button.
+
+   ![Addprovider](/images/rag-llm-gitops/provider-1.png)
+
+4. Complete the details and click the `Add` button.
+
+   ![Addprovider](/images/rag-llm-gitops/provider-2.png)
+
+The provider is now available to select in the `Providers` dropdown under the `Chatbot` tab.
+
+![Routes](/images/rag-llm-gitops/add_provider-3.png)
+
+## Generate the proposal document using the OpenAI provider
+
+Follow the instructions in the section "Generate the proposal document" in [Getting Started](/rag-llm-gitops/getting-started/) to generate the proposal document using the OpenAI provider.
\ No newline at end of file
diff --git a/content/patterns/rag-llm-gitops/deploying-different-db.md b/content/patterns/rag-llm-gitops/deploying-different-db.md
new file mode 100644
index 000000000..167addc5e
--- /dev/null
+++ b/content/patterns/rag-llm-gitops/deploying-different-db.md
@@ -0,0 +1,32 @@
+---
+title: Deploying a different database
+weight: 12
+aliases: /rag-llm-gitops/deploy-different-db/
+---
+
+# Deploying a different database
+
+This pattern supports two types of vector databases: EDB Postgres for Kubernetes, and Redis. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Redis instead, change the `global.db.type` parameter to the `REDIS` value in `values-global.yaml` in your local branch. The default file looks like this:
+
+```yaml
+---
+global:
+  pattern: rag-llm-gitops
+  options:
+    useCSV: false
+    syncPolicy: Automatic
+    installPlanApproval: Automatic
+# Possible value for db.type = [REDIS, EDB]
+  db:
+    index: docs
+    type: EDB
+# Add for model ID
+  model:
+    modelId: mistral-community/Mistral-7B-Instruct-v0.3
+main:
+  clusterGroupName: hub
+  multiSourceConfig:
+    enabled: true
+```
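+
+For example, to switch the pattern to Redis, the `db` section of your `values-global.yaml` would change to the following (a sketch; all other values stay as shown above):
+
+```yaml
+  db:
+    index: docs
+    type: REDIS
+```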
+
diff --git a/content/patterns/rag-llm-gitops/getting-started.md b/content/patterns/rag-llm-gitops/getting-started.md
index d6b8cdc78..a9aa0ea41 100644
--- a/content/patterns/rag-llm-gitops/getting-started.md
+++ b/content/patterns/rag-llm-gitops/getting-started.md
@@ -4,227 +4,168 @@ weight: 10
 aliases: /rag-llm-gitops/getting-started/
 ---
 
-## Deploying the demo
+## Prerequisites
+
+- Podman is installed on your system.
+- You have the OpenShift Container Platform installation program and the pull secret for your cluster. You can get these from [Install OpenShift on AWS with installer-provisioned infrastructure](https://console.redhat.com/openshift/install/aws/installer-provisioned).
+- A Red Hat OpenShift cluster running in AWS.
+
+## Procedure
+
+1. Create the installation configuration file using the steps described in [Creating the installation configuration file](https://docs.openshift.com/container-platform/4.17/installing/installing_aws/ipi/installing-aws-customizations.html#installation-initializing_installing-aws-customizations).
+
+   > **Note:**
+   > Supported regions are `us-west-2` and `us-east-1`. For more information about installing on AWS, see [Installation methods](https://docs.openshift.com/container-platform/latest/installing/installing_aws/preparing-to-install-on-aws.html).
+   >
+
+2. Customize the generated `install-config.yaml`, creating one control plane node with instance type `m5a.2xlarge` and three worker nodes with instance type `p3.2xlarge`. A sample YAML file is shown here:
+
+   ```yaml
+   additionalTrustBundlePolicy: Proxyonly
+   apiVersion: v1
+   baseDomain: aws.validatedpatterns.io
+   compute:
+   - architecture: amd64
+     hyperthreading: Enabled
+     name: worker
+     platform:
+       aws:
+         type: p3.2xlarge
+     replicas: 3
+   controlPlane:
+     architecture: amd64
+     hyperthreading: Enabled
+     name: master
+     platform:
+       aws:
+         type: m5a.2xlarge
+     replicas: 1
+   metadata:
+     creationTimestamp: null
+     name: kevstestcluster
+   networking:
+     clusterNetwork:
+     - cidr: 10.128.0.0/14
+       hostPrefix: 23
+     machineNetwork:
+     - cidr: 10.0.0.0/16
+     networkType: OVNKubernetes
+     serviceNetwork:
+     - 172.30.0.0/16
+   platform:
+     aws:
+       region: us-east-1
+   publish: External
+   pullSecret: ''
+   sshKey: |
+     ssh-ed25519 someuser@redhat.com
+   ```
+
+3. Fork the [rag-llm-gitops](https://github.com/validatedpatterns/rag-llm-gitops.git) git repository.
+
+4. Clone the forked repository by running the following command:
+
+   ```sh
+   $ git clone git@github.com:your-username/rag-llm-gitops.git
+   ```
+
+5. Ensure that you are in the root directory of your git repository by running the following command:
+
+   ```sh
+   $ cd rag-llm-gitops
+   ```
+
+6. Create a local copy of the secret values file by running the following command:
+
+   ```sh
+   $ cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
+   ```
+
+   > **Note:**
+   > For this demo, editing this file is unnecessary because the default configuration works out of the box upon installation.
+
+7. Add the remote upstream repository by running the following command:
+
+   ```sh
+   $ git remote add -f upstream git@github.com:validatedpatterns/rag-llm-gitops.git
+   ```
+
+8. Create a local branch by running the following command:
+
+   ```sh
+   $ git checkout -b my-test-branch main
+   ```
+
+9. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Redis, change the `global.db.type` parameter to the `REDIS` value in `values-global.yaml` in your local branch. For more information, see [Deploying a different database](/rag-llm-gitops/deploy-different-db/).
+
+10. By default, the instance type for the GPU nodes is `g5.2xlarge`. To change the GPU instance type, follow the steps in [Customize GPU provisioning nodes](/rag-llm-gitops/gpuprovisioning/).
+
+11. Run the following command to push `my-test-branch` (including any changes) to the origin remote repository:
+
+    ```sh
+    $ git push origin my-test-branch
+    ```
+
+12. Ensure that you have logged in to the cluster at both the command line and the console, by using the login credentials presented to you when you installed the cluster. For example:
+
+    ```sh
+    INFO Install complete!
+    INFO Run 'export KUBECONFIG=/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
+    INFO The cluster is ready when 'oc login -u kubeadmin -p ' succeeds (wait a few minutes).
+    INFO Access the OpenShift web-console here: https://console-openshift-console.apps.demo1.openshift4-beta-abcorp.com
+    INFO Login to the console with user: kubeadmin, password:
+    ```
+
+13. Add GPU nodes to your existing cluster deployment by running the following command:
+
+    ```sh
+    $ ./pattern.sh make create-gpu-machineset
+    ```
+
+    > **Note:**
+    > You may need to create a `config` file in the `~/.aws` directory and populate it with the region name.
+    > 1. Run the following:
+    >    ```sh
+    >    vi ~/.aws/config
+    >    ```
+    > 2. Add the following:
+    >    ```sh
+    >    [default]
+    >    region = us-east-1
+    >    ```
+
+14. Adding the GPU nodes should take about 5-10 minutes. You can verify the addition of these `g5.2xlarge` nodes in the OpenShift web console under **Compute** > **Nodes**.
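+
+    You can also watch the new machines from the command line. This is a sketch; it assumes the machines are created in the standard `openshift-machine-api` namespace:
+
+    ```sh
+    # New g5 machines should progress to the "Running" phase
+    $ oc get machines -n openshift-machine-api
+    ```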
+
+15. Install the pattern with the demo application by running the following command:
+
+    ```sh
+    $ ./pattern.sh make install
+    ```
+
+    > **Note:**
+    > This deploys everything you need to run the demo application, including the NVIDIA GPU Operator and the Node Feature Discovery Operator, which is used to detect your GPU nodes.
+    >
+
+## Verify the installation
+
+1. In the OpenShift web console, go to the **Workloads** > **Pods** menu.
+
+2. Select the `rag-llm` project from the drop-down.
+
+3. The following pods should be up and running.
+
+   ![Pods](/images/rag-llm-gitops/rag-llm.png)
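+
+   Alternatively, check from the command line (a sketch, assuming the project is named `rag-llm` as above):
+
+   ```sh
+   # All pods in the project should be Running or Completed
+   $ oc get pods -n rag-llm
+   ```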
 
-Following commands will take about 15-20 minutes
-
->**Validated pattern will be deployed**
-
-```sh
-git clone https://github.com/<>/rag-llm-gitops.git
-cd rag-llm-gitops
-oc login --token=<> --server=<> # login to Openshift cluster
-podman machine start
-# Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.
-# You should never check these files
-# Add secrets to the values-secret.yaml that needs to be added to the vault.
-cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
-./pattern.sh make install
-```
-
-#### RAG Demo Workflow
-
-![Overview of workflow](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-sd.png)
-
-_Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift._
-
-
-#### RAG Data Ingestion
-
-![ingestion](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-ingress-sd.png)
-
-_Figure 4. Schematic diagram for Ingestion of data for RAG._
-
-
-#### RAG Augmented Query
-
-
-![query](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-query-sd.png)
-
-_Figure 5. Schematic diagram for RAG demo augmented query._
-
-In Figure 5, we can see RAG augmented query. Community version of [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) model is used for language processing. LangChain is used to integrate different tools of the LLM-based
-application together and to process the PDF files and web pages. A vector
-database provider such as EDB Postgres for Kubernetes (or Redis), is used to
-store vectors. [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to serve the [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) model. Gradio is
-used for user interface and object storage to store language model and other
-datasets. Solution components are deployed as microservices in the Red Hat
-OpenShift Container Platform cluster.
-
-#### Download diagrams
-View and download all of the diagrams above in our open source tooling site.
-
-[Open Diagrams](https://www.redhat.com/architect/portfolio/tool/index.html?#gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/diagrams/rag-demo-vp.drawio)
-
-![Diagram](/images/rag-llm-gitops/diagram-edb.png)
-
-_Figure 6. Proposed demo architecture with OpenShift AI_
-
-### Components deployed
-
-- **vLLM Text Generation Inference Server:** The pattern deploys a vLLM Inference Server. The server deploys and serves `mistral-community/Mistral-7B-Instruct-v0.3` model. The server will require a GPU node.
-- **EDB Postgres for Kubernetes / Redis Server:** A Vector Database server is deployed to store vector embeddings created from Red Hat product documentation.
-- **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
-- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in vector db.
-- **Prometheus:** Deploys a prometheus instance to store the various metrics from the LLM application and TGIS server.
-- **Grafana:** Deploys Grafana application to visualize the metrics.
-
-## Deploying the demo
-
-To run the demo, ensure the Podman is running on your machine.Fork the [rag-llm-gitops](https://github.com/validatedpatterns/rag-llm-gitops) repo into your organization
-
-### Login to OpenShift cluster
-
-Replace the token and the api server url in the command below to login to the OpenShift cluster.
-
-```sh
-oc login --token= --server= # login to Openshift cluster
-```
-
-### Cloning repository
-
-```sh
-git clone https://github.com/<>/rag-llm-gitops.git
-cd rag-llm-gitops
-```
-
-### Configuring model
-
-This pattern deploys community version of [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) out of box. Run the following command to configure vault with the model Id.
-
-```sh
-# Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.
-# You should never check-in these files
-# Add secrets to the values-secret.yaml that needs to be added to the vault.
-cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
-```
-
-To deploy a non-community [Mistral-7b-Instruct](https://huggingface.co/mistralai/) model, grab the [Hugging Face token](https://huggingface.co/settings/tokens) and accept the terms and conditions on the model page. Edit ~/values-secret-rag-llm-gitops.yaml to replace the `model Id` and the `Hugging Face` token.
-
-```sh
-secrets:
-  - name: hfmodel
-    fields:
-    - name: hftoken
-      value: null
-    - name: modelId
-      value: "mistral-community/Mistral-7B-Instruct-v0.3"
-  - name: minio
-    fields:
-    - name: MINIO_ROOT_USER
-      value: minio
-    - name: MINIO_ROOT_PASSWORD
-      value: null
-      onMissingValue: generate
-```
-
-### Provision GPU MachineSet
-
-As a pre-requisite to deploy the application using the validated pattern, GPU nodes should be provisioned along with Node Feature Discovery Operator and NVIDIA GPU operator. To provision GPU Nodes
-
-Following command will take about 5-10 minutes.
-
-```sh
-./pattern.sh make create-gpu-machineset
-```
-
-Wait till the nodes are provisioned and running.
-
-![Diagram](/images/rag-llm-gitops/nodes.png)
-
-Alternatively, follow the [instructions](../gpu_provisioning) to manually install GPU nodes, Node Feature Discovery Operator and NVIDIA GPU operator.
-
-### Deploy application
-
-**Note**: This pattern supports two types of vector databases, EDB Postgres for Kubernetes, and Redis. By default the pattern will deploy EDB Postgres for Kubernetes as a vector database. To deploy Redis, change the `global.db.type` parameter to the `REDIS` value in [values-global.yaml](./values-global.yaml).
-
-```yaml
----
-global:
-  pattern: rag-llm-gitops
-  options:
-    useCSV: false
-    syncPolicy: Automatic
-    installPlanApproval: Automatic
-# Possible value for db.type = [REDIS, EDB]
-  db:
-    index: docs
-    type: EDB # <--- Default is EDB, Change the db type to REDIS for Redis deployment
-main:
-  clusterGroupName: hub
-  multiSourceConfig:
-    enabled: true
-```
-
-Following commands will take about 15-20 minutes
-
-> **Validated pattern will be deployed**
-
-```sh
-./pattern.sh make install
-```
-
-### 1: Verify the installation
-
-- Login to the OpenShift web console.
-- Navigate to the Workloads --> Pods.
-- Select the `rag-llm` project from the drop down.
-- Following pods should be up and running.
-
-![Pods](/images/rag-llm-gitops/rag-llm.png)
-
-Note: If the hf-text-generation-server is not running, make sure you have followed the steps to configure a node with GPU from the [instructions](../gpu_provisioning) provided above.
-
-### 2: Launch the application
+### Launch the application
 
 - Click the `Application box` icon in the header, and select `Retrieval-Augmented-Generation (RAG) LLM Demonstration UI`
 
-![Launch Application](/images/rag-llm-gitops/launch-application.png)
+  ![Launch Application](/images/rag-llm-gitops/launch-application-main_menu.png)
 
 - It should launch the application
 
 ![Application](/images/rag-llm-gitops/application.png)
 
-### 3: Generate the proposal document
+### Generate the proposal document
 
-- It will use the default provider and model configured as part of the application deployment. The default provider is a Hugging Face model server running in the OpenShift. The model server is deployed with this validated pattern and requires a node with GPU.
-- Enter any company name
-- Enter the product as `RedHat OpenShift`
+- The demo generates a proposal document using the default provider `Mistral-7B-Instruct`, a model available on Hugging Face. It is a fine-tuned version of the base `Mistral-7B` model.
+
+- Enter any company name, for example `Microsoft`.
+- Enter the product as `RedHat OpenShift AI`.
+- Click the `Generate` button to generate a project proposal. The proposal also contains references to the RAG content, and the document can be downloaded as a PDF.
 
 ![Routes](/images/rag-llm-gitops/proposal.png)
 
-### 4: Add an OpenAI provider
-
-You can optionally add additional providers. The application supports the following providers
-
-- Hugging Face Text Generation Inference Server
-- OpenAI
-- NVIDIA
-
-Click on the `Add Provider` tab to add a new provider. Fill in the details and click `Add Provider` button. The provider should be added in the `Providers` dropdown under `Chatbot` tab.
-
-![Routes](/images/rag-llm-gitops/add_provider.png)
-
-### 5: Generate the proposal document using OpenAI provider
-
-Follow the instructions in step 3 to generate the proposal document using the OpenAI provider.
-
-![Routes](/images/rag-llm-gitops/chatgpt.png)
-
-### 6: Rating the provider
-
-You can provide rating to the model by clicking on the `Rate the model` radio button. The rating will be captured as part of the metrics and can help the company which model to deploy in production.
-
-### 7: Grafana Dashboard
-
-By default, Grafana application is deployed in `llm-monitoring` namespace.To launch the Grafana Dashboard, follow the instructions below:
-
-- Grab the credentials of Grafana Application
-  - Navigate to Workloads --> Secrets
-  - Click on the grafana-admin-credentials and copy the GF_SECURITY_ADMIN_USER, GF_SECURITY_ADMIN_PASSWORD
-- Launch Grafana Dashboard
-  - Click the `Application box` icon in the header, and select `Grafana UI for LLM ratings`
-  ![Launch Application](/images/rag-llm-gitops/launch-application.png)
-  - Enter the Grafana admin credentials.
-  - Ratings are displayed for each model.
-![Routes](/images/rag-llm-gitops/monitoring.png)
diff --git a/content/patterns/rag-llm-gitops/rating-the-provider.md b/content/patterns/rag-llm-gitops/rating-the-provider.md
new file mode 100644
index 000000000..cf8270b90
--- /dev/null
+++ b/content/patterns/rag-llm-gitops/rating-the-provider.md
@@ -0,0 +1,29 @@
+---
+title: Rating the provider
+weight: 11
+aliases: /rag-llm-gitops/rating-provider/
+---
+
+# Rating the provider
+
+You can rate the model by clicking the `Rate the model` radio button. The ratings are captured as part of the metrics and can help the company decide which model to deploy in production.
+
+## Grafana Dashboard
+
+By default, the Grafana application is deployed in the `llm-monitoring` namespace. You can track the ratings by logging in to the Grafana Dashboard, as follows.
+
+1. In the OpenShift web console, go to **Workloads** > **Secrets**.
+
+2. Click the `ai-llm-grafana-admin-credentials` secret and scroll down to view the credentials.
+
+3. Launch the Grafana Dashboard by clicking the `Application box` icon in the header, and selecting `Grafana UI for LLM ratings`.
+
+4. In the top right-hand corner, click `Sign in`.
+
+   ![Launch Application](/images/rag-llm-gitops/launch-application.png)
+
+5. Enter the Grafana admin credentials, copying the `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD` values from the `ai-llm-grafana-admin-credentials` screen in the OpenShift web console.
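+
+   If you prefer the command line, you can read the same credentials with `oc`. This is a sketch; it assumes the secret is in the `llm-monitoring` namespace where Grafana is deployed:
+
+   ```sh
+   # Print the decoded admin user and password from the secret
+   $ oc extract secret/ai-llm-grafana-admin-credentials -n llm-monitoring --to=-
+   ```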
+
+6. Ratings are displayed for each model.
+
+   ![Routes](/images/rag-llm-gitops/monitoring.png)
\ No newline at end of file
diff --git a/static/images/rag-llm-gitops/add_provider-1.png b/static/images/rag-llm-gitops/add_provider-1.png
new file mode 100644
index 000000000..375f7e97d
Binary files /dev/null and b/static/images/rag-llm-gitops/add_provider-1.png differ
diff --git a/static/images/rag-llm-gitops/add_provider-2.png b/static/images/rag-llm-gitops/add_provider-2.png
new file mode 100644
index 000000000..bfc20cbfd
Binary files /dev/null and b/static/images/rag-llm-gitops/add_provider-2.png differ
diff --git a/static/images/rag-llm-gitops/add_provider-3.png b/static/images/rag-llm-gitops/add_provider-3.png
new file mode 100644
index 000000000..f7ad256c7
Binary files /dev/null and b/static/images/rag-llm-gitops/add_provider-3.png differ
diff --git a/static/images/rag-llm-gitops/add_provider.png b/static/images/rag-llm-gitops/add_provider.png
index 434f5d061..89a2b656f 100644
Binary files a/static/images/rag-llm-gitops/add_provider.png and b/static/images/rag-llm-gitops/add_provider.png differ
diff --git a/static/images/rag-llm-gitops/application.png b/static/images/rag-llm-gitops/application.png
index a96317e60..1a12bd46a 100644
Binary files a/static/images/rag-llm-gitops/application.png and b/static/images/rag-llm-gitops/application.png differ
diff --git a/static/images/rag-llm-gitops/launch-application-main_menu.png b/static/images/rag-llm-gitops/launch-application-main_menu.png
new file mode 100644
index 000000000..b0f5a060f
Binary files /dev/null and b/static/images/rag-llm-gitops/launch-application-main_menu.png differ
diff --git a/static/images/rag-llm-gitops/launch-application.png b/static/images/rag-llm-gitops/launch-application.png
index 51ce74ba4..72c35eb1e 100644
Binary files a/static/images/rag-llm-gitops/launch-application.png and b/static/images/rag-llm-gitops/launch-application.png differ
diff --git a/static/images/rag-llm-gitops/monitoring.png b/static/images/rag-llm-gitops/monitoring.png
index 58fc6b8c1..f0c6b669b 100644
Binary files a/static/images/rag-llm-gitops/monitoring.png and b/static/images/rag-llm-gitops/monitoring.png differ
diff --git a/static/images/rag-llm-gitops/proposal.png b/static/images/rag-llm-gitops/proposal.png
index 60a1face9..349d12ecb 100644
Binary files a/static/images/rag-llm-gitops/proposal.png and b/static/images/rag-llm-gitops/proposal.png differ
diff --git a/static/images/rag-llm-gitops/provider-1.png b/static/images/rag-llm-gitops/provider-1.png
new file mode 100644
index 000000000..ff45fef3c
Binary files /dev/null and b/static/images/rag-llm-gitops/provider-1.png differ
diff --git a/static/images/rag-llm-gitops/provider-2.png b/static/images/rag-llm-gitops/provider-2.png
new file mode 100644
index 000000000..e9d0d276d
Binary files /dev/null and b/static/images/rag-llm-gitops/provider-2.png differ
diff --git a/static/images/rag-llm-gitops/rag-llm.png b/static/images/rag-llm-gitops/rag-llm.png
index 168edf1a1..7e39f70fc 100644
Binary files a/static/images/rag-llm-gitops/rag-llm.png and b/static/images/rag-llm-gitops/rag-llm.png differ