This application provides an interactive chatbot interface to analyze and ask questions about a specific GitHub repository.
It helps teams understand and adapt to large, unfamiliar codebases by offering insight grounded in the repository itself.
To get this running, jump straight to installation.
A chat assistant that allows users to ask questions about a GitHub repo (e.g., “How do I deploy this?” or “Explain this function”), using RAG (Retrieval-Augmented Generation) with LLMs.
What is RAG? RAG enhances LLMs by retrieving relevant information from an external knowledge base to ground their responses, making them more accurate and up to date.
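In this app, that means embedding the user's question, retrieving the most similar chunks of the repository, and prepending them to the prompt before the LLM answers. The sketch below is only a toy illustration of that retrieve-then-augment pattern, using bag-of-words similarity and hard-coded chunks; it is not the pipeline this application ships with.

```python
from collections import Counter
import math

# Stand-in repository chunks; the real app indexes the actual cloned repo.
REPO_CHUNKS = [
    "README.md: deploy with `make install NAMESPACE=...` on an OpenShift cluster",
    "app.py: Streamlit entry point that renders the chat interface",
    "deploy/helm/: Helm chart installing the LLM, Llamastack and PGVector",
]

def embed(text):
    # Stand-in embedding: a bag-of-words vector (the real app uses an embedding model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2):
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(REPO_CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

question = "How do I deploy this?"
context = "\n".join(retrieve(question))
# The retrieved context is what grounds the model's answer.
prompt = f"Answer using only this repository context:\n{context}\n\nQuestion: {question}"
print(prompt)
```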
- Chat assistant with memory and knowledge of the provided repository
- Sources tab to view referenced files
- Detailed directory tree visualization
- Streamlit App: Chat interface, repository visualization
- Llamastack: API layer for agents, tooling, and vector IO
- LLM: Deployed on OpenShift AI via vLLM
- PostgreSQL + PGVector: Stores embeddings and performs similarity search for RAG
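To make the PGVector piece concrete, the query below shows the kind of nearest-neighbor lookup a pgvector-backed store runs at question time. The `document_chunks` table, `embedding` column, and connection settings are hypothetical placeholders, not this deployment's actual schema.

```python
import psycopg2

# Hypothetical connection settings; the real ones are created by the Helm deployment.
conn = psycopg2.connect("dbname=rag user=postgres password=postgres host=localhost")

# Normally produced by the embedding model for the user's question.
query_embedding = "[0.12, -0.03, 0.55]"

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT content
        FROM document_chunks               -- hypothetical table of embedded repo chunks
        ORDER BY embedding <=> %s::vector  -- pgvector cosine-distance operator
        LIMIT 5
        """,
        (query_embedding,),
    )
    for (content,) in cur.fetchall():
        print(content)
```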
This kickstart supports two modes of deployment: on an OpenShift cluster, or locally with Ollama.

For the OpenShift deployment, you will need:
- OpenShift Cluster 4.16+ with OpenShift AI
- OpenShift Client CLI - `oc`
- Helm CLI - `helm`
- 1 GPU with 24GB of VRAM for the LLM
- Hugging Face token
- Access to the Meta Llama model
- `yq` installed
- Log in to your OpenShift cluster

```bash
oc login --server="<cluster-api-endpoint>" --token="sha256~XYZ"
```

- If the GPU nodes are tainted, find the taint key. You will have to pass it to the `make` command (as `LLM_TOLERATION` in the install step below) to ensure that the LLM pods are deployed on the tainted nodes with GPUs. In the example below the taint key is `nvidia.com/gpu`:

```bash
oc get nodes -o yaml | grep -A 3 taint
```

The output of the command will look something like this:

```yaml
taints:
- effect: NoSchedule
  key: nvidia.com/gpu
  value: "true"
--
taints:
- effect: NoSchedule
  key: nvidia.com/gpu
  value: "true"
```
You can work with your OpenShift cluster admin team to determine which labels and taints identify GPU-enabled worker nodes. It is also possible that all your worker nodes have GPUs and therefore have no distinguishing taint.
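If you prefer structured output to the grep above, a short script like the following (an optional, illustrative helper, not part of the kickstart) prints each node's taint keys next to its allocatable `nvidia.com/gpu` count. It assumes you are already logged in with `oc`.

```python
import json
import subprocess

# Fetch the full node objects so we can read taints and allocatable resources.
nodes = json.loads(
    subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
)

for node in nodes["items"]:
    name = node["metadata"]["name"]
    taints = [t["key"] for t in node["spec"].get("taints", [])]
    gpus = node["status"].get("allocatable", {}).get("nvidia.com/gpu", "0")
    print(f"{name}: taints={taints or 'none'} allocatable-gpus={gpus}")
```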
- Navigate to the Helm deploy directory

```bash
cd deploy/helm
```

- Run the install command

```bash
make install NAMESPACE=github-rag-assistant LLM=llama-3-2-3b-instruct LLM_TOLERATION="nvidia.com/gpu"
```

- Once all services have been deployed, get the route to access the Streamlit UI

```bash
oc get route
```

For local deployment, you will need Ollama installed with the all-minilm and llama3.1:8b-instruct-fp16 models:

```bash
ollama pull all-minilm
ollama pull llama3.1:8b-instruct-fp16 # Can use any Ollama model with tool functionality
```

Note that the local deployment uses an in-memory FAISS database instead of PostgreSQL + PGVector. This is configured in the default Llamastack Ollama template.
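For reference, this is roughly what an in-memory FAISS index does (the role PGVector plays on the cluster). The dimensions and random vectors below are placeholders for illustration; the app's actual indexing is handled for you by the Llamastack Ollama template, not by code you write.

```python
import faiss
import numpy as np

dim = 384                                  # all-minilm produces 384-dimensional embeddings
index = faiss.IndexFlatL2(dim)             # exact L2 search, held entirely in memory

chunk_vectors = np.random.rand(100, dim).astype("float32")  # stand-in document embeddings
index.add(chunk_vectors)

query_vector = np.random.rand(1, dim).astype("float32")     # stand-in question embedding
distances, ids = index.search(query_vector, 5)              # indices of the 5 nearest chunks
print(ids[0])
```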
- Make sure that `uv` is installed

```bash
uv --version
```

If not, install it using pip:

```bash
pip install uv
```

- Install the dependencies

```bash
uv sync
```

- Make sure Ollama is running

```bash
ollama run llama3.1:8b-instruct-fp16 # Or start the Ollama application
```

- Start the LlamaStack server

```bash
source .venv/bin/activate
INFERENCE_MODEL=llama3.1:8b-instruct-fp16 llama stack build --template ollama --image-type venv --run
```

- Run the Streamlit app

```bash
uv run streamlit run app.py
```

Helm Chart designs adapted from RAG Blueprint.

