An intelligent AI-powered assistant that allows developers to interact with their codebase using natural language.
Codebase RAG is a production-ready Retrieval-Augmented Generation (RAG) system that enables developers to:
- 💬 Chat with their codebase using natural language
- 🔍 Semantically search across thousands of code files
- 🤖 Get AI-powered explanations of complex code
- 📊 Visualize codebase insights with interactive dashboards
- ⚡ Lightning-fast queries with 11ms average response time
Built with modern ML techniques including vector embeddings, semantic search, and Google's Gemini 2.5 Flash LLM.
- Natural Language Queries: Ask questions in plain English about your codebase
- Semantic Code Search: Find relevant code using meaning, not just keywords
- AI-Powered Explanations: Get detailed explanations of how code works
- Multi-Language Support: Python, JavaScript, Java, C++, Go, and more
- Real-time Indexing: Automatically updates as your codebase changes
- 4,364+ code chunks indexed with FAISS vector database
- 11ms average query response time
- 45% test coverage with 21/21 tests passing
- Production-ready with comprehensive error handling
- Modern, responsive design with smooth animations
- Interactive dashboard with real-time metrics
- Code syntax highlighting for better readability
- Query history to track your interactions
User: "How does Flask routing work in this codebase?"
AI: "In this codebase, Flask routing is implemented using the @app.route()
decorator to map URL paths to Python functions. The routing system handles
incoming HTTP requests by matching the URL pattern and executing the
corresponding view function..."
- 💬 Natural conversations about code functionality
- 📁 Ingest repositories with one command
- 💡 Explain code snippets interactively
- 📊 View analytics on indexed codebase
- FastAPI - Modern Python web framework
- LangChain - LLM application framework
- FAISS - Facebook AI Similarity Search (vector database)
- Google Gemini 2.5 Flash - State-of-the-art LLM
- Tree-sitter - Code parsing and AST generation
- Streamlit - Interactive web interface
- Plotly - Data visualization
- Custom CSS - Modern gradient designs
- Python 3.12+ - Modern Python features
- Pytest - Comprehensive testing
- Docker - Containerization (optional)
- Git - Version control
- Python 3.12 or higher
- Git
- Google Gemini API key (free at Google AI Studio)
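A small preflight script can confirm these prerequisites before you run the setup steps. This is a sketch that assumes the key comes from either the process environment or a simple `KEY=VALUE` `.env` file. `parse_env` and `preflight` are hypothetical helpers. The parser deliberately ignores the quoting and `export` prefixes that tools such as python-dotenv handle.

```python
import os
import sys
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (no quote handling)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def preflight(environ: dict[str, str], dotenv_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the prerequisites look satisfied."""
    problems: list[str] = []
    if sys.version_info < (3, 12):
        problems.append(f"Python 3.12+ required, found {sys.version.split()[0]}")
    # A real environment variable takes precedence over the .env file.
    key = environ.get("GEMINI_API_KEY") or parse_env(dotenv_text).get("GEMINI_API_KEY")
    if not key or key == "your_api_key_here":
        problems.append("GEMINI_API_KEY is missing or still the placeholder value")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text(encoding="utf-8") if env_file.exists() else ""
    issues = preflight(dict(os.environ), text)
    for issue in issues:
        print("-", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```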
- Clone the repository
git clone https://github.com/Lohith625/codebase-rag.git
cd codebase-rag
- Create a virtual environment
python3 -m venv codebase-rag-env
source codebase-rag-env/bin/activate # On Windows: codebase-rag-env\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Configure API keys
# Copy the example environment file
cp .env.example .env
# Edit .env and add your Gemini API key
# GEMINI_API_KEY=your_api_key_here
- Run the system
# Terminal 1: Start the API server
python scripts/run_api.py
# Terminal 2: Start the frontend
streamlit run frontend/app.py
- Open in browser
Frontend: http://localhost:8501
API Docs: http://localhost:8000/docs
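You can also call the API from Python instead of curl. The sketch below uses only the standard library. It assumes the `/query` endpoint and the `query`/`language` JSON fields shown in the curl examples in this README. The response schema is not documented, so `ask` simply returns the decoded JSON. `build_post` and `ask` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

API_BASE = "http://localhost:8000"  # API address from the quick start


def build_post(path: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (without sending) a JSON POST request for the local API."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, language: str, timeout: float = 60.0) -> Any:
    """POST a question to /query and return the decoded JSON body (schema not documented here)."""
    request = build_post("/query", {"query": query, "language": language})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server (python scripts/run_api.py) to be running.
    print(ask("How does authentication work?", "python"))
```

The same `build_post` helper should also work for `/ingest` and `/explain` when given the payloads from the curl examples.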
codebase-rag/
├── backend/
│   ├── api/                    # FastAPI REST endpoints
│   │   ├── main.py             # Main API application
│   │   └── models.py           # Pydantic models
│   ├── ingestion/              # Repository loading & processing
│   │   ├── github_loader.py
│   │   └── document_loader.py
│   ├── parsing/                # Code parsing & chunking
│   │   ├── chunker.py
│   │   └── language_detector.py
│   ├── retrieval/              # Vector search & embeddings
│   │   ├── embeddings.py
│   │   ├── vector_store.py
│   │   ├── indexer.py
│   │   └── search.py
│   └── llm/                    # LLM integration
│       ├── llm_client.py
│       ├── rag_pipeline.py
│       └── query_constructor.py
├── frontend/                   # Streamlit UI
│   └── app.py
├── tests/                      # Unit & integration tests
│   ├── test_*.py
│   └── conftest.py
├── data/                       # Data storage
│   └── vector_store/           # FAISS indexes
├── config/                     # Configuration
│   └── settings.py
├── scripts/                    # Utility scripts
│   └── run_api.py
├── .env.example                # Environment template
├── requirements.txt            # Python dependencies
└── README.md                   # This file
┌─────────────┐
│  Frontend   │ (Streamlit)
│  localhost  │
│    :8501    │
└──────┬──────┘
       │ HTTP Requests
       ▼
┌─────────────┐
│   FastAPI   │ (REST API)
│   Server    │
│  localhost  │
│    :8000    │
└──────┬──────┘
       │
       ├──► 🔍 Query Pipeline
       │    ├─► Vector Search (FAISS)
       │    ├─► Context Retrieval
       │    └─► LLM Generation (Gemini)
       │
       ├──► 📥 Ingestion Pipeline
       │    ├─► Code Loading
       │    ├─► Parsing & Chunking
       │    └─► Vector Indexing
       │
       └──► 💾 Data Layer
            └─► FAISS Vector Store
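The Parsing & Chunking step splits source files into overlapping pieces before they are embedded. The sketch below is a plain sliding window over characters. It assumes `CHUNK_SIZE` and `CHUNK_OVERLAP` (512 and 50 in `config/settings.py`) are measured in characters. The project may count tokens instead, and its Tree-sitter-based chunker likely splits on syntax boundaries rather than fixed offsets. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` characters; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    pieces = chunk_text("x" * 1000)
    print([len(p) for p in pieces])  # [512, 512, 76]
```

The overlap gives text near a window boundary a second chance to appear in one piece. The cost, with the default settings, is storing about 50 / 462 (roughly 11%) extra characters.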
- User Query → Natural language question
- Query Enhancement → Expand and optimize query
- Vector Search → Find relevant code chunks (FAISS)
- Context Building → Assemble relevant code snippets
- LLM Generation → Gemini generates contextual answer
- Response → AI-powered explanation with sources
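The query flow above can be sketched end to end with standard-library stand-ins. This is not the project's implementation. A bag-of-words `Counter` replaces the 384-dimensional embedding model, and a brute-force cosine scan plays the role of a flat FAISS index. The second stage, which narrows `TOP_K` candidates to `TOP_N` (20 and 5 in `config/settings.py`), is a hypothetical term-coverage re-rank. Query enhancement and the Gemini call are omitted; `build_prompt` only produces the text that would be sent to the LLM.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def embed(text: str) -> Counter[str]:
    """Toy bag-of-words vector; a stand-in for the real embedding model."""
    return Counter(token.lower() for token in TOKEN_RE.findall(text))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: dict[str, str], top_k: int = 20, top_n: int = 5) -> list[tuple[str, float]]:
    """Exact search over all chunks, keep top_k by cosine, then trim to top_n."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in chunks.items()]
    scored = [pair for pair in scored if pair[1] > 0.0]  # drop chunks with no shared terms
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[:top_k]
    # Hypothetical second stage: prefer candidates covering more distinct query terms,
    # falling back to the cosine score to break ties.
    terms = set(q)
    candidates.sort(key=lambda pair: (len(terms & set(embed(chunks[pair[0]]))), pair[1]), reverse=True)
    return candidates[:top_n]


def build_prompt(query: str, chunks: dict[str, str], hits: list[tuple[str, float]]) -> str:
    """Assemble the retrieved snippets, labelled by source id, into one LLM prompt."""
    context = "\n\n".join(f"[{cid}]\n{chunks[cid]}" for cid, _ in hits)
    return (
        "Answer using only the code below and cite sources by their [id].\n\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    corpus = {
        "auth.py:1": "def login(user, password):\n    return check_password(user, password)",
        "routes.py:1": "@app.route('/home')\ndef home():\n    return render('home.html')",
    }
    question = "How does login check the password?"
    hits = retrieve(question, corpus)
    print(hits)
    print(build_prompt(question, corpus, hits))
```

Keeping a wider `TOP_K` pool and then a smaller `TOP_N` lets a cheap first pass favour recall. A second pass then chooses what actually fits in the prompt.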
curl -X POST http://localhost:8000/ingest \
-H "Content-Type: application/json" \
-d '{
"repo_url": "https://github.com/username/repo",
"branch": "main"
}'
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "How does authentication work?",
"language": "python"
}'
curl -X POST http://localhost:8000/explain \
-H "Content-Type: application/json" \
-d '{
"code": "def fibonacci(n): return n if n < 2 else fibonacci(n-1) + fibonacci(n-2)",
"language": "python"
}'
# Run all tests
pytest
# Run with coverage
pytest --cov=backend --cov-report=html
# Run specific test file
pytest tests/test_vector_store.py
# View coverage report
open htmlcov/index.html # macOS; on Linux use: xdg-open htmlcov/index.html
Current Test Results:
- ✅ 21/21 tests passing
- 📊 45% code coverage
- ⚡ Fast test execution
Key configuration options in config/settings.py:
# Vector Store
CHUNK_SIZE = 512 # Code chunk size
CHUNK_OVERLAP = 50 # Overlap between chunks
VECTOR_DIMENSION = 384 # Embedding dimension
# LLM
GEMINI_MODEL = "gemini-2.5-flash"
MAX_TOKENS = 2048 # Max response tokens
TEMPERATURE = 0.3 # Sampling temperature (lower = more focused, deterministic answers)
# Retrieval
TOP_K = 20 # Initial retrieval count
TOP_N = 5 # Final results to use
| Metric | Value |
|---|---|
| Indexed Vectors | 4,364 |
| Query Time | ~11ms avg |
| Index Load Time | <2s |
| Embedding Dimension | 384 |
| Test Coverage | 45% |
| Tests Passing | 21/21 ✅ |
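The README does not say how the ~11ms query time was measured. It also does not say whether that figure covers retrieval alone or the full pipeline including the Gemini call. A small harness like the one below makes this explicit by timing whichever stage is passed in. It is a sketch built on `time.perf_counter`. `measure_latency` is a hypothetical helper, and the 95th percentile uses the nearest-rank method.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], runs: int = 100, warmup: int = 5) -> dict[str, float]:
    """Call `fn` repeatedly and return latency statistics in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        fn()  # untimed calls so one-off costs (caches, lazy loading) do not skew the results
    samples: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = math.ceil(0.95 * len(samples)) - 1  # nearest-rank 95th percentile
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Replace the lambda with the stage you want to time, e.g. a single vector search.
    print(measure_latency(lambda: sum(range(10_000))))
```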
- Vector-based code search
- Natural language queries
- AI-powered explanations
- Modern web interface
- Real-time indexing
- Multi-repository support
- Code generation capabilities
- Team collaboration features
- GitHub integration
- VSCode extension
- Architecture visualization
- Code quality analysis
- Automated documentation
- CI/CD integration
- Enterprise features
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini - AI language model
- FAISS - Vector similarity search
- FastAPI - Modern Python web framework
- Streamlit - Interactive UI framework
- Tree-sitter - Code parsing library
Project Link: https://github.com/Lohith625/codebase-rag
⭐ Star this repo if you find it useful!
Made with ❤️ and 🤖 by [Lohith m]