🔹 Install pyenv-win
Install via PowerShell:
Invoke-WebRequest -UseBasicParsing -Uri https://pyenv-win.github.io/pyenv-win/install.ps1 | Invoke-Expression
(Alternatively, follow the installation steps in the pyenv-win docs.)
Restart PowerShell, then check:
pyenv --version
🔹 Install and set Python 3.11.9 (or lower) for your project
pyenv install 3.11.9
cd <your destination folder>
pyenv local 3.11.9
This creates a .python-version file (you can commit it to your repo).
Create & activate venv
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

A comprehensive Python pipeline that parses code, detects issues, generates reports, stores results in a vector database, and enables dependency-aware semantic search.
- Multi-language Support: Python, JavaScript, Java parsing with Tree-sitter
- Comprehensive Issue Detection: Security, complexity, documentation, and duplication analysis
- Vector Database Storage: ChromaDB for semantic search of code and issues
- Dependency Graph Analysis: NetworkX-based dependency visualization with issues annotated
- Interactive Visualizations: HTML-based interactive graphs and reports (dashboard)
- Semantic Search: Natural language queries with context-aware responses
- GitHub Analyser: Analyze a repository directly from the web by pasting its {Owner_name}/repository identifier
- QA Bot: Ask questions about the codebase, with smart context switching to prevent unnecessary retrievals
- Debugger Agent: An AI agent based on the Planner design for comprehensive debugging and modification of the codebase. Use the keyword 'think' in the chat
- Interactive Lightweight UI: Built with Jinja2, XHTML, and FastAPI to interact with the bot on the web
- Comprehensive Report of Issues: Prepares a Markdown report with the issues, explanations, and fixes
- CLI Design: Interactive CLI built using Typer and Click
- Install dependencies:
pip install -r requirements.txt
- Install the package:
pip install -e .
- Get your Gemini API key, paste it into the sample_env file, and rename the file to .env
To analyze a folder:
python -m src.cli analyze <Repo_location>
To chat with the bot:
python -m src.cli chat
To analyze a repository on GitHub:
python -m src.cli github <Repo Owner/Repo Name>
To run the FastAPI server and interact through the UI:
uvicorn main:app --reload
The analysis generates several output files (see Analysis_output):
- chunks.json: Parsed function-level code chunks with metadata
- issues.json: Detected issues with details
- report.json: Enriched report with full context
- graph.html: Interactive dependency graph visualization
- summary_report.html: Summary report with statistics (dashboard)
- chroma_db/: Vector database storage for semantic search
- Report.md: Final generated report with issues, fixes, and locations
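As a rough illustration of how these outputs could be consumed downstream, the sketch below tallies issues by severity. It assumes, hypothetically, that issues.json holds a JSON list of objects with a "severity" key. The real schema is not documented here and may differ, so treat the field name and the `summarize_issues` helper as placeholders.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_issues(path):
    """Count issues per severity in an issues.json-style file.

    Assumption: the file is a JSON list of objects, each optionally
    carrying a "severity" key. The actual schema may differ.
    """
    issues = json.loads(Path(path).read_text(encoding="utf-8"))
    # Entries without a severity are grouped under "unknown"
    return Counter(issue.get("severity", "unknown") for issue in issues)
```

For example, `summarize_issues("Analysis_output/issues.json")` would return a `Counter` keyed by severity label, provided the file matches the assumed layout.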
Python
- eval/exec usage
- SQL injection risks
- Weak cryptographic hashes (MD5, SHA1)
- Hardcoded secrets(api_key, password, token, etc.)
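To make the Python checks above concrete, here is a minimal sketch of three of them (eval/exec calls, weak hashes, hardcoded secrets). It uses the standard-library `ast` module as a stand-in. The project itself parses with Tree-sitter, so this is not its actual detector, and `find_security_issues` is a hypothetical name. SQL injection detection is left out because it needs data-flow reasoning that does not fit a short example.

```python
import ast
import re

WEAK_HASHES = {"md5", "sha1"}
# Variable names that suggest a credential (illustrative list, not exhaustive)
SECRET_NAME = re.compile(r"(api_key|password|token|secret)", re.IGNORECASE)


def find_security_issues(source):
    """Return sorted (line, message) pairs for a few risky patterns."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Direct calls to eval(...) or exec(...)
            if isinstance(func, ast.Name) and func.id in {"eval", "exec"}:
                issues.append((node.lineno, f"use of {func.id}()"))
            # Attribute calls such as hashlib.md5(...) or hashlib.sha1(...)
            elif isinstance(func, ast.Attribute) and func.attr in WEAK_HASHES:
                issues.append((node.lineno, f"weak hash {func.attr}"))
        elif isinstance(node, ast.Assign):
            value = node.value
            # A string literal assigned to a secret-looking variable name
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                        issues.append((node.lineno, f"hardcoded secret in {target.id}"))
    return sorted(issues)
```

Working on the syntax tree rather than raw text avoids false positives from comments and string contents, which is one reason a parser-based approach is attractive for this kind of check.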
JavaScript
- Use of eval()
- Use of new Function()
- Dangerous DOM writes (.innerHTML =, document.write())
Java
- Use of Runtime.getRuntime().exec()
- Weak cryptography (MessageDigest.getInstance("MD5"/"SHA1"))
- SQL injection via JDBC string concatenation (Statement.execute("..."+var))
- Function length (>200 lines)
- Cyclomatic complexity (>10)
- Excessive nesting depth (>3)
- Missing docstrings for public functions
- Incomplete documentation
Java
- Functions with no Javadoc (/** ... */)
JavaScript
- Functions with no JSDoc (/** ... */)
- Duplicate functions across files
- Code similarity detection
- High: Exploitable security risks, extreme complexity
- Medium: Performance/maintainability issues
- Low: Documentation/style issues
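The complexity thresholds listed above (cyclomatic complexity over 10, nesting depth over 3) can be approximated for Python with the standard-library `ast` module. The sketch below is an assumption about how such a check might look, not the project's Tree-sitter implementation, and all function names are hypothetical. It counts branching nodes for complexity and treats an `elif` as one extra nesting level, which a production detector may handle differently.

```python
import ast

# Nodes that add a decision point (a simplified approximation)
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)
# Nodes that open a nested block
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def cyclomatic_complexity(func):
    """Approximate complexity: 1 plus the number of branching nodes."""
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(func))


def nesting_depth(node, depth=0):
    """Deepest level of nested control-flow blocks below node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        step = 1 if isinstance(child, NESTING_NODES) else 0
        deepest = max(deepest, nesting_depth(child, depth + step))
    return deepest


def check_function(source, max_complexity=10, max_nesting=3):
    """Return the thresholds violated by the first function in source."""
    func = next(n for n in ast.walk(ast.parse(source))
                if isinstance(n, ast.FunctionDef))
    problems = []
    if cyclomatic_complexity(func) > max_complexity:
        problems.append("complexity")
    if nesting_depth(func) > max_nesting:
        problems.append("nesting")
    return problems
```

A function with a `for` containing an `if` containing a `while` containing another `if` has nesting depth 4, so `check_function` would report `["nesting"]` for it under the default limits.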
The analyzer automatically excludes common non-source files and directories to focus on actual code:
File Patterns:
- Build artifacts: *.pyc, *.pyo, *.so, *.dll, *.exe, *.jar
- IDE files: *.swp, *.swo, *~, .DS_Store
- Documentation: *.md, *.txt, *.rst, *.pdf
- Config files: *.ini, *.cfg, *.json, *.yaml
- Log files: *.log, *.out, *.err
- Temporary files: *.tmp, *.bak, *.orig
Directories:
- Build dirs: __pycache__
- IDE dirs: .vscode, .idea, .git
- Environment: venv, env, .env
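One plausible way to apply exclusion rules like these is glob matching on file names plus directory pruning during traversal. The sketch below assumes that approach using the standard-library `fnmatch` and `os.walk`. The analyzer's real matching logic is not shown in this document, and `is_excluded` and `iter_source_files` are hypothetical helpers with a shortened pattern list.

```python
import fnmatch
import os

# Shortened, illustrative subsets of the defaults listed above
EXCLUDE_PATTERNS = ["*.pyc", "*.log", "*.md", "*.tmp"]
EXCLUDE_DIRS = {"__pycache__", ".git", "venv", ".vscode"}


def is_excluded(filename, patterns=EXCLUDE_PATTERNS):
    """True if the bare file name matches any glob pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in patterns)


def iter_source_files(root, patterns=EXCLUDE_PATTERNS, exclude_dirs=EXCLUDE_DIRS):
    """Yield file paths under root, skipping excluded dirs and files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if not is_excluded(name, patterns):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place means large directories such as venv are never traversed at all, which tends to matter more for speed than filtering individual files afterwards.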
You can customize exclusions in several ways:
- Programmatic configuration:
from src.pipeline.analyzer import CodeAnalyzer
analyzer = CodeAnalyzer(
    exclude_patterns=['*.pyc', '*.log', 'test_*'],
    exclude_dirs=['build', 'dist', 'tests']
)
- Configuration file: See exclusion_config_example.py for detailed examples.
frontend # Interactive web UI
Main.py # FastAPI app that runs the server
src/
├── parsers/ # Tree-sitter based code parsing
├── analyzers/ # Local and global issue analysis
├── vector_db/ # ChromaDB integration
├── visualization/ # Graph and report generation with issues annotated
├── qa_agent/ # Semantic search and Q&A with Smart Debugger Agent support
├── pipeline/ # Main analysis orchestrator
└── github_analyser # Analyzes a repository directly from GitHub
- tree-sitter: Multi-language code parsing
- chromadb: Vector database for semantic search
- sentence-transformers: Text embeddings (all-MiniLM-L6-v2)
- networkx: Dependency graph analysis
- pyvis: Interactive graph visualization
- click and Typer: CLI interface
- FastAPI: Web UI backend
- Jinja2, XHTML, Tailwind CSS: Frontend design
- LangChain: Agentic AI design
- Google genai: LLM support (Gemini-2.5-flash)
You can extend the analysis by:
- Adding new issue detectors in src/analyzers/local_analyzer.py
- Implementing new languages in src/parsers/tree_sitter_parser.py
- Creating custom visualizations in src/visualization/
- Adding new query types in src/qa_agent/semantic_search.py
MIT License