CODEPILOT is a powerful semantic search and document analysis tool designed to streamline the exploration of codebases and textual repositories. By leveraging state-of-the-art machine learning models and FAISS as a vector database, CODEPILOT enables users to perform context-aware searches, gain actionable insights, and answer complex queries directly from documents.
CODEPILOT follows a per-query billing model: before processing a query, it generates an estimate of that query's cost so the user can decide whether to proceed.
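The estimate-then-confirm flow can be sketched as follows. This is a minimal illustration, not CODEPILOT's actual billing code: the price constant, the four-characters-per-token heuristic, and the function names `estimate_tokens`, `estimate_query_cost`, and `confirm_query` are all assumptions made for the example.

```python
import math
from typing import Callable, Sequence

# Hypothetical rate for illustration only; the real pricing is not stated here.
PRICE_PER_1K_TOKENS_USD = 0.002


def estimate_tokens(text: str) -> int:
    """Rough token count, assuming about 4 characters per token."""
    return max(1, math.ceil(len(text) / 4))


def estimate_query_cost(query: str, context_chunks: Sequence[str]) -> float:
    """Estimate the cost of a query plus the chunks sent along as context."""
    total_tokens = estimate_tokens(query) + sum(
        estimate_tokens(chunk) for chunk in context_chunks
    )
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS_USD


def confirm_query(
    query: str,
    context_chunks: Sequence[str],
    ask: Callable[[str], str] = input,
) -> bool:
    """Show the estimate and return True only if the user explicitly agrees."""
    cost = estimate_query_cost(query, context_chunks)
    answer = ask(f"Estimated cost: ${cost:.6f}. Proceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as `ask` keeps the confirmation step testable without a terminal. Defaulting to "no" on any other answer means a query is never billed by accident.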
CODEPILOT uses embeddings to enable advanced semantic search capabilities. Instead of simple keyword matching, it understands the meaning behind user queries, providing more accurate and relevant results.
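To make the idea concrete, here is a small sketch of ranking by embedding similarity. It uses hand-written 3-dimensional toy vectors and a brute-force cosine comparison in plain Python. In CODEPILOT the vectors would come from an embedding model and the lookup would go through FAISS, so treat the helper names `cosine_similarity` and `top_k` as illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real chunk embeddings.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [0.9, 0.1, 0.0]

# The first two chunks point in nearly the same direction as the query,
# so they rank ahead of the third, which is orthogonal to it.
results = top_k(query_vector, chunk_vectors, k=2)
```

A brute-force scan like this is linear in the number of chunks, which is why a dedicated vector index is used once the collection grows.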
CODEPILOT extracts structured insights from textual data, such as entities, keywords, and context, to provide a deeper understanding of the content.
Designed with developers in mind, CODEPILOT seamlessly integrates with Git repositories, indexing code and documentation for easy exploration.
CODEPILOT divides large documents into manageable chunks, generating embeddings for each chunk. This enables scalable indexing and retrieval of data.
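One way such chunking could look is sketched below. It reads a text stream line by line, so the whole file never has to sit in memory, and yields chunks capped at a token limit. The whitespace tokenizer, the `overlap` parameter, and the name `iter_chunks` are assumptions for this example; a real pipeline would count tokens with the embedding model's own tokenizer.

```python
import io
from typing import Iterator, List, TextIO


def iter_chunks(
    stream: TextIO, max_tokens: int = 256, overlap: int = 32
) -> Iterator[str]:
    """Yield chunks of at most max_tokens whitespace-separated tokens.

    Consecutive chunks share `overlap` tokens so that text cut at a chunk
    boundary still appears with some surrounding context.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    window: List[str] = []
    fresh = 0  # tokens in the window that have not been emitted yet
    for line in stream:  # file objects iterate lazily, one line at a time
        for token in line.split():
            window.append(token)
            fresh += 1
            if len(window) == max_tokens:
                yield " ".join(window)
                # Keep only the trailing overlap tokens for the next chunk.
                window = window[max_tokens - overlap:]
                fresh = 0
    if fresh:
        # Emit the final partial chunk only if it holds unseen tokens.
        yield " ".join(window)


sample = io.StringIO("a b c d e\nf g h i j k\n")
chunks = list(iter_chunks(sample, max_tokens=4, overlap=1))
```

With real files, pass an open file handle instead of `io.StringIO`, for example `with open(path, encoding="utf-8") as fh: ...`. Each yielded chunk can then be embedded and indexed independently.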
CODEPILOT offers an intuitive query interface that supports iterative and conversational searches, helping users refine their queries to get precise answers.
- Extract Keywords and Entities: Automatically identifies keywords and named entities within the text to enhance search accuracy.
- Contextual Analysis: Generates insights to understand user intent and provide meaningful responses.
- Vector Database: Uses FAISS to store and retrieve document embeddings efficiently.
- Similarity Search: Finds relevant chunks based on semantic similarity between query embeddings and document embeddings.
- Dynamic Chunking: Splits documents into token-limited chunks to optimize embedding generation.
- Streaming Support: Reads and processes large files incrementally for better memory management.
- Python 3.8+
- Libraries: PyTorch, FAISS, SentenceTransformers, and FastAPI
- Clone the repository:
git clone https://github.com/yourusername/codepilot.git
cd codepilot
- Install dependencies:
pip install -r requirements.txt
- Launch the application:
python main.py
- Index Files: Point CODEPILOT to a directory or Git repository to index files.
- Ask Questions: Use the interactive query interface to explore the indexed content.
- Iterate and Refine: Leverage conversational capabilities to refine queries and improve search outcomes.
python codepilot.py <git_repo_folder> <git_repo_url> <path_to_config.ini>
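For readers wondering how the three positional arguments above might be consumed, here is a hedged sketch using the standard library's `argparse` and `configparser`. It is not taken from `codepilot.py`; the function names `parse_args` and `load_config` and the attribute name `config_path` are assumptions, and the contents of the INI file are not specified in this document.

```python
import argparse
import configparser
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three positional arguments shown in the usage line."""
    parser = argparse.ArgumentParser(
        prog="codepilot.py",
        description="Index a Git repository and query it.",
    )
    parser.add_argument("git_repo_folder", help="local folder for the repository")
    parser.add_argument("git_repo_url", help="URL of the Git repository")
    parser.add_argument("config_path", help="path to the config.ini file")
    return parser.parse_args(argv)


def load_config(path: str) -> configparser.ConfigParser:
    """Load an INI file, failing loudly if it cannot be read."""
    config = configparser.ConfigParser()
    # ConfigParser.read() silently skips missing files and returns the
    # list of files it actually parsed, so an empty list means failure.
    if not config.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")
    return config
```

Calling `parse_args()` with no argument reads from `sys.argv`, while passing an explicit list makes the parser easy to exercise in tests.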
We welcome contributions! Please open an issue or submit a pull request to help us improve CODEPILOT.
This project is licensed under the Apache License 2.0. It is free to use and modify for personal and academic purposes but cannot be used for commercial purposes without explicit permission.
For questions, feedback, or feature requests, reach out via email at atahusain.b@gmail.com or open an issue on GitHub.