Merge pull request #47 from Gurubase/develop
First Release
fatihbaltaci authored Jan 21, 2025
2 parents ef6b510 + 45af1d2 commit aeacb0d
Showing 23 changed files with 497 additions and 131 deletions.
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,42 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2025-01-21

### Added
- Modern Next.js 14 frontend with TailwindCSS
- Django REST framework backend
- RAG system with advanced LLM techniques
- Multiple data source support:
  - Website scraping with Firecrawl
  - YouTube video transcription
  - PDF document processing
- Vector similarity search with Milvus
- Message queue system with RabbitMQ for Celery
- Caching layer with Redis
- PostgreSQL database for data persistence
- Docker Compose based deployment
- Self-hosted installation script
- Binge feature for personalized learning paths
- Context evaluation system to minimize hallucination
- Comprehensive documentation:
  - Installation guide
  - Architecture documentation
  - Development guidelines
- Website widget for embedding Q&A functionality
- Telemetry system with opt-out option

### Infrastructure
- Microservices architecture with Docker Compose
- Nginx for static file serving and reverse proxy
- Celery for asynchronous task processing
- Milvus for vector similarity search
- PostgreSQL for primary data storage
- Redis for caching and rate limiting
- RabbitMQ for message queue

[0.1.0]: https://github.com/Gurubase/gurubase/releases/tag/v0.1.0
88 changes: 88 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,88 @@
# Contributing to Gurubase

We love your input! We want to make contributing to Gurubase as easy and transparent as possible, whether it's:

- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features
- Becoming a maintainer

## Development Process

We use GitHub to host code, track issues and feature requests, and accept pull requests.

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. Issue that pull request!

## Development Setup

We use VSCode devcontainers for development. This ensures a consistent development environment for all contributors.

### Frontend (Next.js)

```bash
cd src/gurubase-frontend

# Install dependencies
yarn install

# Run in development mode
yarn dev-selfhosted
```

### Backend (Django)

The backend development environment is configured using VSCode devcontainers. To get started:

1. Install VSCode and the "Dev Containers" extension (formerly "Remote - Containers")
2. Open the project in VSCode
3. When prompted, click "Reopen in Container" or run the "Dev Containers: Reopen in Container" command
4. The container will be built and configured automatically

Once inside the container:

```bash
cd src/gurubase-backend/backend

bash migrate_runserver.sh
```

## Pull Request Process

1. Update the README.md with details of changes to the interface, if applicable.
2. Update the CHANGELOG.md with a note describing your changes.
3. The PR will be merged once you have the sign-off of at least one other developer.

## Code Style

### Frontend

- Use ESLint and Prettier configurations provided in the project
- Follow React best practices and hooks guidelines
- Use functional components
- Implement proper TypeScript types
- Follow the existing component structure

### Backend

- Follow PEP 8 style guide
- Use Django's coding style
- Write docstrings for all functions and classes
- Keep functions small and focused
- Use type hints where possible

## Commit Messages

- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters or less
- Reference issues and pull requests liberally after the first line
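
The first three rules above can be checked mechanically before pushing. The following is a minimal sketch, not part of the Gurubase tooling: the function name `check_commit_message` and the small word list are assumptions made for illustration, and the word list only catches a few common non-imperative openings.

```python
MAX_SUBJECT_LENGTH = 72  # limit taken from the guideline above

# Illustrative deny-list only (an assumption, not an exhaustive grammar check).
NON_IMPERATIVE_STARTS = {
    "added", "adds", "fixed", "fixes",
    "moved", "moves", "updated", "updates",
}


def check_commit_message(message: str) -> list:
    """Return a list of guideline violations found in a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        return ["subject line is empty"]

    problems = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject line is {len(subject)} characters (limit {MAX_SUBJECT_LENGTH})"
        )
    first_word = subject.split()[0].lower()
    if first_word in NON_IMPERATIVE_STARTS:
        problems.append(
            f"subject starts with '{first_word}'; use the imperative, present tense"
        )
    return problems


if __name__ == "__main__":
    for candidate in ("Add feature", "Added feature"):
        print(candidate, "->", check_commit_message(candidate) or "ok")
```

A check like this could be wired into a local `commit-msg` hook, though whether the project wants one is left open here.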

## License

By contributing, you agree that your contributions will be licensed under the project's Apache License 2.0.
10 changes: 7 additions & 3 deletions INSTALL.md
@@ -19,7 +19,7 @@ bash gurubase.sh
The installer will:
1. Create a `.gurubase` directory in your home folder
2. Prompt for required API keys
3. Download and start all necessary services
3. Download and start all necessary [services](#services)
4. Open the web interface at http://localhost:8029

### Upgrade
@@ -43,7 +43,7 @@ bash gurubase.sh rm
> [!CAUTION]
> To remove everything including all data (volumes), you can run the following command:
> ```bash
> cd ~/.gurubase && docker compose down --volumes
> rm -rf ~/.gurubase
> ```
### System Requirements
@@ -62,7 +62,7 @@ bash gurubase.sh rm
- 10GB available disk space (SSD preferred for better performance)
- **Network**
- Ports 8028 and 8029 must be available
- Ports `8028` and `8029` must be available
> [!NOTE]
> Only Linux and macOS are supported at the moment. Native Windows is not supported, but you can use WSL2 to run Gurubase on Windows.
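
The port requirement above can be verified before running the installer. This is a minimal sketch under stated assumptions: the helper name `port_is_free` is hypothetical, and it only tests whether a TCP socket can be bound on the loopback address, which is a reasonable proxy for "available" but not a guarantee for every network setup.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
        return True


if __name__ == "__main__":
    # 8028 and 8029 are the ports named in the requirements above.
    for required_port in (8028, 8029):
        state = "free" if port_is_free(required_port) else "in use"
        print(f"port {required_port}: {state}")
```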
@@ -132,3 +132,7 @@ Here's a detailed comparison between Gurubase Cloud and Self-hosted versions:
| GitHub Codebase Indexing | ✅ Available | ✅ Available |
| Website Widget | ✅ Available | ✅ Available |
| Base LLM | ✅ OpenAI GPT-4o | ✅ OpenAI GPT-4o |

## Additional Information

For frequently asked questions about Gurubase, including system architecture, use cases, data handling, and more, please check the [FAQ section in README.md](README.md#frequently-asked-questions).
154 changes: 137 additions & 17 deletions README.md
@@ -1,39 +1,74 @@
<div align="center">
<img src="https://pbs.twimg.com/profile_banners/1828170456110682112/1725545674/1500x500" alt="Gurubase Image" /><br/>
<img src="https://raw.githubusercontent.com/Gurubase/gurubase/refs/heads/develop/imgs/gurubase-light-logo.svg#gh-light-mode-only" alt="Gurubase Light Logo" width="300px" />
<img src="https://raw.githubusercontent.com/Gurubase/gurubase/refs/heads/develop/imgs/gurubase-dark-logo.svg#gh-dark-mode-only" alt="Gurubase Dark Logo" width="300px" /><br/><br />
</div>


<div align="center">
# Gurubase - AI-powered Q&A assistants for any topic

[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/9CMRSQPqx6)
[![Twitter](https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=x&logoColor=white)](https://twitter.com/gurubaseio)
[![Mastodon](https://img.shields.io/badge/Mastodon-%236364FF.svg?style=for-the-badge&logo=mastodon&logoColor=white)](https://mastodon.social/@gurubaseio)
[![Bluesky](https://img.shields.io/badge/Bluesky-%230285FF.svg?style=for-the-badge&logo=bluesky&logoColor=white)](https://bsky.app/profile/gurubase.bsky.social)
![Gurubase Intro](imgs/gurubase_intro.png)

</div>

# Gurubase
<div align="center">

[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?logo=discord&logoColor=white)](https://discord.gg/9CMRSQPqx6)
[![Twitter](https://img.shields.io/badge/Twitter-%231DA1F2.svg?logo=x&logoColor=white)](https://twitter.com/gurubaseio)
[![Mastodon](https://img.shields.io/badge/Mastodon-%236364FF.svg?logo=mastodon&logoColor=white)](https://mastodon.social/@gurubaseio)
[![Bluesky](https://img.shields.io/badge/Bluesky-%230285FF.svg?logo=bluesky&logoColor=white)](https://bsky.app/profile/gurubase.bsky.social)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Gurubase/gurubase/blob/main/LICENSE)
</div>

- [What is Gurubase](#what-is-gurubase)
- [Features](#features)
- [Quick Install](#quick-install)
- [How to Create a Guru](#how-to-create-a-guru)
- [How to Claim a Guru](#how-to-claim-a-guru)
- [Showcase Your Guru](#showcase-your-guru)
- [How to Update Datasources](#how-to-update-datasources)
- [License](#license)
- [Help](#help)
- [Used By](#used-by)
- [Frequently Asked Questions](#frequently-asked-questions)

## What is Gurubase

[Gurubase](https://gurubase.io) lets you create AI-powered Q&A assistants for any topic or need. Create a new Guru by uploading webpages, PDFs, videos, or GitHub repositories. Start asking questions directly on Gurubase.io, or [embed it on your website](https://github.com/Gurubase/gurubase-widget) to let your users ask questions about your product. It’s already being [used by](#used-by) hundreds of open-source repositories.
[Gurubase](https://gurubase.io) is an open-source RAG system that lets you create AI-powered Q&A assistants ("Gurus") for any topic or need. Create a new Guru by adding:
- 📄 Webpages
- 📑 PDFs
- 🎥 YouTube videos
- 💻 GitHub repositories

Start asking questions directly on Gurubase, or [embed it on your website](https://github.com/Gurubase/gurubase-widget) to let your users ask questions about your product. It's already being [used by](#used-by) hundreds of open-source repositories. You can also install the entire system on your own server; see [INSTALL.md](INSTALL.md) for self-hosting instructions.

## Features

- 🤖 **AI-Powered Q&A**: Advanced LLM-based question answering, including an instant evaluation mechanism to minimize hallucination
- 🔄 **RAG System**: Retrieval Augmented Generation for accurate, context-aware responses
- 📚 **Multiple Data Sources**: Add web pages, PDFs, videos, and GitHub repositories as data sources for your Guru
- 🔌 **Easy Integration**: Embeddable widget for your website. Discord and Slack bots are coming soon
- 🎯 **Custom Gurus**: Create specialized AI assistants for specific topics
- 🔄 **Real-time Updates**: Keep the data sources up to date by reindexing them with one click
- **Binge**: Visualize your learning path while talking with a Guru. You can navigate through it and create a personalized path
- 🛠 **Self-hosted Option**: Full control over your deployment. Install the entire system on your servers

## Quick Install

If you prefer not to use [Gurubase.io](https://gurubase.io), you can install the entire system on your own servers.

```bash
curl -fsSL https://raw.githubusercontent.com/Gurubase/gurubase/refs/heads/develop/gurubase.sh -o gurubase.sh
bash gurubase.sh
```

See [INSTALL.md](INSTALL.md) for detailed installation instructions and prerequisites.

## How to Create a Guru

Currently, only the Gurubase team can create a Guru. Please [open an issue](https://github.com/Gurubase/gurubase/issues/new?template=guru_creation_request.md) on this repository with the title "Guru Creation Request" and include the GitHub repository link in the issue content. We prioritize Guru creation requests from the maintainers of the tools. Please mention whether you are the maintainer of the tool. If you are not the maintainer, it would be helpful to obtain the maintainer's permission before opening a creation request for the tool.
Currently, only the Gurubase team can create a Guru on [Gurubase.io](https://gurubase.io/). Please [open an issue](https://github.com/Gurubase/gurubase/issues/new?template=guru_creation_request.md) on this repository with the title "Guru Creation Request" and include the GitHub repository link in the issue content. We prioritize Guru creation requests from the maintainers of the tools. Please mention whether you are the maintainer of the tool. If you are not the maintainer, it would be helpful to obtain the maintainer's permission before opening a creation request for the tool.

## How to Claim a Guru

Although you can't create a Guru, you can manage it on Gurubase. For example, you can add, remove, or reindex the datasources. To claim a Guru, you must have a Gurubase account and be one of the tool's maintainers. Please [open an issue](https://github.com/Gurubase/gurubase/issues/new?template=guru_claim_request.md) with the title "Guru Claim Request". Include the link to the Guru (e.g., `https://gurubase.io/g/anteon`), your Gurubase username, and a link proving you are one of the maintainers of the tool, such as a PR merged by you.
Although you can't create a Guru on [Gurubase.io](https://gurubase.io/), you can manage it on Gurubase. For example, you can add, remove, or reindex the datasources. To claim a Guru, you must have a Gurubase account and be one of the tool's maintainers. Please [open an issue](https://github.com/Gurubase/gurubase/issues/new?template=guru_claim_request.md) with the title "Guru Claim Request". Include the link to the Guru (e.g., `https://gurubase.io/g/anteon`), your Gurubase username, and a link proving you are one of the maintainers of the tool, such as a PR merged by you.

## Showcase Your Guru

@@ -52,27 +87,29 @@ Like hundreds of GitHub repositories, add a badge to your README to guide your u
[![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20OpenCost%20Guru-006BFF)](https://gurubase.io/g/opencost)
```

<img src="imgs/badge_sample.png" alt="Gurubase Image" width="500"/><br/>
<img src="imgs/badge_sample.png" alt="Gurubase Badge" width="500"/><br/>

## How to Update Datasources

Datasources can include your tool's documentation webpages, YouTube videos, or PDF files. You can add new ones, remove existing ones, or reindex them. Reindexing ensures your Guru is updated based on changes to the indexed datasources. For example, if you update your tool's documentation, you can reindex those pages so your Guru generates answers based on the latest data.

Once you claim your Guru, you will see your Gurus in the "My Gurus" section.

<img src="imgs/image.png" alt="Gurubase Image" width="300"/><br/>
<img src="imgs/image.png" alt="Gurubase My Gurus" width="300"/><br/>

Click the Guru you want to update. On the edit page, click "Reindex" for the datasource you want to reindex.

<img src="imgs/image-1.png" alt="Gurubase Image" width="720"/><br/>
<img src="imgs/image-1.png" alt="Gurubase Reindex" width="720"/><br/>

You can also see the "Last Index Date" on the URL pages.

<img src="imgs/image-2.png" alt="Gurubase Image" width="720"/><br/>
<img src="imgs/image-2.png" alt="Gurubase Last Index Date" width="720"/><br/>

## License

All the content generated by Gurubase aligns with the license of the datasources used to generate answers. More details can be found on the [Terms of Usage](https://gurubase.io/terms-of-use) page, Section 2.
Licensed under the [Apache 2.0 License](LICENSE).

All the content generated by [gurubase.io](https://gurubase.io) aligns with the license of the datasources used to generate answers. More details can be found on the [Terms of Usage](https://gurubase.io/terms-of-use) page, Section 2.

## Help

@@ -297,4 +334,87 @@ Gurubase currently hosts **hundreds** of Gurus, and it grows every day. Here are
<b><i>100+ more</i></b>
</td>
</tr>
</table>
</table>

## Frequently Asked Questions

### What is Gurubase?
Gurubase is an open-source RAG system that creates AI-powered Q&A assistants ("Gurus"). It processes various data sources like web pages, videos, PDFs, and GitHub code repositories to provide context-aware answers.

### How does Gurubase work?
Gurubase uses a modern RAG architecture:
1. **Indexing**: Processes and chunks data sources
2. **Embedding**: Converts text into vector representations
3. **Storage**: Stores vectors in Milvus for efficient similarity search
4. **Retrieval**: Finds relevant context when questions are asked
5. **Generation**: Uses LLMs to generate accurate answers based on retrieved context
6. **Evaluation**: Evaluates the retrieved contexts to minimize hallucinations

Check the [ARCHITECTURE.md](src/gurubase-backend/ARCHITECTURE.md) file for more details.
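
The six stages above can be illustrated with a toy, in-memory sketch. Everything in it is an assumption made for illustration and is not Gurubase's implementation: a bag-of-words counter stands in for a real embedding model, a Python list stands in for Milvus, a similarity threshold stands in for the context evaluation step, and the function names (`chunk`, `embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Indexing: split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Embedding stand-in: word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]],
             top_k: int = 2, min_score: float = 0.2) -> list[tuple[float, str]]:
    """Retrieval plus a crude evaluation: drop contexts scoring below min_score."""
    query = embed(question)
    scored = sorted(((cosine(query, vec), text) for text, vec in store), reverse=True)
    return [(score, text) for score, text in scored[:top_k] if score >= min_score]


def build_prompt(question: str, store: list[tuple[str, Counter]]) -> str | None:
    """Generation input: a grounded prompt, or None if no context survives."""
    hits = retrieve(question, store)
    if not hits:
        return None
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Milvus stores vectors for similarity search.",
        "RabbitMQ is the message queue used by Celery.",
    ]
    # Storage stand-in: a plain list of (chunk, vector) pairs.
    store = [(piece, embed(piece)) for doc in docs for piece in chunk(doc)]
    print(build_prompt("Which component stores vectors?", store))
```

Returning `None` when nothing clears the threshold mirrors the idea behind the evaluation stage: it is better to decline than to generate an answer from irrelevant context.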

### What types of data sources can I use?
Gurubase supports multiple data source types:
- 📄 Web Pages
- 📑 PDF Documents
- 🎥 YouTube Videos
- 💻 GitHub repositories for codebase indexing
- More formats coming soon! Open an issue if you want a new data source type.

### What's the system architecture?
Gurubase follows a microservices architecture, deployed with Docker Compose:
- Frontend: Next.js 14 with TailwindCSS
- Backend: Django REST framework
- Vector Store: Milvus
- Message Queue: RabbitMQ
- Cache: Redis
- Database: PostgreSQL

See [ARCHITECTURE.md](src/gurubase-backend/ARCHITECTURE.md) for details.

### What are the system requirements?
Minimum requirements:
- CPU: 4 cores
- RAM: 8GB
- Storage: 10GB SSD
- OS: Linux or macOS (Windows via WSL2)

See [INSTALL.md](INSTALL.md) for detailed requirements.


### What are the use cases for using my Gurus created on Gurubase?

1. You can use it on [Gurubase.io](https://gurubase.io/) (or on Gurubase Self-hosted if you’ve installed it on your servers).
2. You can embed an [Ask AI widget](https://github.com/gurubase/gurubase-widget) into your website.
3. You can add a [Gurubase badge](#2-badge) to your GitHub repository README.
4. We will release an API soon.

### Are there Discord/Slack integrations?
Discord and Slack integrations are currently in development. Join our [Discord](https://discord.gg/9CMRSQPqx6) for updates.


### What is Binge?
Binge lets you:
- Create personalized learning paths on any Guru.
- Ask follow-up questions to dive deeper into the content.
- Visualize your learning path on the Binge Map and navigate it easily and efficiently.
- Save your progress to pick up where you left off.

### How often is data reindexed?
- Manual reindexing is available at any time. See the [How to Update Datasources](#how-to-update-datasources) section to learn more
- Periodic reindexing will be available soon

### Is there an API available?
A public API is in development. Features will include:
- Question answering
- Data source management
- Analytics and usage stats

Join our [Discord](https://discord.gg/9CMRSQPqx6) for API release updates.

### What's the license for self-hosted Gurubase?
- Code is licensed under [Apache 2.0](LICENSE)

### How is data handled and secured?
- In self-hosted deployments, all data, including API keys, is stored locally
- No data is sent to external servers except LLM API calls
- Telemetry is opt-out and can be disabled

### What is Gurubase.io?
[Gurubase.io](https://gurubase.io/) is a hosted version of Gurubase. It's a great way to get started with Gurubase without the hassle of self-hosting.