Airweave is an open-core tool that makes any app searchable for your agent by syncing data from your apps, APIs, and databases, along with your users' data, into your vector database of choice with minimal configuration.
- Overview
- Key Features
- Architecture
- Technology Stack
- Quick Start
- Configuration
- Usage
- Contributing
- Roadmap
- License
Airweave simplifies the process of making your data searchable. Whether you have structured or unstructured data, Airweave helps you break it into processable chunks, store the data in a vector database, and retrieve it via your own agent or any search mechanism.
- Over 120 integrations: Airweave is your one-stop shop for building any application that requires semantic search.
- Simplicity: Minimal configuration needed to sync data from diverse sources (APIs, databases, and more).
- Extensibility: Easily add new integrations via sources, destinations, and embedders.
- Open-Core: Core features are open source, ensuring transparency. Future commercial offerings will bring additional, advanced capabilities.
- Async-First: Built to handle large-scale data synchronization asynchronously (upcoming: managed Redis workers for production scale).
- No code required, but extensible: Users who prefer not to touch any code can make their app searchable in a few clicks.
- White-Labeled Multi-Tenant Support: Ideal for SaaS builders, Airweave provides a streamlined OAuth2-based platform for syncing data across multiple tenants while maintaining privacy and security.
- Chunk Generators: Each source (like a database, API, or file system) defines a @chunk_generator that yields data in a consistent format. You can also define your own.
- Automated Sync: Schedule data synchronization or run on-demand sync jobs.
- Versioning & Hashing: Airweave detects changes in your data via hashing, updating only the modified chunks in the vector store.
- Multi-Source Support: Plug in multiple data sources and unify them into a single queryable layer.
- Scalable: Deploy locally via Docker Compose for development (upcoming: deploy with Kubernetes for production scale)
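To make the chunk generator idea concrete, here is a minimal sketch in Python. The `chunk_generator` decorator below is a hypothetical stand-in and not Airweave's actual implementation; the `Chunk` dataclass, its fields, and the `notes_source` function are assumptions made purely for illustration. It shows how a source function that yields raw records could be wrapped so that every record comes out in one consistent shape.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, Dict, Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical normalized unit of data; not Airweave's real model."""
    source: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_generator(source_name: str) -> Callable:
    """Hypothetical decorator: wraps a function that yields raw dicts and
    converts each one into a Chunk tagged with the source name."""
    def decorator(func: Callable[..., Iterable[Dict[str, Any]]]) -> Callable[..., Iterator[Chunk]]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Iterator[Chunk]:
            for record in func(*args, **kwargs):
                yield Chunk(
                    source=source_name,
                    text=str(record["text"]),
                    # Everything except the text is kept as metadata.
                    metadata={k: v for k, v in record.items() if k != "text"},
                )
        return wrapper
    return decorator


@chunk_generator("notes")
def notes_source(rows):
    """Illustrative source: yields one raw record per (id, body) row."""
    for row_id, body in rows:
        yield {"text": body, "id": row_id}


chunks = list(notes_source([(1, "hello"), (2, "world")]))
```

Because the wrapper is itself a generator, records are normalized lazily, which suits sources too large to hold in memory at once.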
            ┌───────────────┐
            │   Your App    │
            └───────┬───────┘
                    │
                    ▼
          ┌───────────────────┐   (Search with Airweave)
          │     Airweave      │<--------------------------+
          └───────────────────┘                           |
             /      |      \                              |
          /         |         \                           |
        ▼           ▼           ▼                         |
   ┌────────┐  ┌────────┐  ┌────────┐                     |
   │ Source │  │ Source │  │ Source │ ... (Any number of integrations)
   └────────┘  └────────┘  └────────┘                     |
          \         |         /                           |
             \      |      /                              |
               ▼    ▼    ▼                                |
          ┌───────────────────┐                           |
          │ Chunk Generators  │                           |
          └───────────────────┘                           |
                    │                                     |
                    ▼                                     |
          ┌───────────────────┐                           |
          │  Synchronizer(s)  │                           |
          └───────────────────┘                           |
                    │                                     |
                    ▼                                     |
          ┌───────────────────┐                           |
          │ Vector Database(s)│---------------------------+
          └───────────────────┘
- Frontend (React)
  - Provides a dashboard for you to configure your sources, destinations, and sync schedules.
  - Visualizes sync jobs, logs, and chunk details.
- Backend (FastAPI)
  - Houses the RESTful endpoints.
  - Manages data models such as source, destination, org, user, chunk types, and more.
  - Allows for automatic synchronization tasks and chunk generation logic.
- Chunk Generators
  - Integration-specific modules or decorators (e.g., @chunk_generator) that define how to fetch and transform source data.
  - Produces "chunks" of data (text, metadata, etc.) that are hashed to detect changes.
- Synchronizers
  - Compare generated chunk hashes.
  - Insert new chunks into the vector DB or mark them for deletion if they're no longer relevant.
  - Upcoming: run asynchronously, in parallel, to handle large data sets efficiently.
- Data Store
  - Postgres for storing metadata about sources, jobs, users, and schedules.
  - A vector database to store embeddings of your chunks (in the current open-core version, this can be any vector database your Docker Compose config sets up).
  - Upcoming: Redis deployment for caching and queue-based workflows (e.g., background worker tasks).
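The synchronizer step described above (compare hashes, insert what is new, mark what has disappeared for deletion) can be sketched roughly as follows. This is an illustration under stated assumptions, not Airweave's code: the function names `chunk_hash` and `plan_sync`, the choice of SHA-256, and the use of plain dictionaries keyed by chunk id are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of the chunk text (assumed hash choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(
    stored: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare stored hashes (chunk id -> hash) with freshly generated
    chunks (chunk id -> text).

    Returns three sorted lists of chunk ids: (to_insert, to_update, to_delete).
    """
    current_hashes = {cid: chunk_hash(text) for cid, text in current.items()}
    # New ids that the vector store has never seen.
    to_insert = sorted(cid for cid in current_hashes if cid not in stored)
    # Known ids whose content hash changed since the last sync.
    to_update = sorted(
        cid for cid, digest in current_hashes.items()
        if cid in stored and stored[cid] != digest
    )
    # Ids that were stored earlier but no longer appear in the source.
    to_delete = sorted(cid for cid in stored if cid not in current_hashes)
    return to_insert, to_update, to_delete


stored = {"a": chunk_hash("alpha"), "b": chunk_hash("beta"), "c": chunk_hash("gamma")}
current = {"a": "alpha", "b": "beta v2", "d": "delta"}
to_insert, to_update, to_delete = plan_sync(stored, current)
```

In this example the unchanged chunk `a` appears in none of the three lists, so it would not be re-embedded or rewritten.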
- Frontend: React (JavaScript/TypeScript)
- Backend: FastAPI (Python)
- Infrastructure:
  - Local / Dev: Docker Compose
  - Production: (upcoming) Kubernetes
- Databases:
  - PostgreSQL for relational data
  - Vector database of your choice (e.g. Chroma, Milvus, Pinecone, Qdrant, Weaviate); a batteries-included vector DB is upcoming
- Asynchronous Tasks: ARQ for background workers
Below is a simple guide to get Airweave up and running locally. For more detailed instructions, refer to the docs.
- Docker (v20+)
- Docker Compose (v2.0+)
- Clone the Repository
git clone https://github.com/yourusername/airweave.git
cd airweave
- Set Up Environment Variables
Copy .env.example to .env and update any necessary variables, such as Postgres connection strings or credentials.
- Build and Run
docker-compose up --build
That's it!
You now have Airweave running locally. You can log in to the dashboard, add new sources, and configure your sync schedules.
Below are some basic commands and API endpoints that you may find useful.
- Run Tests: (If you have a test suite set up for your own code)
docker-compose exec backend pytest
- Start in Dev Mode:
docker-compose up --build
- Swagger Documentation:
http://localhost:8000/docs
- Get All Sources:
GET /api/sources
- Create a Source:
POST /api/sources
{
  "name": "My Data Source",
  "type": "postgres",
  "connection_info": {...}
}
- Trigger a Sync Job:
POST /api/sync_jobs
{
  "source_id": 123,
  "schedule_id": null
}
- Get Sync Job Status:
GET /api/sync_jobs/{job_id}
Example response:
{
  "status": "running",
  "chunks_processed": 100,
  "chunks_total": 200
}
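As a rough illustration of calling these endpoints from a script, the sketch below uses only the Python standard library. Several things here are assumptions rather than documented behavior: that the API is served from the same host and port as the Swagger docs, that any status other than "running" means the job has finished, and the helper names `build_request`, `send`, and `wait_for_job`, which are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: same host/port as the Swagger docs


def build_request(
    method: str, path: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the endpoints above."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(
    job_id: int,
    fetch: Callable[[urllib.request.Request], Dict[str, Any]] = send,
    interval: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Poll GET /api/sync_jobs/{job_id} until the status is no longer
    "running" or max_polls is reached; returns the last status seen."""
    status: Dict[str, Any] = {}
    for _ in range(max_polls):
        status = fetch(build_request("GET", f"/api/sync_jobs/{job_id}"))
        if status.get("status") != "running":
            break
        time.sleep(interval)
    return status


# Request matching the "Trigger a Sync Job" example; call send(trigger) to submit it.
trigger = build_request("POST", "/api/sync_jobs", {"source_id": 123, "schedule_id": None})
```

Passing `fetch` as a parameter keeps the polling loop testable without a running backend: a stub that returns canned statuses can stand in for `send`.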
- Access the React UI at http://localhost:3000.
- Navigate to Sources to add new integrations.
- Set up or view your sync schedules under Schedules.
- Monitor sync jobs in Jobs.
Airweave provides flexibility through environment variables, allowing you to customize key aspects of your deployment. Below are the primary configuration options:
You can configure Airweave to use your own PostgreSQL database to store sources, schedules, and metadata. Update the following variables in your .env file:
POSTGRES_USER=<your-database-username>
POSTGRES_PASSWORD=<your-database-password>
POSTGRES_DB=<your-database-name>
POSTGRES_HOST=<your-database-host>
POSTGRES_PORT=<your-database-port>
This configuration will create a number of tables within the airweave schema in your specified database. More on this later.
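To check these values before starting the stack, a small script can assemble them into a connection URL. The sketch below is only an illustration: the helper name `postgres_dsn` is hypothetical, and the `postgresql://` URL form is the general libpq-style convention, not something this document states Airweave uses internally.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgresql:// URL from the POSTGRES_* variables listed above.

    Raises KeyError if a variable is missing and ValueError if the port
    is not a number.
    """
    # Percent-encode credentials so characters like "@" or spaces stay valid.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = int(env["POSTGRES_PORT"])
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Illustrative values only; substitute the contents of your own .env file.
example_env = {
    "POSTGRES_USER": "airweave",
    "POSTGRES_PASSWORD": "p@ss word",
    "POSTGRES_DB": "airweave",
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
}
dsn = postgres_dsn(example_env)
```

Converting the port with `int()` makes a malformed value fail immediately instead of surfacing later as a connection error.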
You can specify the logging destination. Currently we support Datadog and Sentry.
# Datadog
LOG_DESTINATION=datadog
DATADOG_API_KEY=<your-datadog-api-key>
DATADOG_HOST=<your-datadog-host>
DATADOG_PORT=<your-datadog-port>
# Sentry
LOG_DESTINATION=sentry
SENTRY_DSN=<your-sentry-dsn>
SENTRY_ENVIRONMENT=development
SENTRY_RELEASE=<your-release-name>
You can specify the SMTP settings in your .env file. These are used for sending email notifications.
SMTP_HOST=<your-smtp-host>
SMTP_PORT=<your-smtp-port>
SMTP_USER=<your-smtp-user>
SMTP_PASSWORD=<your-smtp-password>
SMTP_FROM_EMAIL=<your-from-email>
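If you want to verify these SMTP settings independently of Airweave, a short standard-library script along the following lines may help. This is a sketch under assumptions: the helper names are hypothetical, and the use of STARTTLS before login is a guess about your mail server, not a statement about how Airweave itself sends mail.

```python
import smtplib
from email.message import EmailMessage
from typing import Mapping


def build_notification(
    env: Mapping[str, str], to_addr: str, subject: str, body: str
) -> EmailMessage:
    """Create a plain-text message using SMTP_FROM_EMAIL as the sender."""
    msg = EmailMessage()
    msg["From"] = env["SMTP_FROM_EMAIL"]
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notification(env: Mapping[str, str], msg: EmailMessage) -> None:
    """Send the message with the SMTP_* settings from the environment."""
    with smtplib.SMTP(env["SMTP_HOST"], int(env["SMTP_PORT"])) as server:
        server.starttls()  # assumption: the server supports STARTTLS
        server.login(env["SMTP_USER"], env["SMTP_PASSWORD"])
        server.send_message(msg)


# Illustrative values only; nothing is sent until send_notification is called.
example_env = {"SMTP_FROM_EMAIL": "noreply@example.com"}
message = build_notification(example_env, "you@example.com", "Sync finished", "All chunks synced.")
```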
We welcome all contributions! Whether you're fixing a bug, improving documentation, or adding a new feature:
- Fork this repository
- Create a feature branch:
git checkout -b feature/new-chunk-generator
- Commit changes:
git commit -m "Add new chunk generator for XYZ"
- Push to your fork:
git push origin feature/new-chunk-generator
- Create a Pull Request: Submit your PR against this repo's main branch.
Please follow the existing code style and conventions. See CONTRIBUTING.md for more details.
- Additional Integrations: Expand chunk generators for popular SaaS APIs and databases.
- Redis & Worker Queues: Improved background job processing and caching for large or frequent syncs.
- Webhooks: Trigger syncs on external events (e.g. new data in a database).
- Kubernetes Support: Offer easy Helm charts for production-scale deployments.
- Commercial Offerings: Enterprise features, extended metrics, and priority support.
Airweave is released under an open-core model. The community edition is licensed under the MIT License. Additional modules (for enterprise or advanced features) may be licensed separately.
- Discord: Join our Discord channel here to get help or discuss features.
- GitHub Issues: Report bugs or request new features in GitHub Issues.
- Twitter: Follow @airweave_dev for updates.
That's it! We're looking forward to seeing what you build. If you have any questions, please don't hesitate to open an issue or reach out on Discord.