heißdocs - A Document Query Application 🔍📄
Add a searchable layer on top of your PDFs!
Fully open-source and ready to be deployed. You store, own, and control the data.
This project is a work in progress, so please expect things to break as it moves forward. Its vision is to keep you from being locked into an ecosystem: your data is governed and stored by you, so even if the app breaks, your data stays accessible with tools you already have.
heißdocs lets an individual or an organization keep track of their PDF files. The tricky thing about PDFs is that their content is often not searchable, especially when they are scanned. Simply upload a scanned or regular PDF and start searching its content with the power of Elasticsearch (or a NoSQL database)!
heißdocs creates a search layer for your PDFs that takes you to the exact page (pointing to the exact word is in progress!). To get started:
- Set up the services according to the instructions under Setup
- Upload a file on the Dashboard
- Start searching!
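To make "down to the exact page" concrete, here is a toy shell sketch of the idea. It is not how heißdocs works internally (the app uses Elasticsearch or a NoSQL database); it assumes a hypothetical tab-separated file with one record per page (file, page number, extracted text), and the `search_pages` function is made up for illustration. Because every page is its own record, a match can be reported as `file:page` instead of just the file.

```bash
#!/usr/bin/env bash
# Toy illustration only, not heißdocs' implementation.
# Hypothetical index format: one line per page, tab-separated:
#   <file>\t<page number>\t<extracted page text>

# search_pages INDEX_FILE TERM
# Prints "file:page" for every page whose text contains TERM
# (case-insensitive substring match).
search_pages() {
  local index_file="$1" term="$2"
  awk -F '\t' -v term="$term" '
    index(tolower($3), tolower(term)) > 0 { print $1 ":" $2 }
  ' "$index_file"
}

# Usage:
#   printf 'report.pdf\t1\tQuarterly revenue\nreport.pdf\t2\tInvoice totals\n' > pages.tsv
#   search_pages pages.tsv invoice   # prints report.pdf:2
```

A real search setup adds text extraction (and OCR for scanned files), tokenization, and ranking on top of this, but storing each page as its own record is what lets a result point back to a page.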
- ☁️ Multi-cloud support (AWS, GCP, Azure)
- 💬 Semantic search (LangChain + OpenAI)
- 💿 Multiple Storage Options
- 🔍 Powerful Search + Versatile Storage
- 📄 View source documents
- 🔒 Full ownership of data
- 🆓 Completely open-source
- 💻 Self-hosted
- ... more things to come, and feel free to send in requests!
Please set up the required services before starting the application. You can follow the documentation to configure all services.
- Auth0 (required even before startup):
  - Get the required values from the Auth0 portal and paste them into the .env files in frontend and app. This must be configured before building the application.
Start by creating a .env file in the root directory and filling in the values according to the .env.example file.
Before startup, only the Auth0 values need to be set up. Please follow the documentation for the full guide.
cp .env.example .env
The values in the root .env file can remain unchanged unless you plan to host each of the services individually.
Similarly, create a .env file inside the app, frontend, and engine folders and fill it in following the instructions in the respective .env.example files.
cp frontend/.env.example frontend/.env
cp app/.env.example app/.env
cp engine/.env.example engine/.env
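Running `cp` again later would overwrite `.env` files you have already filled in. A minimal sketch, assuming you run it from the repository root, that copies each `.env.example` (root, `app`, `frontend`, `engine`) only when the matching `.env` does not exist yet; the `copy_env_templates` name is just for illustration:

```bash
#!/usr/bin/env bash
# copy_env_templates: create .env from .env.example in the root directory
# and in the app, frontend, and engine folders, never overwriting an
# existing .env. Run from the repository root.
copy_env_templates() {
  local dir
  for dir in . app frontend engine; do
    if [ ! -f "$dir/.env.example" ]; then
      echo "skip: $dir/.env.example not found" >&2
    elif [ -e "$dir/.env" ]; then
      echo "keep: $dir/.env already exists" >&2
    else
      cp "$dir/.env.example" "$dir/.env"
      echo "created: $dir/.env" >&2
    fi
  done
}

# Usage (from the root directory):
#   copy_env_templates
```

An explicit existence check is used instead of `cp -n` because the exit status of `cp -n` differs between `cp` implementations and versions.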
All keys except the Auth0 keys can be left untouched; everything else can be configured later on the Settings page.
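Since the Auth0 values are the only ones required before startup, checking for any you forgot can save a failed build. This sketch assumes the Auth0 keys contain `AUTH0` in their names (check the actual names in each `.env.example`); `check_auth0_env` is a hypothetical helper, not part of the project:

```bash
#!/usr/bin/env bash
# check_auth0_env FILE...
# Reports Auth0 keys that are still empty in each .env file and returns
# non-zero if any are empty or a file is missing.
# Assumption: the Auth0 keys contain "AUTH0" in their names.
check_auth0_env() {
  local file empty status=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "missing: $file" >&2
      status=1
      continue
    fi
    # For KEY=value lines whose KEY contains AUTH0, print KEY when the value
    # is blank or just a pair of empty quotes. Comment lines are skipped.
    empty="$(awk '
      /^[ \t]*#/ { next }
      {
        eq = index($0, "=")
        if (eq == 0) next
        key = substr($0, 1, eq - 1)
        val = substr($0, eq + 1)
        sub(/^[ \t\r]+/, "", val)
        sub(/[ \t\r]+$/, "", val)
        if (key ~ /AUTH0/ && (val == "" || val == "\"\"" || val == "\047\047")) print key
      }
    ' "$file")"
    if [ -n "$empty" ]; then
      echo "$file: empty Auth0 values: $(printf '%s' "$empty" | tr '\n' ' ')" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (from the root directory):
#   check_auth0_env frontend/.env app/.env && echo "Auth0 values look filled in"
```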
Ensure that the credentials you paste into the .env files have the permissions needed for operations such as GET, PUT, and LIST.
Once your .env files are ready, navigate to the root directory and run:
docker compose up --build
Then go to localhost:8080 and log in.
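The first `docker compose up --build` can take a while. If you want to script the wait before opening `localhost:8080`, a minimal bash sketch that polls the port with bash's built-in `/dev/tcp` redirection (so `curl` is not needed) could look like this. `wait_for_port` is a hypothetical helper, and an open port only means something is listening, not that every service has finished starting:

```bash
#!/usr/bin/env bash
# wait_for_port [HOST] [PORT] [TIMEOUT_SECONDS]
# Polls once per second until HOST:PORT accepts a TCP connection.
# Returns 0 when the port is open, 1 after TIMEOUT_SECONDS.
wait_for_port() {
  local host="${1:-localhost}" port="${2:-8080}" timeout="${3:-120}"
  local deadline=$((SECONDS + timeout))
  # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection;
  # the subshell exits (closing it again) right away.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
  done
  echo "$host:$port is accepting connections"
}

# Usage, in a second terminal while docker compose is starting:
#   wait_for_port localhost 8080 300
```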
[Optional] If you want hot reload on your frontend, you can run the services separately.
Run the backend services:
docker compose -f docker-compose.yaml up --build
If you also want Elasticsearch running locally, include the docker-compose.elasticsearch.override.yaml file in the docker compose command:
docker compose -f docker-compose.yaml -f docker-compose.elasticsearch.override.yaml up --build
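If you switch between the two commands often, a small wrapper can pick the compose files for you. This is a sketch only: `backend_up` and the `WITH_ELASTICSEARCH` variable are made up here and are not read by the project.

```bash
#!/usr/bin/env bash
# backend_up: start the backend services, optionally adding the local
# Elasticsearch override file. WITH_ELASTICSEARCH=1 is a variable invented
# for this sketch.
backend_up() {
  local files=(-f docker-compose.yaml)
  if [ "${WITH_ELASTICSEARCH:-0}" = "1" ]; then
    files+=(-f docker-compose.elasticsearch.override.yaml)
  fi
  # Echo the command to stderr so you can see which files were used.
  echo "+ docker compose ${files[*]} up --build" >&2
  docker compose "${files[@]}" up --build
}

# Usage (from the root directory):
#   backend_up                       # backend services only
#   WITH_ELASTICSEARCH=1 backend_up  # plus local Elasticsearch
```

Keeping the file list in a bash array means each `-f` flag and file name is passed to `docker compose` as a separate, correctly quoted argument.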
Run the frontend:
cd frontend
npm install
npm run dev -- --port 8080
In a separate terminal, from the root directory, apply the database migrations:
cd app
alembic upgrade head
[Optional] If you host your own PostgreSQL database, make sure to update sqlalchemy.url in the alembic.ini file.
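A minimal sketch for making that change from the command line. It assumes the file lives at `app/alembic.ini` (the migrations are run from the `app` folder); `set_alembic_url` is a hypothetical helper, and the URL in the usage line is a placeholder. Keep whatever driver prefix the existing `sqlalchemy.url` line already uses (for example `postgresql://` or `postgresql+asyncpg://`).

```bash
#!/usr/bin/env bash
# set_alembic_url INI_FILE URL
# Rewrites the sqlalchemy.url line in INI_FILE; all other lines are kept.
set_alembic_url() {
  local ini="$1" url="$2" tmp
  if ! grep -Eq '^[[:space:]]*sqlalchemy\.url[[:space:]]*=' "$ini"; then
    echo "no sqlalchemy.url line found in $ini" >&2
    return 1
  fi
  tmp="$(mktemp)"
  # Pass the URL through the environment (ENVIRON) rather than awk -v,
  # so backslashes in it are not treated as escape sequences.
  URL="$url" awk '
    /^[ \t]*sqlalchemy\.url[ \t]*=/ { print "sqlalchemy.url = " ENVIRON["URL"]; next }
    { print }
  ' "$ini" > "$tmp"
  # Write back with cat instead of mv so the file keeps its permissions.
  cat "$tmp" > "$ini"
  rm -f "$tmp"
}

# Usage (from the root directory, placeholder URL):
#   set_alembic_url app/alembic.ini 'postgresql://user:password@db.example.com:5432/heissdocs'
```

Alembic reads alembic.ini through Python's configparser with interpolation enabled, so a literal `%` in the URL (for example in a URL-encoded password) should be written as `%%`.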
Before using the application, open the Settings page by clicking the button on the left-side dashboard, and configure the settings.
You are all set!
[Quick overview of the project]
In progress for the community - by Krishnasis 👨🏽💻
Powered by FastAPI 💗