Extralit is a UI and platform for LLM-based document data extraction that integrates human and model feedback loops for continuous LLM refinement and data extraction oversight.
With a Python SDK and flexible UI, you can create human and model-in-the-loop workflows for:
- Data extraction validation
- Supervised fine-tuning
- Preference tuning (RLHF, DPO, RLAIF, and more)
- Small, specialized NLP models
- Scalable evaluation.
These steps are required to run and develop Argilla locally.
- Install Docker Desktop
- Install kind
- Install ctlptl
- Install Tilt
- Create a kind cluster
ctlptl create registry ctlptl-registry --port=5005
ctlptl create cluster kind --registry=ctlptl-registry
- Apply config to mount local directory
ctlptl apply -f k8s/kind/kind-config.yaml
kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
- Run Tilt
Select the K8s cluster
kubectl config use-context <K8s cluster context>
Setting the ENV variable to dev enables hot-reloading of Docker containers for rapid deployment:
kubectl create ns <namespace>
ENV=dev tilt up --namespace=<namespace>
ENV=dev DOCKER_REPO=<remote docker repository> tilt up --namespace <namespace> --context <K8s cluster context>
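Before creating the cluster, it can help to confirm that the tools from the steps above are on your PATH. The following is a minimal sketch, not part of the project: the binary names (docker, kind, ctlptl, tilt, kubectl) are assumed to be the default CLI names of those tools, and the helper name missing_tools is hypothetical.

```python
import shutil

# Assumed default CLI names for the tools listed in the setup steps.
REQUIRED_TOOLS = ["docker", "kind", "ctlptl", "tilt", "kubectl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

The lookup function is passed in as a parameter so the check can be exercised without the tools installed.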
Editing the database schema files at src/argilla/server/models/*.py requires running these commands to apply revisions to the database.
- Create revision
cd src/argilla
alembic revision -m <message>
If you run into errors caused by revisions from the upstream argilla-io/argilla repo, set the down-revision tag to their latest revision in the revision "7552df94427a" at src/argilla/server/alembic/versions.
- Apply the revision
# Be sure to set environment variables ARGILLA_ELASTICSEARCH and ARGILLA_DATABASE_URL
python -m argilla server database migrate
- Update frontend site to the API backend
bash scripts/build_frontend.sh
python setup.py bdist_wheel
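Two parts of the revision steps above are easy to get wrong: forgetting the environment variables before migrating, and editing the down-revision by hand. The sketch below illustrates both under stated assumptions. It assumes the revision file uses Alembic's usual module-level `down_revision = "..."` assignment; the helper names and the parent id passed in are hypothetical, and only the two variable names come from the step above.

```python
import os
import re

# Variable names taken from the comment in the "Apply the revision" step.
REQUIRED_VARS = ("ARGILLA_ELASTICSEARCH", "ARGILLA_DATABASE_URL")

# Matches a module-level `down_revision = ...` line, with or without a
# type annotation between the name and the equals sign.
_DOWN_REVISION = re.compile(r"^(down_revision\b[^=\n]*=[ \t]*).*$", re.MULTILINE)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def set_down_revision(source, parent_id):
    """Return revision-file text with down_revision pointing at parent_id."""
    new_source, count = _DOWN_REVISION.subn(
        lambda m: f'{m.group(1)}"{parent_id}"', source
    )
    if count != 1:
        raise ValueError(f"expected one down_revision line, found {count}")
    return new_source
```

A caller would read the revision file, pass its text and the upstream revision id to set_down_revision, and write the result back; the function itself never touches the file system.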
Argilla is built on 5 core components:
- Python SDK: A Python SDK, installable with pip install argilla, to interact with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows.
- FastAPI Server: The core of Argilla is a Python FastAPI server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI. It also provides a web interface to visualize the data.
- Relational Database: A relational database to store the metadata of the records and the annotations. SQLite is used as the default built-in option and is deployed separately with the Argilla Server but a separate PostgreSQL can be used too.
- Vector Database: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support ElasticSearch and AWS OpenSearch and they can be deployed as separate Docker images.
- Vue.js UI: A web application to visualize and annotate your data, users and teams. It is built with Vue.js and is directly deployed alongside the Argilla Server within our Argilla Docker image.
argilla-server uses the argilla repository as a submodule to build the frontend static files, so when cloning use the following command:
git clone --recurse-submodules git@github.com:argilla-io/argilla-server.git
If you already cloned the repository without using --recurse-submodules, you can init and update the submodules with:
git submodule update --remote --recursive --init
Important
By default, the argilla submodule uses the develop branch, so the previous command will get the latest commit from that branch.
When doing a release, we should change the argilla submodule to use a specific tag. In the following example we set tag v1.22.0:
cd argilla
git fetch --tags
git checkout v1.22.0
Note
You should see some changes in the argilla-server root folder where the subproject commit is now changed to the one from the tag version. Feel free to commit these changes.
By default, all commands executed with pdm run will get environment variables from .env.dev, except the command pdm test, which will overwrite some of them using values from the .env.test file.
These environment variables can be overridden if necessary, so feel free to define your own locally.
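The layering described above (values from .env.dev, with .env.test overwriting some of them for tests) can be pictured with a small sketch. This is only an illustration of the override order, not how PDM loads the files; the helper names, the EXAMPLE_FLAG variable, and the file contents are hypothetical.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def layered_env(*layers):
    """Merge mappings left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical file contents, for illustration only.
dev = parse_env("# dev defaults\nARGILLA_DATABASE_URL=sqlite:///dev.db\nEXAMPLE_FLAG=1\n")
test = parse_env('ARGILLA_DATABASE_URL="sqlite:///test.db"\n')
test_run_env = layered_env(dev, test)
```

Here test_run_env keeps EXAMPLE_FLAG from the first layer and takes ARGILLA_DATABASE_URL from the second, which mirrors the "overwrite some of them" behaviour described for pdm test.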
pdm cli
By default, a SQLite database located at ~/.argilla/argilla.db will be used. You can create the database and run migrations with the following custom PDM command:
pdm migrate
A SQLite database located at ~/.argilla/argilla-test.db will be automatically created to run tests. You can run the entire test suite using the following custom PDM command:
pdm test
Before running the Argilla development server, we need to build the frontend static files. Node version 18 is required for this step:
brew install node@18
After that you can build the frontend static files:
./scripts/build_frontend.sh
After running the previous script you should have a folder at src/argilla_server/static with all the frontend static files successfully generated.
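To confirm the build produced output before starting the server, a quick check along these lines may be useful. It is a sketch only: the helper name has_static_files is hypothetical, and it merely tests that the folder exists and contains at least one file, not that the build is complete or correct.

```python
from pathlib import Path

# Output folder named in the build step above, relative to the repo root.
STATIC_DIR = Path("src/argilla_server/static")


def has_static_files(directory):
    """Return True if directory exists and contains at least one file."""
    directory = Path(directory)
    return directory.is_dir() and any(p.is_file() for p in directory.rglob("*"))


if __name__ == "__main__":
    if has_static_files(STATIC_DIR):
        print(f"Found frontend static files in {STATIC_DIR}")
    else:
        print(f"No static files in {STATIC_DIR}; run the frontend build first")
```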
pdm server