diff --git a/README.md b/README.md index 75cc838b..effab975 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,10 @@ -# NilAI +# nilAI -Copy the `.env.sample` to `.env` to and replace the value of the `HUGGINGFACE_API_TOKEN` with the appropriate value. It is required to download Llama3.2 1B. +Copy the `.env.sample` to `.env` and replace the value of the `HUGGINGFACE_API_TOKEN` with the appropriate value. The `HUGGINGFACE_API_TOKEN` is used to determine whether a user has permission to access certain models. For example, for Llama models, you usually need to have requested access to the model on its [Hugging Face page](https://huggingface.co/meta-llama/Llama-3.2-1B). + +There are two ways to deploy **nilAI**. The recommended way is to use docker-compose as it is the easiest and most straightforward. + +## Docker For development environments: ```shell @@ -12,6 +16,36 @@ For production environments: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --build ``` +## Manual Deployment + +**nilAI** consists of the following components: + - **API Frontend**: Receives user requests and handles them appropriately. For model requests, it forwards them to the appropriate backend model. + - **Two Databases**: + - **SQLite**: The main registry of users on the platform. This will be changed as we move to more production-ready environments. It tracks which users are allowed on the platform, their API keys, and their usage. + - **etcd3**: A distributed key-value database used in Kubernetes. It creates key-value pairs with a lifetime. When a key-value pair's lifetime expires, it is automatically removed. Models register their address information on the etcd3 database with a lifetime and keep this lifetime alive. If a model ever disconnects due to an error, the database removes the entry, and the API Frontend no longer advertises that model. + - **Models**: There may be zero or more model deployments. Model deployments contain a basic API that responds in the same format to the `/v1/chat/completions` endpoint. The `Model` class defines how models connect to the database and manage their lifecycle. + +To deploy the components, first create the `etcd3` instance. The easiest way is to expose it with Docker: + +```shell +# This command runs in the background. If it fails, you may already be running etcd-server on ports 2379 and 2380. +docker run -d --name etcd-server -p 2379:2379 -p 2380:2380 -e ALLOW_NONE_AUTHENTICATION=yes bitnami/etcd:latest +``` + +Run the **nilAI** API server: +```shell +# Shell 1 +## For development environment (auto reloads on file changes): +uv run fastapi dev nilai-api/src/nilai_api/__main__.py --port 8080 +## For production environment: +uv run fastapi run nilai-api/src/nilai_api/__main__.py --port 8080 +``` + +Run the **nilAI** Llama 3.2 1B model. For different models, adapt the command below: +```shell +# Shell 2 +## For development environment (auto reloads on file changes): +uv run fastapi dev nilai-models/src/nilai_models/models/llama_1b_cpu/__init__.py +## For production environment: +uv run fastapi run nilai-models/src/nilai_models/models/llama_1b_cpu/__init__.py ``` -uv run gunicorn -c gunicorn.conf.py nilai.__main__:app -``` \ No newline at end of file