Updated Ollama part of local deployment #1066

Merged · 4 commits · Jun 7, 2024
Changes from 1 commit
4 changes: 2 additions & 2 deletions README.md
@@ -19,7 +19,7 @@
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>

@@ -315,7 +315,7 @@ To launch the service from source:

- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [Reference](https://ragflow.io/docs/dev/category/references)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)

## 📜 Roadmap
4 changes: 2 additions & 2 deletions README_ja.md
@@ -20,7 +20,7 @@
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen"
alt="docker pull infiniflow/ragflow:v0.7.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>

@@ -262,7 +262,7 @@ $ bash ./entrypoint.sh

- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [Reference](https://ragflow.io/docs/dev/category/references)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)

## 📜 ロードマップ
4 changes: 2 additions & 2 deletions README_zh.md
@@ -19,7 +19,7 @@
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>

@@ -282,7 +282,7 @@ $ systemctl start nginx

- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [Reference](https://ragflow.io/docs/dev/category/references)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)

## 📜 路线图
91 changes: 67 additions & 24 deletions docs/guides/deploy_local_llm.md
@@ -5,42 +5,85 @@ slug: /deploy_local_llm

# Deploy a local LLM

RAGFlow supports deploying LLMs locally using Ollama or Xinference.
RAGFlow supports deploying models locally using Ollama or Xinference. If you have models deployed locally, or wish to use GPU or CUDA for inference acceleration, you can connect Ollama or Xinference to RAGFlow and use either of them as a local "server" for interacting with your local models.

## Ollama
RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. RAGFlow v0.7.0 supports running two types of local models: chat models and embedding models.

One-click deployment of local LLMs, that is [Ollama](https://github.com/ollama/ollama).
:::tip NOTE
This user guide does not cover the installation or configuration of Ollama or Xinference in detail; its focus is on configuration inside RAGFlow. For the most current information, refer to the official Ollama or Xinference documentation.
:::

### Install
## Deploy a local model using Ollama

- [Ollama on Linux](https://github.com/ollama/ollama/blob/main/docs/linux.md)
- [Ollama Windows Preview](https://github.com/ollama/ollama/blob/main/docs/windows.md)
- [Docker](https://hub.docker.com/r/ollama/ollama)
[Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models locally. It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configuration, including GPU usage.
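
For illustration, the following is a minimal sketch of that Modelfile workflow. The model name, temperature, and system prompt are hypothetical examples, not values that RAGFlow requires:

```bash
# Minimal sketch: package a base model with custom parameters into a new local model.
# "my-assistant", the temperature value, and the system prompt are arbitrary examples.
cat > Modelfile <<'EOF'
FROM qwen:7b-chat-v1.5-q4_0
PARAMETER temperature 0.7
SYSTEM "You are a concise technical assistant."
EOF

ollama create my-assistant -f Modelfile   # build the packaged model
ollama run my-assistant                   # chat with it locally
```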

### Launch Ollama
:::note
- For information about downloading Ollama, see [here](https://github.com/ollama/ollama?tab=readme-ov-file#ollama).
- For information about configuring the Ollama server, see [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server).
- For a complete list of supported models and variants, see the [Ollama model library](https://ollama.com/library).
:::

To deploy a local model, e.g., **7b-chat-v1.5-q4_0**, using Ollama:

1. Ensure that the service URL of Ollama is accessible.
2. Run your local model:

```bash
ollama run qwen:7b-chat-v1.5-q4_0
```
<details>
<summary>If your Ollama is installed through Docker, run the following instead:</summary>

```bash
docker exec -it ollama ollama run qwen:7b-chat-v1.5-q4_0
```
</details>

3. In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Ollama to RAGFlow:

![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)

4. In the popup window, complete basic settings for Ollama:

- In this case, **qwen:7b-chat-v1.5-q4_0** is a chat model, so we choose **chat** as the model type.
- Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama.
- Ensure that the base URL you enter is accessible to RAGFlow.
- OPTIONAL: Switch on the toggle under **Does it support Vision?** if your model includes an image-to-text model.

![ollama settings](https://github.com/infiniflow/ragflow/assets/93570324/0ba3942e-27ba-457c-a26f-8ebe9edf0e52)

:::caution NOTE
- If your Ollama and RAGFlow run on the same machine, use `http://localhost:11434` as base URL.
- If your Ollama and RAGFlow run on the same machine and Ollama is in Docker, use `http://host.docker.internal:11434` as base URL.
- If your Ollama runs on a different machine from RAGFlow, use `http://<IP_OF_OLLAMA_MACHINE>` as base URL.
:::
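
To verify that the base URL you plan to enter is reachable, a quick check such as the following can help. This is a sketch: `ragflow-server` is an assumed container name, and it presumes `curl` is available in that image.

```bash
# A reachable Ollama endpoint responds with "Ollama is running".
curl http://localhost:11434

# If RAGFlow runs in Docker, check reachability from inside the RAGFlow container instead.
docker exec -it ragflow-server curl http://host.docker.internal:11434
```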

:::danger WARNING
If your Ollama runs on a different machine, you may also need to update the environment variables in **ollama.service**:

Decide which LLM you want to deploy ([here's a list for supported LLM](https://ollama.com/library)), say, **mistral**:
```bash
$ ollama run mistral
```
Or,
```bash
$ docker exec -it ollama ollama run mistral
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/APP/MODELS/OLLAMA"
```

### Use Ollama in RAGFlow
See [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information.
:::
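
Assuming Ollama was installed as a systemd service on Linux, a typical way to apply such settings is sketched below; adjust the values to your environment:

```bash
# Add the Environment lines to an override file for the Ollama service.
sudo systemctl edit ollama.service

# Reload systemd and restart Ollama so the new environment takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```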

- Go to 'Settings > Model Providers > Models to be added > Ollama'.

![](https://github.com/infiniflow/ragflow/assets/12318111/a9df198a-226d-4f30-b8d7-829f00256d46)
5. Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model:

*You should now be able to find **7b-chat-v1.5-q4_0** in the dropdown list under **Chat model**.*

> If your local model is an embedding model, you should find it under **Embedding model**.

![system model settings](https://github.com/infiniflow/ragflow/assets/93570324/c627fb16-785b-4b84-a77f-4dec604570ed)

> Base URL: Enter the base URL where the Ollama service is accessible, like, `http://<your-ollama-endpoint-domain>:11434`.
6. In this case, update your chat model in **Chat Configuration**:

- Use Ollama Models.
![chat config](https://github.com/infiniflow/ragflow/assets/93570324/7cec4026-a509-47a3-82ec-5f8e1f059442)

![](https://github.com/infiniflow/ragflow/assets/12318111/60ff384e-5013-41ff-a573-9a543d237fd3)
> If your local model is an embedding model, update it on the configuration page of your knowledge base.

## Xinference
## Deploy a local model using Xinference

Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models.

@@ -55,8 +55,8 @@ $ xinference-local --host 0.0.0.0 --port 9997
```
### Launch Xinference

Decide which LLM you want to deploy ([here's a list for supported LLM](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**.
Execute the following command to launch the model, remember to replace `${quantization}` with your chosen quantization method from the options listed above:
Decide which LLM to deploy ([here is a list of supported LLMs](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**.
Execute the following command to launch the model, ensuring that you replace `${quantization}` with your chosen quantization method from the options listed above:
```bash
$ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
```
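
To confirm that the model has been launched, you can list the running models. This is a sketch; the `--endpoint` value assumes the local deployment started above:

```bash
$ xinference list --endpoint http://127.0.0.1:9997
```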
19 changes: 11 additions & 8 deletions docs/quickstart.mdx
@@ -18,10 +18,10 @@ This quick start guide describes a general process from:

## Prerequisites

- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- CPU &ge; 4 cores
- RAM &ge; 16 GB
- Disk &ge; 50 GB
- Docker &ge; 24.0.0 & Docker Compose &ge; v2.26.1

> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
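
To confirm that your machine meets these prerequisites, you can run a few standard checks (a sketch for Linux hosts):

```bash
nproc                    # number of CPU cores
free -h                  # total RAM
df -h                    # free disk space
docker --version         # Docker Engine version
docker compose version   # Docker Compose plugin version
```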

@@ -30,7 +30,7 @@ This quick start guide describes a general process from:
This section provides instructions on setting up the RAGFlow server on Linux. If you are on a different operating system, no worries; most steps are similar.

<details>
<summary>1. Ensure <code>vm.max_map_count</code> >= 262144:</summary>
<summary>1. Ensure <code>vm.max_map_count</code> &ge; 262144:</summary>

`vm.max_map_count`. This value sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abnormal behaviors, and the system will throw out-of-memory errors when a process reaches the limit.
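
To check the current value and raise it if needed, a typical approach on a Linux host looks like this (a sketch; the persistent setting assumes a standard `/etc/sysctl.conf`):

```bash
# Check the current value.
sysctl vm.max_map_count

# Raise it for the running system.
sudo sysctl -w vm.max_map_count=262144

# Persist the change across reboots.
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
```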

@@ -168,7 +168,7 @@ This section provides instructions on setting up the RAGFlow server on Linux. If

5. In your web browser, enter the IP address of your server and log in to RAGFlow.

> - With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
:::caution WARNING
With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number), as the default HTTP serving port `80` can be omitted when using the default configurations.
:::

## Configure LLMs

@@ -188,7 +190,7 @@ To add and configure an LLM:

1. Click on your logo on the top right of the page **>** **Model Providers**:

![2 add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)
![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)

> Each RAGFlow account can use **text-embedding-v2**, an embedding model of Tongyi-Qianwen, for free. This is why you can see Tongyi-Qianwen in the **Added models** list. You may need to update your Tongyi-Qianwen API key at a later point.

@@ -286,4 +288,5 @@ Conversations in RAGFlow are based on a particular knowledge base or multiple kn

![question1](https://github.com/infiniflow/ragflow/assets/93570324/bb72dd67-b35e-4b2a-87e9-4e4edbd6e677)

![question2](https://github.com/infiniflow/ragflow/assets/93570324/7cc585ae-88d0-4aa2-817d-0370b2ad7230)

8 changes: 4 additions & 4 deletions docs/references/api.md
@@ -109,10 +109,10 @@ This method retrieves the history of a specified conversation session.
- `content_with_weight`: Content of the chunk.
- `doc_name`: Name of the *hit* document.
- `img_id`: The image ID of the chunk. It is an optional field only for PDF, PPTX, and images. Call ['GET' /document/get/\<id\>](#get-document-content) to retrieve the image.
- positions: [page_number, [upleft corner(x, y)], [right bottom(x, y)]], the chunk position, only for PDF.
- similarity: The hybrid similarity.
- term_similarity: The keyword simimlarity.
- vector_similarity: The embedding similarity.
- `positions`: [page_number, [upper-left corner (x, y)], [bottom-right corner (x, y)]], the position of the chunk. Only applicable to PDF.
- `similarity`: The hybrid similarity.
- `term_similarity`: The keyword similarity.
- `vector_similarity`: The embedding similarity.
- `doc_aggs`:
- `doc_id`: ID of the *hit* document. Call ['GET' /document/get/\<id\>](#get-document-content) to retrieve the document.
- `doc_name`: Name of the *hit* document.