
<!-- vim-markdown-toc GFM -->

* [Preface](#preface)
* [Deployment methods](#deployment-methods)
* [Integration with Llama Stack framework](#integration-with-llama-stack-framework)
    * [Llama Stack as a library](#llama-stack-as-a-library)
    * [Llama Stack as a server](#llama-stack-as-a-server)
* [Local deployment](#local-deployment)
    * [Llama Stack used as a library](#llama-stack-used-as-a-library)
    * [Llama Stack used as a separate process](#llama-stack-used-as-a-separate-process)
* [Running from container](#running-from-container)
    * [Llama Stack used as a library](#llama-stack-used-as-a-library-1)
    * [Llama Stack used as a separate process in container](#llama-stack-used-as-a-separate-process-in-container)
    * [Llama Stack configuration](#llama-stack-configuration)

<!-- vim-markdown-toc -->

## Preface

In this document, you will learn how to install and run the *Lightspeed Core Stack (LCS)* service. LCS allows users to communicate with large language models (LLMs), access RAG databases, call so-called agents, process conversation history, and ensure that conversations stay within permitted topics.


## Deployment methods

*Lightspeed Core Stack (LCS)* is built on the Llama Stack framework, which can be run in several modes. Additionally, *LCS* itself can run either locally (as a regular Python application) or from within a container. This means that several deployment methods can be leveraged:

* Local deployment
    - Llama Stack framework is used as a library
    - Llama Stack framework is used as a separate process (deployed locally)
* Running from a container
    - Llama Stack framework is used as a library
    - Llama Stack framework is used as a separate process in a container

All of these deployment methods are covered later in this document.


## Integration with Llama Stack framework

The Llama Stack framework can be run as a standalone server and accessed via its REST API. However, instead of communicating directly through the REST API (and its JSON payloads), there is an even better alternative based on the so-called Llama Stack Client. This library is available for Python, Swift, Node.js, and Kotlin, and wraps the REST API in a form that is easier for many applications to consume.
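
For illustration, here is a minimal sketch of how an application could talk to a running Llama Stack server through the Python flavor of the client. It assumes the `llama-stack-client` package is installed and a server is listening on `localhost:8321`; the exact API surface may differ between client releases.

```python
from llama_stack_client import LlamaStackClient

# Connect to a Llama Stack server (adjust the URL for remote deployments).
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models exposed by the configured providers.
for model in client.models.list():
    print(model.identifier)
```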

### Llama Stack as a library

When this mode is selected, Llama Stack is used as a regular Python library. This means that the library must be installed in the system Python environment, a user-level environment, or a virtual environment. All calls to Llama Stack are performed via standard function or method calls:

*(diagram: Llama Stack used as a library)*

> [!NOTE]
> Even when Llama Stack is used as a library, it still requires the configuration file `run.yaml` to be present. This configuration file is loaded during the initialization phase.
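
Below is a minimal sketch of what using Llama Stack as a library can look like in Python code. It is only an illustration: the import path of the library client has moved between Llama Stack releases, so check the version you have installed.

```python
# The import path differs across Llama Stack versions
# (e.g. llama_stack.distribution.library_client in older releases).
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# The in-process client is driven by the same run.yaml used by the standalone server.
client = LlamaStackAsLibraryClient("run.yaml")
client.initialize()

# From here on, the client offers the same interface as the remote REST client.
print(client.models.list())
```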


### Llama Stack as a server

When this mode is selected, Llama Stack is started as a separate REST API service. All communication with Llama Stack is performed via REST API calls, which means that Llama Stack can run on a separate machine if needed.

*(diagram: Llama Stack as a server)*

> [!NOTE]
> The REST API schema and semantics can change at any time, especially before version 1.0.0 is released. By using *Lightspeed Core Stack*, developers, users, and customers stay isolated from these incompatibilities.
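
To illustrate the direct REST access that this mode is built on, the following sketch queries a locally running Llama Stack server with plain HTTP. It assumes the default port 8321 and a `/v1/models` listing endpoint; adjust the URL for your deployment.

```python
import requests

# Ask the Llama Stack server which models are available.
response = requests.get("http://localhost:8321/v1/models", timeout=10)
response.raise_for_status()

# The payload is plain JSON, so any language with an HTTP client can consume it.
print(response.json())
```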


## Local deployment

This chapter shows how to run LCS locally. This mode is especially useful for developers, as it makes it possible to work with the latest version of the source code, including locally made changes and improvements. Last but not least, the entire system can be traced, monitored, and debugged from within an integrated development environment.

### Llama Stack used as a separate process

The easiest option is to run Llama Stack as a separate process. This means that at least two running processes are involved:

1. Llama Stack framework, with port 8321 open (this can easily be changed if needed)
1. LCS, with port 8080 open (this can easily be changed if needed)
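
To see that both processes are up, a quick check such as the following sketch can be used. It only assumes the default ports listed above; change them if your configuration differs.

```python
import socket

# Probe the default ports of both services on the local machine.
for name, port in (("Llama Stack", 8321), ("Lightspeed Core Stack", 8080)):
    with socket.socket() as sock:
        sock.settimeout(1.0)
        status = "open" if sock.connect_ex(("localhost", port)) == 0 else "closed"
        print(f"{name} port {port}: {status}")
```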

### Llama Stack used as a library

## Running from container

### Llama Stack used as a separate process in container

### Llama Stack used as a library


```toml