
chore: docs - remove Architecture page, clean up Basic Usage #1637

Merged 3 commits on Nov 5, 2024
1 change: 1 addition & 0 deletions docs/docs/architecture.mdx
@@ -2,6 +2,7 @@
title: Architecture
description: Cortex Architecture
slug: "architecture"
+draft: true
---

:::warning
90 changes: 0 additions & 90 deletions docs/docs/basic-usage/api-server.mdx

This file was deleted.

206 changes: 99 additions & 107 deletions docs/docs/basic-usage/index.mdx
@@ -1,136 +1,128 @@
---
-title: Overview
-description: Cortex Overview
-slug: "basic-usage"
+title: Cortex Basic Usage
+description: Cortex Usage Overview
---


import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

Cortex has an [API server](https://cortex.so/api-reference) that runs at `localhost:39281`.

The server port can be configured in [`.cortexrc`](/docs/architecture/cortexrc) via the `apiServerPort` parameter.
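For illustration, a minimal `.cortexrc` might set the port like this. This is a sketch: only `apiServerPort` is named in the text above, the `apiServerHost` key is an assumption, and the authoritative schema is on the linked `.cortexrc` page.

```yaml
# Hypothetical .cortexrc fragment; only apiServerPort is taken from the
# text above. Check /docs/architecture/cortexrc for the real schema.
apiServerHost: 127.0.0.1   # assumed field name
apiServerPort: 39281
```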

## Server
### Start Cortex Server
```bash
# By default the server will be started on port `39281`
cortex
# Start a server with different port number
cortex -a <address> -p <port_number>
# Set the data folder directory
cortex --dataFolder <dataFolderPath>
```

### Terminate Cortex Server
```bash
curl --request DELETE \
--url http://127.0.0.1:39281/processManager/destroy
```

## Engines
Cortex currently supports three industry-standard engines: llama.cpp, ONNX Runtime, and TensorRT-LLM.

By default, Cortex installs the llama.cpp engine, which runs on most laptops, desktops, and operating systems.

For more information, see [Engine Management](/docs/engines).

## Usage
### Start Cortex.cpp Server
<Tabs>
<TabItem value="MacOs/Linux" label="macOS/Linux">
```sh
# Stable
cortex start

# Beta
cortex-beta start

# Nightly
cortex-nightly start
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe start

# Beta
cortex-beta.exe start

# Nightly
cortex-nightly.exe start
```
</TabItem>
</Tabs>
-### Run Model
### List available engines
```bash
curl --request GET \
--url http://127.0.0.1:39281/v1/engines
```

### Install an Engine (e.g. llama-cpp)
```bash
curl --request POST \
--url http://127.0.0.1:39281/v1/engines/install/llama-cpp
```
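The two engine endpoints above can also be driven from a short script. The sketch below uses only the Python standard library; the endpoint paths are copied from the curl examples, the helper names are hypothetical, and the JSON shape of the engine list is not specified here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:39281"  # default Cortex API server address


def engine_install_url(engine: str) -> str:
    # URL to POST to in order to install an engine such as "llama-cpp".
    return f"{BASE_URL}/v1/engines/install/{engine}"


def list_engines():
    # GET /v1/engines returns the available engines as JSON.
    # Requires a running Cortex server.
    with urllib.request.urlopen(f"{BASE_URL}/v1/engines") as resp:
        return json.load(resp)


def install_engine(engine: str) -> None:
    # POST with an empty body, matching the curl example above.
    # Requires a running Cortex server.
    req = urllib.request.Request(engine_install_url(engine), method="POST")
    urllib.request.urlopen(req).close()
```

Calling `install_engine("llama-cpp")` and then `list_engines()` against a running server mirrors the two curl commands above.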

## Manage Models
### Pull Model
```bash
# Pull a model
curl --request POST \
-  --url http://localhost:39281/v1/models/pull \
+  --url http://127.0.0.1:39281/v1/models/pull \
-  -H "Content-Type: application/json" \
+  --header 'Content-Type: application/json' \
  --data '{
    "model": "tinyllama:gguf",
    "id": "my-custom-model-id"
  }'
```
If the model download was interrupted, this request will download the remainder of the model files.

The downloaded models are saved to the [Cortex Data Folder](/docs/architecture/data-folder).

### Stop Model Download
```bash
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{
-    "model": "mistral:gguf"
-  }'
+    "taskId": "tinyllama:1b-gguf"
+  }'
```

### List Models
```bash
curl --request GET \
--url http://127.0.0.1:39281/v1/models
```

### Delete Model
```bash
curl --request DELETE \
--url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf
```

## Run Models
### Start Model
```bash
# Start the model
curl --request POST \
-  --url http://localhost:39281/v1/models/start \
+  --url http://127.0.0.1:39281/v1/models/start \
  --header 'Content-Type: application/json' \
  --data '{
-    "model": "mistral:gguf"
-    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
-    "stop": [],
-    "ngl": 4096,
-    "ctx_len": 4096,
-    "cpu_threads": 10,
-    "n_batch": 2048,
-    "caching_enabled": true,
-    "grp_attn_n": 1,
-    "grp_attn_w": 512,
-    "mlock": false,
-    "flash_attn": true,
-    "cache_type": "f16",
-    "use_mmap": true,
-    "engine": "llama-cpp"
+    "model": "tinyllama:1b-gguf"
  }'
```
-### Chat with Model
+### Create Chat Completion
```bash
-# Invoke the chat completions endpoint
-curl http://localhost:39281/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "messages": [
-      {
-        "role": "user",
-        "content": "Hello"
-      },
-    ],
-    "model": "mistral:gguf",
-    "stream": true,
-    "max_tokens": 1,
-    "stop": [
-      null
-    ],
-    "frequency_penalty": 1,
-    "presence_penalty": 1,
-    "temperature": 1,
-    "top_p": 1
-  }'
+curl --request POST \
+  --url http://localhost:39281/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  --data '{
+    "messages": [
+      {
+        "role": "user",
+        "content": "Write a Haiku about cats and AI"
+      }
+    ],
+    "model": "tinyllama:1b-gguf",
+    "stream": false
+  }'
```
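A minimal chat call can likewise be scripted. The sketch below uses only the Python standard library; the payload fields are the ones shown in the curl example, while the response parsing assumes an OpenAI-compatible `choices[0].message.content` shape, which may differ in detail.

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    # Same body as the curl example above.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(model: str, prompt: str) -> str:
    # Requires a running Cortex server with the model started.
    data = json.dumps(chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:39281/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    # Assumes an OpenAI-style response shape.
    return out["choices"][0]["message"]["content"]
```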

### Stop Model
```bash
# Stop a model
curl --request POST \
-  --url http://localhost:39281/v1/models/stop \
+  --url http://127.0.0.1:39281/v1/models/stop \
  --header 'Content-Type: application/json' \
  --data '{
-    "model": "mistral:gguf"
-  }'
+    "model": "tinyllama:1b-gguf"
+  }'
```
### Stop Cortex.cpp Server
<Tabs>
<TabItem value="MacOs/Linux" label="macOS/Linux">
```sh
# Stable
cortex stop

# Beta
cortex-beta stop

# Nightly
cortex-nightly stop
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe stop

# Beta
cortex-beta.exe stop

# Nightly
cortex-nightly.exe stop
```
</TabItem>
</Tabs>


6 changes: 3 additions & 3 deletions docs/sidebars.ts
@@ -52,7 +52,6 @@ const sidebars: SidebarsConfig = {
link: { type: "doc", id: "basic-usage/index" },
collapsed: true,
items: [
-{ type: "doc", id: "basic-usage/api-server", label: "API Server" },
{
type: "doc",
id: "basic-usage/cortex-js",
@@ -69,8 +68,9 @@
type: "category",
label: "Architecture",
link: {
-type: "doc",
-id: "architecture"
+type: "generated-index",
+// type: "doc",
+// id: "architecture" // is outdated
},
collapsed: true,
items: [