Main update 12 Aug #215

Merged
55 commits merged on Aug 12, 2024

Commits
1eba86c
add tick
Guocork Jul 30, 2024
4fb515c
add logical
Guocork Jul 31, 2024
5ff0de3
add judge
Guocork Jul 31, 2024
c39d0fc
fix bug
Guocork Jul 31, 2024
95100f1
add arrow
Guocork Aug 1, 2024
d8de0a0
fix store
Guocork Aug 2, 2024
bc17f14
Stop chat streaming when switching conversation
jmbejar Aug 2, 2024
c4b6613
add animation
Guocork Aug 3, 2024
aad8dec
adjust layout
Guocork Aug 5, 2024
b6fc01b
Reimplement Modal using Makepad overlay features
jmbejar Aug 5, 2024
042d773
Reimplement model files info modals
jmbejar Aug 5, 2024
cfc209d
Clean warnings and dead code
jmbejar Aug 5, 2024
bc91139
Reimplement Delete model file confirmation dialog
jmbejar Aug 5, 2024
a277c3c
Reimplement Delete Chat confirmation dialog
jmbejar Aug 5, 2024
270c626
Add missing end-of-file char
jmbejar Aug 5, 2024
8c4f345
moxin-runner: on Windows, support WasmEdge no-avx WASI-nn plugin (rel…
kevinaboos Aug 5, 2024
96d4ded
disable overly verbose output from powershell
kevinaboos Aug 5, 2024
eee40fb
Ensure .zip extension is given to temp-downloaded zipfiles
kevinaboos Aug 6, 2024
57426b3
Merge branch 'main' into windows_fallback_to_wasi_nn_b3499_release
kevinaboos Aug 6, 2024
b387164
README: add info for selecting the right WASI-nn plugin version on wi…
kevinaboos Aug 6, 2024
7d150b9
further improve windows readme
kevinaboos Aug 6, 2024
fa6c18b
Specify that CUDA v12 is required on windows
kevinaboos Aug 6, 2024
11f7dc7
Move powershell script into Rust code so we can parameterize it.
kevinaboos Aug 5, 2024
7ad0c9d
fix the issue
Guocork Aug 6, 2024
7cc9803
Merge pull request #193 from kevinaboos/windows_fallback_to_wasi_nn_b…
jmbejar Aug 6, 2024
ca37d07
Merge pull request #192 from moxin-org/modals-revision
jmbejar Aug 6, 2024
4e22cbc
[Backend] Support frontend change ctx_size & batch_size
L-jasmine Aug 6, 2024
9efb754
[Backend] let the system decide api-server.wasm port
L-jasmine Aug 6, 2024
fd6fc34
Fix chat without stream error
L-jasmine Aug 6, 2024
f906fac
Merge pull request #191 from Guocork/main
jmbejar Aug 6, 2024
b740ee7
Merge branch 'main' into dev
jmbejar Aug 6, 2024
367c715
Fix problems with button hovers
jmbejar Jul 31, 2024
31f3562
Do not show chat history options button on title editing
jmbejar Jul 31, 2024
5ab1b12
Use reset_hover_on_click in certain buttons
jmbejar Aug 2, 2024
bb8c8df
Fix most of cursor issues when moving mouse pointer out of buttons
jmbejar Aug 2, 2024
c298b82
Merge pull request #187 from moxin-org/fix-button-hovers
jmbejar Aug 6, 2024
5999b54
Merge pull request #189 from moxin-org/stop-streaming-when-switching-…
jmbejar Aug 6, 2024
76aa78a
Fix visibility issue of loading animation in the Model Selector
jmbejar Aug 6, 2024
ecb4cb0
[Backend] change LoadModelOptions.n_ctx from u32 to Option<u32>
L-jasmine Aug 6, 2024
60715bd
Merge pull request #196 from moxin-org/fix-loading-animation-model-sw…
jmbejar Aug 6, 2024
8bb9d87
Merge pull request #194 from L-jasmine/feat/ctx-size
jmbejar Aug 6, 2024
3fefbe7
readme: mention that Windows is now supported
kevinaboos Aug 6, 2024
3b5483a
Merge pull request #197 from moxin-org/readme-windows-support
jmbejar Aug 6, 2024
d3d9b69
moxin-runner: support installing no-AVX versions of WasmEdge on Linux
kevinaboos Aug 6, 2024
234ef09
complete testing of new no-AVX support on Linux, Windows, and macOS
kevinaboos Aug 6, 2024
fc97ac7
Update to robius-open v0.1.1, which properly opens file URIs on Windows
kevinaboos Aug 7, 2024
3b85993
Merge pull request #201 from kevinaboos/fix_file_uri
jmbejar Aug 7, 2024
7759593
Merge pull request #198 from kevinaboos/linux_noavx_support
jmbejar Aug 7, 2024
bb5ad86
Use or install WasmEdge to the app-specific data directory
kevinaboos Aug 7, 2024
3fec28f
tested working on windows
kevinaboos Aug 7, 2024
e11e657
Merge pull request #205 from kevinaboos/use_wasmedge_installation_in_…
jmbejar Aug 7, 2024
1d5a1a9
fix issue 186
Guocork Aug 9, 2024
5c2e446
Merge pull request #212 from Guocork/fixed-issue-186
jmbejar Aug 12, 2024
d83d181
Fix intermittent problem in ModelCard on title or description
jmbejar Aug 12, 2024
245d104
Merge pull request #213 from moxin-org/fix-model-card-title-and-descr…
jmbejar Aug 12, 2024
35 changes: 28 additions & 7 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -30,7 +30,7 @@ moxin-fake-backend = { path = "moxin-fake-backend" }

makepad-widgets = { git = "https://github.com/jmbejar/makepad", branch = "moxin-release-v1" }

robius-open = "0.1.0"
robius-open = "0.1.1"
robius-url-handler = { git = "https://github.com/project-robius/robius-url-handler" }

chrono = "0.4"
36 changes: 27 additions & 9 deletions README.md
@@ -9,6 +9,7 @@ The following table shows which host systems can currently be used to build Moxin
| ------- | --------------- | ------- | ----- | -------------------------------------------- |
| macOS | macOS | ✅ | ✅ | `.app`, [`.dmg`] |
| Linux | Linux | ✅ | ✅ | [`.deb` (Debian dpkg)], [AppImage], [pacman] |
| Windows | Windows (10+) | ✅ | ✅ | `.exe` (NSIS) |

## Building and Running

@@ -41,6 +42,9 @@ curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/insta
source $HOME/.wasmedge/env
```

> [!IMPORTANT]
> If your CPU does not support AVX512, then you should append the `--noavx` option onto the above command.

To build Moxin on Linux, you must install the following dependencies:
`openssl`, `clang`/`libclang`, `binfmt`, `Xcursor`/`X11`, `asound`/`pulse`.

@@ -64,6 +68,15 @@ cargo run --release

2. Restart your PC, or log out and log back in, which allows the LLVM path to be properly recognized.
* Alternatively, you can add the LLVM path `C:\Program Files\LLVM\bin` to your system PATH.


> [!TIP]
> To automatically handle Steps 3 and 4, simply run:
> ```sh
> cargo run -p moxin-runner -- --install
> ```


3. Download the [WasmEdge-0.14.0-windows.zip](https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-0.14.0-windows.zip) file from [the WasmEdge v0.14.0 release page](https://github.com/WasmEdge/WasmEdge/releases/tag/0.14.0),
and then extract it into a directory of your choice.
We recommend using your home directory (e.g., `C:\Users\<USERNAME>\`), represented by `$home` in powershell and `%homedrive%%homepath%` in batch-cmd.
@@ -78,18 +91,23 @@ cargo run --release
$ProgressPreference = 'Continue' ## restore default progress bars
```

4. Download the WasmEdge WASI-NN plugin here: [WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip](https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip) (15.5MB) and extract it to the same directory as above, e.g., `C:\Users\<USERNAME>\WasmEdge-0.14.0-Windows`.
4. Download [the appropriate WasmEdge WASI-NN plugin](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/tag/b3499) (see below for details), extract/unzip it, and copy the `lib\wasmedge` directory from the .zip archive into the `lib\` directory of the above WasmEdge installation directory, e.g., `C:\Users\<USERNAME>\WasmEdge-0.14.0-Windows\lib`.

> [!IMPORTANT]
> You will be asked whether you want to replace the files that already exist; select `Replace the files in the destination` when doing so.
* To do this quickly in powershell:
```powershell
$ProgressPreference = 'SilentlyContinue' ## makes downloads much faster
Invoke-WebRequest -Uri "https://github.com/WasmEdge/WasmEdge/releases/download/0.14.0/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip" -OutFile "WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip"
Expand-Archive -Force -LiteralPath "WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip" -DestinationPath "$home\WasmEdge-0.14.0-Windows"
$ProgressPreference = 'Continue' ## restore default progress bars
```
> The only file that matters is the plugin file, which must exist at the path `WasmEdge-0.14.0-Windows\lib\wasmedge\wasmedgePluginWasiNN.dll`

* If your computer has a CUDA v12-capable GPU, select [WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/download/b3499/WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.0-windows_x86_64.zip).
* Note that **CUDA version 12** is required.
* If your computer doesn't have CUDA 12, then select either:
* [WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/download/b3499/WasmEdge-plugin-wasi_nn-ggml-0.14.0-windows_x86_64.zip) if your CPU supports AVX-512, or
* [WasmEdge-plugin-wasi_nn-ggml-noavx-0.14.0-windows_x86_64.zip](https://github.com/second-state/WASI-NN-GGML-PLUGIN-REGISTRY/releases/tag/b3499#:~:text=WasmEdge%2Dplugin%2Dwasi_nn%2Dggml%2Dnoavx%2D0.14.0%2Dwindows_x86_64.zip) if your CPU does *not* support AVX-512.


5. Set the `WASMEDGE_DIR` and `WASMEDGE_PLUGIN_PATH` environment variables to point to the `WasmEdge-0.14.0-Windows` directory that you extracted above, and then build Moxin.

> [!IMPORTANT]
> You may also need to add the `WasmEdge-0.14.0-Windows\bin` directory to your `PATH` environment variable (on some versions of Windows).

In powershell, you can do this like so:
```powershell
$env:WASMEDGE_DIR="$home\WasmEdge-0.14.0-Windows\"
35 changes: 27 additions & 8 deletions moxin-backend/src/backend_impls/api_server.rs
@@ -23,6 +23,7 @@ static WASM: &[u8] = include_bytes!("../../wasm/llama-api-server.wasm");
pub struct LLamaEdgeApiServer {
id: String,
listen_addr: SocketAddr,
load_model_options: LoadModelOptions,
wasm_module: Module,
running_controller: tokio::sync::broadcast::Sender<()>,
#[allow(dead_code)]
@@ -35,15 +36,23 @@ fn create_wasi(
load_model: &LoadModelOptions,
) -> wasmedge_sdk::WasmEdgeResult<WasiModule> {
// use model metadata context size
let ctx_size = Some(format!("{}", file.context_size.min(8 * 1024)));
let ctx_size = if let Some(n_ctx) = load_model.n_ctx {
Some(format!("{}", n_ctx))
} else {
Some(format!("{}", file.context_size.min(8 * 1024)))
};

let n_gpu_layers = match load_model.gpu_layers {
moxin_protocol::protocol::GPULayers::Specific(n) => Some(n.to_string()),
moxin_protocol::protocol::GPULayers::Max => None,
};

// Use the frontend-provided n_batch if given; otherwise fall back to a fixed value of 128.
let batch_size = Some(format!("128"));
let batch_size = if let Some(n_batch) = load_model.n_batch {
Some(format!("{}", n_batch))
} else {
Some("128".to_string())
};

let mut prompt_template = load_model.prompt_template.clone();
if prompt_template.is_none() && !file.prompt_template.is_empty() {
@@ -133,17 +142,23 @@ impl BackendModel for LLamaEdgeApiServer {
options: moxin_protocol::protocol::LoadModelOptions,
tx: std::sync::mpsc::Sender<anyhow::Result<moxin_protocol::protocol::LoadModelResponse>>,
) -> Self {
let load_model_options = options.clone();
let mut need_reload = true;
let (wasm_module, listen_addr) = if let Some(old_model) = &old_model {
if old_model.id == file.id.as_str() {
if old_model.id == file.id.as_str()
&& old_model.load_model_options.n_ctx == options.n_ctx
&& old_model.load_model_options.n_batch == options.n_batch
{
need_reload = false;
}
(old_model.wasm_module.clone(), old_model.listen_addr)
} else {
(
Module::from_bytes(None, WASM).unwrap(),
([0, 0, 0, 0], 8080).into(),
)
let new_addr = std::net::TcpListener::bind("localhost:0")
.unwrap()
.local_addr()
.unwrap();

(Module::from_bytes(None, WASM).unwrap(), new_addr)
};

if !need_reload {
@@ -152,6 +167,7 @@
file_id: file.id.to_string(),
model_id: file.model_id,
information: "".to_string(),
listen_port: listen_addr.port(),
},
)));
return old_model.unwrap();
@@ -165,7 +181,8 @@

let file_id = file.id.to_string();

let url = format!("http://localhost:{}/echo", listen_addr.port());
let listen_port = listen_addr.port();
let url = format!("http://localhost:{}/echo", listen_port);

let file_ = file.clone();

@@ -197,6 +214,7 @@
file_id: file_.id.to_string(),
model_id: file_.model_id,
information: "".to_string(),
listen_port,
},
)));
} else {
@@ -212,6 +230,7 @@
listen_addr,
running_controller,
model_thread,
load_model_options,
};

new_model
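
As an aside (not part of this diff), the port-selection trick above is just the standard library's ephemeral-port binding: asking the OS for port 0 and reading the port it actually assigned. A minimal standalone sketch:

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Binding to port 0 asks the OS for any free port; the chosen port is
    // then read back via local_addr(), as the backend code above does.
    let listener = TcpListener::bind("localhost:0")?;
    let port = listener.local_addr()?.port();
    println!("OS assigned port {port}");
    Ok(())
}
```
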
2 changes: 2 additions & 0 deletions moxin-backend/src/backend_impls/chat_ui.rs
@@ -228,6 +228,7 @@ fn get_input(
file_id,
model_id,
information: String::new(),
listen_port: 0,
})));
}

@@ -430,6 +431,7 @@ impl super::BackendModel for ChatBotModel {
file_id: file.id.to_string(),
model_id: file.model_id,
information: "".to_string(),
listen_port: 0,
})));
return old_model.unwrap();
}
4 changes: 4 additions & 0 deletions moxin-backend/src/backend_impls/mod.rs
@@ -139,6 +139,8 @@ fn test_chat() {
rope_freq_scale: 0.0,
rope_freq_base: 0.0,
context_overflow_policy: moxin_protocol::protocol::ContextOverflowPolicy::StopAtLimit,
n_batch: Some(128),
n_ctx: Some(1024),
},
tx,
);
@@ -209,6 +211,8 @@ fn test_chat_stop() {
prompt_template: None,
gpu_layers: moxin_protocol::protocol::GPULayers::Max,
use_mlock: false,
n_batch: Some(128),
n_ctx: Some(1024),
rope_freq_scale: 0.0,
rope_freq_base: 0.0,
context_overflow_policy: moxin_protocol::protocol::ContextOverflowPolicy::StopAtLimit,
1 change: 1 addition & 0 deletions moxin-protocol/src/open_ai.rs
@@ -106,6 +106,7 @@ pub struct ChatResponseData {
pub choices: Vec<ChoiceData>,
pub created: u32,
pub model: ModelID,
#[serde(default)]
pub system_fingerprint: String,
pub usage: UsageData,

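For readers unfamiliar with the attribute: `#[serde(default)]` makes deserialization tolerate a missing `system_fingerprint` key by falling back to `String::default()` (an empty string) instead of returning an error, which is presumably needed because some OpenAI-compatible servers omit the field. A hedged, standalone sketch (assumes `serde` with the `derive` feature and `serde_json`; the struct is a simplified stand-in, not the real `ChatResponseData`):

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ChatResponseSketch {
    model: String,
    // Without `default`, a payload missing this key would fail to deserialize.
    #[serde(default)]
    system_fingerprint: String,
}

fn main() {
    // Note: no "system_fingerprint" key in this payload.
    let json = r#"{ "model": "llama-3-8b-instruct" }"#;
    let resp: ChatResponseSketch = serde_json::from_str(json).expect("deserializes fine");
    assert_eq!(resp.system_fingerprint, "");
    println!("{resp:?}");
}
```
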
7 changes: 6 additions & 1 deletion moxin-protocol/src/protocol.rs
@@ -28,9 +28,10 @@ pub struct LoadModelOptions {
pub prompt_template: Option<String>,
pub gpu_layers: GPULayers,
pub use_mlock: bool,
pub n_batch: Option<u32>,
pub n_ctx: Option<u32>,
pub rope_freq_scale: f32,
pub rope_freq_base: f32,

// TBD Not really sure if this is something the backend manages or if it is a matter of
// the client (if it is done by tweaking the JSON payload for the chat completion)
pub context_overflow_policy: ContextOverflowPolicy,
@@ -41,6 +42,10 @@ pub struct LoadedModelInfo {
pub file_id: FileID,
pub model_id: ModelID,

// The port where the local server is listening for the model.
// If 0, the server is not running.
pub listen_port: u16,

// JSON formatted string with the model information. See "Model Inspector" in LMStudio.
pub information: String,
}
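
A hedged sketch of how the new `Option<u32>` fields interact with the backend defaults shown in `api_server.rs` above (the helper names are illustrative, not from the codebase): `None` keeps the previous behavior, while `Some(n)` lets the frontend override it.

```rust
// Illustrative helpers mirroring the fallback logic in create_wasi() above.
fn effective_ctx_size(n_ctx: Option<u32>, model_context_size: u32) -> u32 {
    // Frontend value wins; otherwise use the model metadata context size,
    // capped at 8 * 1024 as before.
    n_ctx.unwrap_or_else(|| model_context_size.min(8 * 1024))
}

fn effective_batch_size(n_batch: Option<u32>) -> u32 {
    // Frontend value wins; otherwise keep the previous fixed default of 128.
    n_batch.unwrap_or(128)
}

fn main() {
    assert_eq!(effective_ctx_size(None, 32 * 1024), 8 * 1024);
    assert_eq!(effective_ctx_size(Some(2048), 32 * 1024), 2048);
    assert_eq!(effective_batch_size(Some(256)), 256);
    assert_eq!(effective_batch_size(None), 128);
}
```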