
Update llm_inference.md #245

Merged
merged 4 commits into from
Jul 29, 2024

Conversation

alabulei1
Collaborator

Explanation

Use a smaller batch size so that llama-simple.wasm can run on most machines.
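A sketch of what passing a smaller batch size looks like on the command line (the model file name and the value 512 are illustrative assumptions, not part of this PR):

```shell
# Sketch: run the simple inference example with a reduced batch size so it
# fits in memory on most machines. Replace the GGUF file name with your own
# downloaded model.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-simple.wasm \
  --prompt 'Once upon a time, ' \
  --batch-size 512
```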

Related issue

LlamaEdge/LlamaEdge#199

What type of PR is this

/kind documentation

Signed-off-by: alabulei1 <vivian.xiage@gmail.com>
Member

juntao commented Jul 29, 2024

Hello, I am a PR summary agent on flows.network. Here are my reviews of code commits in this PR.


The GitHub patch primarily updates the documentation for running open-source Large Language Models (LLMs) with WasmEdge and Rust to reflect that it now supports any open-source LLM, not just Llama2. Instead of listing the supported models directly, the documentation now states that WasmEdge can support any open-source LLM.

Potential Issues:

  1. Removing the table of prompt templates for different models may confuse users who are unfamiliar with these template variations.
  2. Changing the context size in an example from 4096 to 512 could hurt performance if the value is not adapted to the new model's capabilities.

Key Findings:

  1. The --prompt-template option is explained, allowing the application to support different open-source LLM models beyond Llama2.
  2. The documentation now includes commands to print out logs and statistics of the model at runtime, which was not mentioned before.
  3. The example used in the "Understand the code" section has been updated to clarify that it's for a simple inference round trip, not multi-turn conversations.
  4. A new option --ctx-size is introduced to specify the context window size for the application, with a note that increasing this may require adjusting the --batch-size.
  5. The explanation for handling invalid UTF8 sequences during output conversion in Rust code has been corrected.

These changes enhance the application's flexibility by adding support for various LLM models and improving logging capabilities.
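The interaction between the context window and batch size noted in finding 4 might look like this in practice (a sketch; the model file name and the values are assumptions, not taken from the patch):

```shell
# Sketch: when raising --ctx-size, the batch size may need to grow with it
# so prompt processing can fill the larger context window.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-simple.wasm \
  --prompt 'Once upon a time, ' \
  --ctx-size 4096 \
  --batch-size 4096
```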

Details

Commit 5d00cd4515d6626a1fa44b48fd8cfa18be951f46

  1. The patch updates the documentation for Llama 2 inference with WasmEdge and Rust to reflect that it now supports running open-source models beyond just llama2.

  2. It expands the list of supported models, including Nous-Hermes-2-Mixtral-8x7B-DPO and Nous-Hermes-2-Mixtral-8x7B-SFT. The full list can be found in a linked document.

  3. The documentation now mentions that the --prompt-template option allows the application to support different open-source LLM models beyond Llama2, which was not stated before.

  4. A table listing various prompt templates for different models has been removed from the documentation.

  5. The context size in a command line example is changed from 4096 to 512.

  6. In the Rust code snippet, there's a correction to the comment that explains how invalid UTF8 sequences are handled during output conversion.

  7. The documentation now mentions that, for multi-turn conversations and for constructing OpenAI-compatible APIs around any open-source LLM, users can refer to the chat example source code and the API server source code, respectively. Previously, these references were specific to the Llama2 model.
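The corrected comment in item 6 concerns how the example turns the model's raw output bytes into a string. A minimal sketch of that behavior, assuming the example uses lossy conversion (the function name here is hypothetical, not the doc's actual code):

```rust
// Minimal sketch (not the exact LlamaEdge source) of converting raw model
// output bytes into a Rust String without panicking on invalid UTF-8.
fn bytes_to_string(output: &[u8]) -> String {
    // from_utf8_lossy replaces any invalid UTF-8 sequence with U+FFFD, so a
    // truncated multi-byte character at the end of an output buffer cannot
    // crash the inference loop.
    String::from_utf8_lossy(output).to_string()
}

fn main() {
    // 0xFF can never appear in valid UTF-8; it becomes U+FFFD.
    let raw = b"Hello \xFFworld";
    println!("{}", bytes_to_string(raw));
}
```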

Commit 77d94e51ea350abc4ce32f45d639001eb6b35efd

  • The GitHub patch renames the "Llama 2 inference" documentation to "LLM inference".
  • The supported models list is removed and replaced with a statement that WasmEdge can support any open-source LLM.
  • The example model used for demonstration has been changed from Llama-2-7B-Chat to Meta-Llama-3.1-8B-Instruct in GGUF format.
  • The prompt template option is explained, noting that the llama-3-chat template can be used for the new model.
  • A note about AOT compilation to improve performance is added.
  • The resources section is updated to reflect the change in the example project's name and the model used.
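The new model, template, and AOT note above could combine into a run like this (a sketch; the file names are assumptions, while `wasmedge compile` is the standard WasmEdge AOT compiler subcommand):

```shell
# Sketch: AOT-compile the chat app once for better performance, then run it
# against the Llama 3.1 model with the matching llama-3-chat template.
wasmedge compile llama-chat.wasm llama-chat-aot.wasm
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-chat-aot.wasm \
  --prompt-template llama-3-chat
```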

Commit 4e0b7bcaa3ff95b30df7b29e20e3015be41b6e7a

  • The patch updates the documentation for running open-source Large Language Models (LLMs) in Rust with WasmEdge.
  • It clarifies that WasmEdge can support any open-source LLM model, not just Llama models.
  • The command to run the inference application has been updated to use "llama-3-chat" instead of the typo "llama-a-chat".
  • A new option --ctx-size is introduced to specify the context window size for the application, with a note that increasing it may require adjusting the --batch-size.
  • The documentation now includes commands to print out logs and statistics of the model at runtime.
  • The example in the "Understand the code" section has been updated to clarify that it covers a single inference round trip, not multi-turn conversations.
  • A typo in the resources section has been corrected: "llama2 models" is now "llama models".

Commit 9bc6f0a042852ecfd6e22a095c9d8635bb74a621

  • Key Change 1: The --prompt-template option allows the application to support different open source LLM models beyond llama2.
  • Key Change 2: The explanation for the --ctx-size option has been simplified and now only mentions that it should be within the model's intrinsic context window size.
  • Key Change 3: The --log-stat option replaces the previous combination of --print-timing and --print-graph.
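Key Change 3 might look like this on the command line (a sketch; the model and app file names are assumptions):

```shell
# Sketch: the single --log-stat flag prints runtime statistics, replacing
# the earlier --print-timing / --print-graph pair.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template llama-3-chat \
  --log-stat
```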


Member

@juntao juntao left a comment


Change the example to llama 3.1.

alabulei1 and others added 3 commits July 29, 2024 17:21
Signed-off-by: alabulei1 <vivian.xiage@gmail.com>
Signed-off-by: Michael Yuan <michael@secondstate.io>
@juntao juntao merged commit f373a49 into main Jul 29, 2024
6 checks passed
@juntao juntao deleted the alabulei1-patch-2 branch July 29, 2024 09:51