
Update llm_inference.md #245

Merged
merged 4 commits into from
Jul 29, 2024

Conversation

alabulei1
Collaborator

Explanation

Use a smaller batch size so that llama-simple.wasm can run on most machines.
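A sketch of what passing a smaller batch size looks like on the command line (the model file name and the value 512 are illustrative assumptions, not part of this PR):

```shell
# Sketch: run the simple inference example with a reduced batch size so it
# fits in memory on most machines. Replace the GGUF file name with your own
# downloaded model.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-simple.wasm \
  --prompt 'Once upon a time, ' \
  --batch-size 512
```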

Related issue

LlamaEdge/LlamaEdge#199

What type of PR is this

/kind documentation

Signed-off-by: alabulei1 <vivian.xiage@gmail.com>
Member

juntao commented Jul 29, 2024

Hello, I am a PR summary agent on flows.network. Here are my reviews of code commits in this PR.


The GitHub patch primarily updates the documentation for running open-source Large Language Models (LLMs) with WasmEdge and Rust to reflect that it now supports any open-source LLM, not just Llama2. Instead of listing the supported models directly, the documentation now states that WasmEdge can support any open-source LLM.

Potential Issues:

  1. Removing the table of prompt templates for different models may confuse users who are unfamiliar with these template variations.
  2. Changing the context size in an example from 4096 to 512 could hurt performance if the value is not adapted to the new model's capabilities.

Key Findings:

  1. The --prompt-template option is explained, allowing the application to support different open-source LLM models beyond Llama2.
  2. The documentation now includes commands to print out logs and statistics of the model at runtime, which was not mentioned before.
  3. The example used in the "Understand the code" section has been updated to clarify that it's for a simple inference round trip, not multi-turn conversations.
  4. A new option --ctx-size is introduced to specify the context window size for the application, with a note that increasing this may require adjusting the --batch-size.
  5. The explanation for handling invalid UTF8 sequences during output conversion in Rust code has been corrected.

These changes enhance the application's flexibility by adding support for various LLM models and improving logging capabilities.
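The interaction between the context window and batch size noted in finding 4 might look like this in practice (a sketch; the model file name and the values are assumptions, not taken from the patch):

```shell
# Sketch: when raising --ctx-size, the batch size may need to grow with it
# so prompt processing can fill the larger context window.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-simple.wasm \
  --prompt 'Once upon a time, ' \
  --ctx-size 4096 \
  --batch-size 4096
```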

Details

Commit 5d00cd4515d6626a1fa44b48fd8cfa18be951f46

  1. The patch updates the documentation for Llama 2 inference with WasmEdge and Rust to reflect that it now supports running open-source models beyond just llama2.

  2. It expands the list of supported models, including Nous-Hermes-2-Mixtral-8x7B-DPO and Nous-Hermes-2-Mixtral-8x7B-SFT. The full list can be found in a linked document.

  3. The documentation now mentions that the --prompt-template option allows the application to support different open-source LLM models beyond Llama2, which was not stated before.

  4. A table listing various prompt templates for different models has been removed from the documentation.

  5. The context size in a command line example is changed from 4096 to 512.

  6. In the Rust code snippet, there's a correction to the comment that explains how invalid UTF8 sequences are handled during output conversion.

  7. The documentation now mentions that, for multi-turn conversations and for constructing OpenAI-compatible APIs around any open-source LLM, users can refer to the chat example source code and the API server source code, respectively. Previously, these references were specific to the Llama2 model.
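The corrected comment in item 6 concerns how the example turns the model's raw output bytes into a string. A minimal sketch of that behavior, assuming the example uses lossy conversion (the function name here is hypothetical, not the doc's actual code):

```rust
// Minimal sketch (not the exact LlamaEdge source) of converting raw model
// output bytes into a Rust String without panicking on invalid UTF-8.
fn bytes_to_string(output: &[u8]) -> String {
    // from_utf8_lossy replaces any invalid UTF-8 sequence with U+FFFD, so a
    // truncated multi-byte character at the end of an output buffer cannot
    // crash the inference loop.
    String::from_utf8_lossy(output).to_string()
}

fn main() {
    // 0xFF can never appear in valid UTF-8; it becomes U+FFFD.
    let raw = b"Hello \xFFworld";
    println!("{}", bytes_to_string(raw));
}
```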

Commit 77d94e51ea350abc4ce32f45d639001eb6b35efd

  • The GitHub patch renames the "Llama 2 inference" documentation to "LLM inference".
  • The supported models list is removed and replaced with a statement that WasmEdge can support any open-source LLM.
  • The example model used for demonstration has been changed from Llama-2-7B-Chat to Meta-Llama-3.1-8B-Instruct in GGUF format.
  • The prompt template option is explained, noting that the llama-3-chat template can be used for the new model.
  • A note about AOT compilation to improve performance is added.
  • The resources section is updated to reflect the change in the example project's name and the model used.
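The new model, template, and AOT note above could combine into a run like this (a sketch; the file names are assumptions, while `wasmedge compile` is the standard WasmEdge AOT compiler subcommand):

```shell
# Sketch: AOT-compile the chat app once for better performance, then run it
# against the Llama 3.1 model with the matching llama-3-chat template.
wasmedge compile llama-chat.wasm llama-chat-aot.wasm
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-chat-aot.wasm \
  --prompt-template llama-3-chat
```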

Commit 4e0b7bcaa3ff95b30df7b29e20e3015be41b6e7a

  • The patch updates the documentation for running open-source Large Language Models (LLMs) in Rust with WasmEdge.
  • It clarifies that WasmEdge can support any open-source LLM model, not just Llama models.
  • The command to run the inference application has been updated to use "llama-3-chat" instead of the typo "llama-a-chat".
  • A new option --ctx-size is introduced to specify the context window size for the application, with a note that increasing it may require adjusting the --batch-size.
  • The documentation now includes commands to print out logs and statistics of the model at runtime.
  • The example in the "Understand the code" section has been updated to clarify that it covers a single inference round trip, not multi-turn conversations.
  • A typo in the resources section has been corrected: "llama2 models" is now "llama models".

Commit 9bc6f0a042852ecfd6e22a095c9d8635bb74a621

  • Key Change 1: The --prompt-template option allows the application to support different open source LLM models beyond llama2.
  • Key Change 2: The explanation for the --ctx-size option has been simplified and now only mentions that it should be within the model's intrinsic context window size.
  • Key Change 3: The --log-stat option replaces the previous combination of --print-timing and --print-graph.
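Key Change 3 might look like this on the command line (a sketch; the model and app file names are assumptions):

```shell
# Sketch: the single --log-stat flag prints runtime statistics, replacing
# the earlier --print-timing / --print-graph pair.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template llama-3-chat \
  --log-stat
```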


Member

@juntao juntao left a comment


Change the example to llama 3.1.

alabulei1 and others added 3 commits July 29, 2024 17:21
Signed-off-by: alabulei1 <vivian.xiage@gmail.com>
Signed-off-by: Michael Yuan <michael@secondstate.io>
@juntao juntao merged commit f373a49 into main Jul 29, 2024
6 checks passed
@juntao juntao deleted the alabulei1-patch-2 branch July 29, 2024 09:51