
Display spinning/loading animation while an LLM response message is still streaming #199

Status: Open
kevinaboos opened this issue Aug 6, 2024 · 1 comment
Labels: area: frontend, enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Looking for help from anyone!)

Comments

kevinaboos (Contributor) commented Aug 6, 2024

Problem

Currently, the UI makes it impossible to tell whether a model has finished streaming its response back to the user, or whether the response is still underway and is simply taking a long time to generate. This is especially noticeable on slower machines without AVX-512 or CUDA support, and it is even more of a problem when the message being returned by the model is very long.

Proposed fix

To improve the user experience, a small spinning loading wheel (or some other kind of animation) should be added either beneath or next to the latest message in the chat view, animating in a way that makes it obvious the model is still generating and streaming a response. Once the message has finished streaming, the animation should stop and be hidden.
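As a rough sketch of the state involved (this is not Moly's actual code; the names `MessageStreamState` and `spinner_visible` are hypothetical), the chat view only needs to track whether the latest message is still streaming and derive the animation's visibility from that:

```rust
// Hypothetical per-message state the chat view could track.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MessageStreamState {
    /// Tokens are still arriving from the model.
    Streaming,
    /// The backend reported the end of the stream (e.g. the EOS token).
    Complete,
}

/// The spinner is shown only while the latest message is still streaming.
fn spinner_visible(state: MessageStreamState) -> bool {
    state == MessageStreamState::Streaming
}

fn main() {
    let mut state = MessageStreamState::Streaming;
    assert!(spinner_visible(state));

    // Once the final chunk arrives, the chat view flips the state
    // and the animation is hidden (or swapped for a checkmark).
    state = MessageStreamState::Complete;
    assert!(!spinner_visible(state));
}
```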

Additional improvements

Alternatively, that spinning animation could change into a tiny checkmark icon (or something similar) to indicate that the message has been fully streamed from the model.

In addition, this small loading animation could also be displayed while a model is being loaded in the background, which can likewise take quite a long time (20+ seconds) on slower machines. This would inform the user that something is indeed still happening in the background, rather than leaving them confused while the UI appears to do nothing for dozens of seconds.
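For the model-loading case, a minimal sketch is shown below; it assumes the loader runs on a background thread and the UI polls a shared flag, which may differ from how Moly actually performs and signals model loading:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // Shared flag the UI could consult to decide whether the
    // loading animation should stay visible while the model loads.
    let model_loading = Arc::new(AtomicBool::new(true));

    let flag = Arc::clone(&model_loading);
    let loader = thread::spawn(move || {
        // Stand-in for the real model-loading call, which can take
        // 20+ seconds on slower machines.
        thread::sleep(Duration::from_millis(100));
        flag.store(false, Ordering::SeqCst);
    });

    // UI side: keep animating while the flag is still set.
    while model_loading.load(Ordering::SeqCst) {
        // Redraw the spinner frame here.
        thread::sleep(Duration::from_millis(16));
    }
    loader.join().unwrap();
    println!("model loaded, spinner hidden");
}
```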

Implementation starter ideas

The wasmedge frontend/backend does make it obvious when a message has finished streaming. You can observe this in the console log:

[2024-08-06T21:56:32Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:34Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:34Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06 14:56:36.967] [info] [WASI-NN] GGML backend: EOS token found

The last statement indicates that the stream has ended. Sometimes you may also observe other forms of stream ending, most of which are currently handled at the Rust level via the StopReason enum. Search for both StopReason and finish_reason to help you get started.
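As a hedged sketch of how the frontend might react to stream-end events: the enum variants below are assumptions modeled on typical OpenAI-style finish_reason values, not necessarily the variants of the repo's actual StopReason enum, and the event plumbing is simplified.

```rust
// Hypothetical stop reasons; the real `StopReason` enum in the Rust
// backend should be consulted instead of these placeholder variants.
#[derive(Debug)]
enum StopReason {
    /// Natural end of generation (e.g. the EOS token seen in the log above).
    Stop,
    /// Generation hit the configured token limit.
    Length,
}

/// Every streamed chunk either carries more text or a stop reason.
enum StreamEvent {
    Chunk(String),
    Finished(StopReason),
}

/// Returns `true` while the spinner should stay visible for this stream.
fn handle_event(event: StreamEvent) -> bool {
    match event {
        StreamEvent::Chunk(text) => {
            // Append `text` to the in-progress message in the chat view.
            let _ = text;
            true // still streaming: keep the animation running
        }
        StreamEvent::Finished(reason) => {
            println!("stream ended: {reason:?}");
            false // hide the spinner (or swap it for a checkmark)
        }
    }
}

fn main() {
    assert!(handle_event(StreamEvent::Chunk("Hello".into())));
    assert!(!handle_event(StreamEvent::Finished(StopReason::Stop)));
}
```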

kevinaboos added the enhancement (New feature or request), good first issue (Good for newcomers), and help wanted (Looking for help from anyone!) labels on Aug 6, 2024
kevinaboos (Contributor, Author) commented:
Here's an example illustrating the need for this feature. In the screenshot below, the model has actually finished sending its response message, but it doesn't look finished, because the content just kind of ... trails off without completing the sentence.

This is confusing because there's no definitive way for the user to know whether the model is done responding, or if it's just taking a long time to calculate the next words.

[Screenshot: chat view in which the latest response trails off mid-sentence, with no indication of whether streaming has finished]
