
Display spinning/loading animation while an LLM response message is still streaming #199

Status: Open
kevinaboos opened this issue Aug 6, 2024 · 1 comment
Labels: area: frontend, enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Looking for help from anyone!)

Comments

kevinaboos (Contributor) commented Aug 6, 2024

Problem

Currently, the UI makes it impossible to tell whether a model has finished streaming its response back to the user, or whether the response is still underway and is simply taking a long time to generate. This is especially noticeable on slower machines without AVX-512 or CUDA support, and it is even more of a problem when the message being returned by the model is very long.

Proposed fix

To improve the user experience, a small spinning loading wheel (or some other kind of animation) should be added either beneath or next to the latest message in the chat view, animating in a way that makes it obvious the model is still generating and streaming a response. Once the message has finished streaming, the animation should stop and be hidden.
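As a rough sketch of the state involved (this is not Moly's actual code; the names `MessageStreamState` and `spinner_visible` are hypothetical), the chat view only needs to track whether the latest message is still streaming and derive the animation's visibility from that:

```rust
// Hypothetical per-message state the chat view could track.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MessageStreamState {
    /// Tokens are still arriving from the model.
    Streaming,
    /// The backend reported the end of the stream (e.g. the EOS token).
    Complete,
}

/// The spinner is shown only while the latest message is still streaming.
fn spinner_visible(state: MessageStreamState) -> bool {
    state == MessageStreamState::Streaming
}

fn main() {
    let mut state = MessageStreamState::Streaming;
    assert!(spinner_visible(state));

    // Once the final chunk arrives, the chat view flips the state
    // and the animation is hidden (or swapped for a checkmark).
    state = MessageStreamState::Complete;
    assert!(!spinner_visible(state));
}
```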

Additional improvements

Alternatively, that spinning animation could change into a tiny checkmark icon (or something similar) to indicate that the message has been fully streamed from the model.

In addition, this small loading animation could also be displayed while a model is being loaded in the background, which can likewise take quite a long time (20+ seconds) on slower machines. This would inform the user that something is indeed still happening in the background, rather than leaving them confused while the UI appears to do nothing for dozens of seconds.
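For the model-loading case, a minimal sketch is shown below; it assumes the loader runs on a background thread and the UI polls a shared flag, which may differ from how Moly actually performs and signals model loading:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // Shared flag the UI could consult to decide whether the
    // loading animation should stay visible while the model loads.
    let model_loading = Arc::new(AtomicBool::new(true));

    let flag = Arc::clone(&model_loading);
    let loader = thread::spawn(move || {
        // Stand-in for the real model-loading call, which can take
        // 20+ seconds on slower machines.
        thread::sleep(Duration::from_millis(100));
        flag.store(false, Ordering::SeqCst);
    });

    // UI side: keep animating while the flag is still set.
    while model_loading.load(Ordering::SeqCst) {
        // Redraw the spinner frame here.
        thread::sleep(Duration::from_millis(16));
    }
    loader.join().unwrap();
    println!("model loaded, spinner hidden");
}
```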

Implementation starter ideas

The wasmedge frontend/backend does make it obvious when a message has finished streaming. You can observe this in the console log:

[2024-08-06T21:56:32Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:33Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:34Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:34Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:35Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06T21:56:36Z INFO  llama-core] Get output buffer generated by the model named Meta-Llama-3-8B-Instruct-f16.gguf in the stream mode.
[2024-08-06 14:56:36.967] [info] [WASI-NN] GGML backend: EOS token found

The last statement indicates that the stream has ended. Sometimes you may also observe other forms of stream ending, most of which are currently handled at the Rust level via the StopReason enum. Search for both StopReason and finish_reason to help you get started.
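As a hedged sketch of how the frontend might react to stream-end events: the enum variants below are assumptions modeled on typical OpenAI-style finish_reason values, not necessarily the variants of the repo's actual StopReason enum, and the event plumbing is simplified.

```rust
// Hypothetical stop reasons; the real `StopReason` enum in the Rust
// backend should be consulted instead of these placeholder variants.
#[derive(Debug)]
enum StopReason {
    /// Natural end of generation (e.g. the EOS token seen in the log above).
    Stop,
    /// Generation hit the configured token limit.
    Length,
}

/// Every streamed chunk either carries more text or a stop reason.
enum StreamEvent {
    Chunk(String),
    Finished(StopReason),
}

/// Returns `true` while the spinner should stay visible for this stream.
fn handle_event(event: StreamEvent) -> bool {
    match event {
        StreamEvent::Chunk(text) => {
            // Append `text` to the in-progress message in the chat view.
            let _ = text;
            true // still streaming: keep the animation running
        }
        StreamEvent::Finished(reason) => {
            println!("stream ended: {reason:?}");
            false // hide the spinner (or swap it for a checkmark)
        }
    }
}

fn main() {
    assert!(handle_event(StreamEvent::Chunk("Hello".into())));
    assert!(!handle_event(StreamEvent::Finished(StopReason::Stop)));
}
```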

kevinaboos added the enhancement (New feature or request), good first issue (Good for newcomers), and help wanted (Looking for help from anyone!) labels on Aug 6, 2024
kevinaboos (Contributor, Author) commented:
Here's an example illustrating the need for this feature. In the screenshot below, the model has actually finished sending its response message, but it doesn't look finished, because the content just kind of ... trails off without completing the sentence.

This is confusing because there's no definitive way for the user to know whether the model is done responding, or if it's just taking a long time to calculate the next words.

[Screenshot: chat view in which the latest response trails off mid-sentence, with no indication of whether streaming has finished]
