Bug: Last 2 Chunks In Streaming Mode Come Together In Firefox #9502

Closed
CentricStorm opened this issue Sep 16, 2024 · 3 comments
Labels
bug-unconfirmed, medium severity (Used to report medium severity bugs in llama.cpp, e.g. Malfunctioning Features but still useable)

Comments

@CentricStorm (Contributor)

What happened?

When using /completion with stream: true, the last two JSON chunks arrive together in Firefox, while Chrome seems to handle it fine, so it might be a Firefox bug.

Looking further into this, it seems that HTTP Transfer-Encoding: chunked requires each chunk to be terminated with \r\n, but here \n\n is used instead:

const std::string str =
    std::string(event) + ": " +
    data.dump(-1, ' ', false, json::error_handler_t::replace) +
    "\n\n"; // note: these newlines are important (not sure why though, if you know, add a comment to explain)

This doesn't seem to be just a Windows requirement; it is listed as part of the HTTP specification:
HTTP Chunked Transfer Coding

More information, including an example chunked response:
Transfer-Encoding Directives
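
For reference, here is a minimal sketch (illustration only, not llama.cpp code) of how one SSE-style message might sit inside chunked transfer coding: the \r\n terminators required by the specification belong to the chunked framing around each payload, which the HTTP library would normally add, while the \n\n is part of the payload itself.

const ssePayload = 'data: {"content":"Hi"}\n\n'         // one SSE message, terminated by a blank line
const chunkSize = ssePayload.length.toString(16)        // chunk size in hex (ASCII payload, so .length equals byte length)
const chunk = chunkSize + "\r\n" + ssePayload + "\r\n"  // size, CRLF, chunk data, CRLF
const lastChunk = "0\r\n\r\n"                           // zero-length chunk terminates the body
console.log(JSON.stringify(chunk + lastChunk))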

Name and Version

llama-server.exe
version: 3761 (6262d13)
built with MSVC 19.29.30154.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

@CentricStorm (Contributor, Author)

This was roughly how the HTTP API was working before (which still works in Chrome):

const prompt = "Hello" // placeholder prompt, not part of the original snippet

const response = await fetch("http://localhost/completion", {
	method: "POST",
	body: JSON.stringify({
		prompt,
		n_predict: 32,
		stream: true
	})
})
// Assumes each decoded chunk is exactly one SSE message ("data: {...}\n\n"),
// which is what breaks when two messages arrive in the same chunk.
for await (const chunk of response.body.pipeThrough(new TextDecoderStream("utf-8"))) {
	if (chunk.startsWith("error")) {
		break // stop on an error message from the server
	}
	const data = JSON.parse(chunk.substring(6)) // strip the leading "data: "
}

The documentation doesn't mention whether this is the intended way to use streaming mode.

@ggerganov (Owner)

Btw, we now also add [DONE]\n\n at the end of the response: #9459

(Not sure if this is relevant, as I have little knowledge about how the HTTP stuff should work.)

@CentricStorm (Contributor, Author)

Btw, we now also add [DONE]\n\n at the end of the response: #9459

I think that's only for the OpenAI-compatible API /chat/completions, not for llama-server's own API /completion.
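
For clients of that OpenAI-compatible endpoint, the sentinel just needs to be skipped before JSON parsing. A minimal sketch (not llama.cpp code), assuming the sentinel arrives OpenAI-style as its own data: [DONE] message:

function parseMessage(message) {
	// Skip the end-of-stream sentinel before parsing (assumed "data: [DONE]" format)
	const payload = message.replace(/^data: /, "")
	if (payload.trim() === "[DONE]") {
		return null // end of stream, nothing to parse
	}
	return JSON.parse(payload)
}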

More research shows that llama-server is currently responding to stream requests using a format closely resembling server-sent events (one difference is that llama-server can send messages with an error field, even though that is non-standard).

This seems strange at first because server-sent events are intended to be consumed client-side with the EventSource interface, but that interface doesn't support HTTP POST requests (which llama-server requires).

Using fetch instead to access these server-sent events is probably non-standard, and is most likely the reason why the behavior is different in Firefox and Chrome. In other words, it may not be a bug at all.

Regardless, more information can be added to the documentation in #9519, including an example script that manually splits the chunks and works in Firefox as well as in Node.
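
A minimal sketch of such a script (not the exact one from #9519), run as an ES module, assuming the /completion endpoint streams data: {...} messages separated by blank lines and that each streamed JSON object carries a content field:

const prompt = "Hello" // placeholder prompt

const response = await fetch("http://localhost/completion", {
	method: "POST",
	body: JSON.stringify({ prompt, n_predict: 32, stream: true })
})

let buffer = ""
for await (const chunk of response.body.pipeThrough(new TextDecoderStream())) {
	buffer += chunk
	// Messages are delimited by a blank line; the last piece may be incomplete,
	// so keep it in the buffer for the next iteration.
	const messages = buffer.split("\n\n")
	buffer = messages.pop()
	for (const message of messages) {
		if (message.startsWith("error")) {
			throw new Error(message)
		}
		console.log(JSON.parse(message.substring(6)).content) // strip the leading "data: "
	}
}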
