
Conversation

@openingnow openingnow (Contributor) commented Nov 4, 2025

server/README says the webui uses /chat/completions; however, we currently have /completion, /v1/completions, and /v1/chat/completions documented. The /chat/completions route does exist, but it is not documented, so pointing the reader at it is misleading.

svr->Post(params.api_prefix + "/completions",         handle_completions);
svr->Post(params.api_prefix + "/v1/completions",      handle_completions_oai);
svr->Post(params.api_prefix + "/chat/completions",    handle_chat_completions);  // exists, but not documented
svr->Post(params.api_prefix + "/v1/chat/completions", handle_chat_completions);  // documented; what the webui calls

This PR fixes the README to show the exact endpoint the webui uses (which is /v1/chat/completions).

const response = await fetch(`./v1/chat/completions`, {
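
For reference, here is a minimal sketch (not the webui's actual code) of a request against that same OpenAI-compatible endpoint; it assumes llama-server is listening on the default http://localhost:8080 with no api_prefix configured (see params.api_prefix above):

// Minimal sketch: send one chat message to the documented endpoint and print the reply.
// Assumes the server is at the default host/port; adjust the URL if an API prefix is set.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);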

@ngxson ngxson merged commit fd2f84f into ggml-org:master Nov 5, 2025
1 check passed
@ngxson ngxson changed the title from "Clarify the endpoint that webui uses" to "docs: Clarify the endpoint that webui uses" on Nov 5, 2025
@openingnow openingnow deleted the patch-1 branch November 5, 2025 10:56
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 5, 2025
* origin/master: (21 commits)
vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
examples(gguf): GGUF example outputs (ggml-org#17025)
mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
server : do not default to multiple slots with speculative decoding (ggml-org#17017)
mtmd: improve struct initialization (ggml-org#16981)
docs: Clarify the endpoint that webui uses (ggml-org#17001)
model : add openPangu-Embedded (ggml-org#16941)
ggml webgpu: minor set rows optimization (ggml-org#16810)
sync : ggml
ggml : fix conv2d_dw SVE path (ggml/1380)
CUDA: update ops.md (ggml-org#17005)
opencl: update doc (ggml-org#17011)
refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
vulkan: remove the need for the dryrun (ggml-org#16826)
server : do context shift only while generating (ggml-org#17000)
readme : update hot topics (ggml-org#17002)
ggml-cpu : bicubic interpolation (ggml-org#16891)
ci : apply model label to models (ggml-org#16994)
chore : fix models indent after refactor (ggml-org#16992)
Fix garbled output with REPACK at high thread counts (ggml-org#16956)
...