
Conversation

@openingnow openingnow (Contributor) commented Nov 4, 2025

server/README says the webui uses /chat/completions; however, we currently have /completion, /v1/completions, and /v1/chat/completions documented. The /chat/completions route does exist, but it is not documented, so pointing the reader at it is misleading.

svr->Post(params.api_prefix + "/completions",         handle_completions);
svr->Post(params.api_prefix + "/v1/completions",      handle_completions_oai);
svr->Post(params.api_prefix + "/chat/completions",    handle_chat_completions);  // exists, but not documented
svr->Post(params.api_prefix + "/v1/chat/completions", handle_chat_completions);  // documented; what the webui calls

This PR fixes the README to show the exact endpoint the webui uses (which is /v1/chat/completions).

const response = await fetch(`./v1/chat/completions`, {
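
For reference, here is a minimal sketch (not the webui's actual code) of a request against that same OpenAI-compatible endpoint; it assumes llama-server is listening on the default http://localhost:8080 with no api_prefix configured (see params.api_prefix above):

// Minimal sketch: send one chat message to the documented endpoint and print the reply.
// Assumes the server is at the default host/port; adjust the URL if an API prefix is set.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);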

@ngxson ngxson merged commit fd2f84f into ggml-org:master Nov 5, 2025
1 check passed
@ngxson ngxson changed the title from "Clarify the endpoint that webui uses" to "docs: Clarify the endpoint that webui uses" on Nov 5, 2025
@openingnow openingnow deleted the patch-1 branch November 5, 2025 10:56
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 5, 2025
* origin/master: (21 commits)
vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
examples(gguf): GGUF example outputs (ggml-org#17025)
mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
server : do not default to multiple slots with speculative decoding (ggml-org#17017)
mtmd: improve struct initialization (ggml-org#16981)
docs: Clarify the endpoint that webui uses (ggml-org#17001)
model : add openPangu-Embedded (ggml-org#16941)
ggml webgpu: minor set rows optimization (ggml-org#16810)
sync : ggml
ggml : fix conv2d_dw SVE path (ggml/1380)
CUDA: update ops.md (ggml-org#17005)
opencl: update doc (ggml-org#17011)
refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
vulkan: remove the need for the dryrun (ggml-org#16826)
server : do context shift only while generating (ggml-org#17000)
readme : update hot topics (ggml-org#17002)
ggml-cpu : bicubic interpolation (ggml-org#16891)
ci : apply model label to models (ggml-org#16994)
chore : fix models indent after refactor (ggml-org#16992)
Fix garbled output with REPACK at high thread counts (ggml-org#16956)
...