diff --git a/docs/deployment/integrations/production-stack.md b/docs/deployment/integrations/production-stack.md
index fae392589c06..2f1894ccf002 100644
--- a/docs/deployment/integrations/production-stack.md
+++ b/docs/deployment/integrations/production-stack.md
@@ -55,7 +55,7 @@ sudo kubectl port-forward svc/vllm-router-service 30080:80
 And then you can send out a query to the OpenAI-compatible API to check the available models:
 
 ```bash
-curl -o- http://localhost:30080/models
+curl -o- http://localhost:30080/v1/models
 ```
 
 ??? console "Output"
@@ -78,7 +78,7 @@ curl -o- http://localhost:30080/models
 To send an actual chatting request, you can issue a curl request to the OpenAI `/completion` endpoint:
 
 ```bash
-curl -X POST http://localhost:30080/completions \
+curl -X POST http://localhost:30080/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "facebook/opt-125m",