
Add "/chat/completions" as alias for "/v1/chat/completions" #5722

Merged (4 commits, Feb 28, 2024)

Conversation

jorgealias (Contributor):

From server : improvements and maintenance #4216

Regarding:

Use multiple mount points for the OAI API

// TODO: add mount point without "/v1" prefix -- how?
svr.Post("/v1/chat/completions", [&llama](const httplib::Request &req, httplib::Response &res)

Reusing the same "completions" lambda for the mount point without "/v1" prefix.
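
A minimal sketch of the idea (not the exact diff; the handler body is omitted, and the captured llama object comes from the existing server code):

    // Name the existing OAI-compatible handler once...
    const auto chat_completions = [&llama](const httplib::Request &req, httplib::Response &res) {
        // { existing "/v1/chat/completions" lambda block here }
    };

    // ...then mount the same handler on both routes.
    svr.Post("/v1/chat/completions", chat_completions);
    svr.Post("/chat/completions", chat_completions);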

Added a new "concurrent OAI completions requests no v1" testing step to validate that the new mount point works.

The change only moves the original lambda two indentation levels to the left, to match the other "auto" formatting. The changes are easier to review when whitespace differences are ignored: view ignoring ws.

If similar changes are needed for more mount points, some work can be done to keep them more concise.

phymbert (Collaborator) left a comment:


Thanks for the tests. I would suggest instead introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.
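
For illustration, such an option might look roughly like this (a hypothetical sketch only, not part of this PR; the flag, variable, and handler names are assumptions):

    // Hypothetical: value parsed from a new --api-prefix command-line option.
    const std::string api_prefix = "/v1";

    // All endpoints would then be registered under the configured prefix.
    svr.Post(api_prefix + "/chat/completions", chat_completions_handler);
    svr.Post(api_prefix + "/embeddings", embeddings_handler);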

phymbert (Collaborator) commented Feb 26, 2024:

@ngxson What is your point of view on the proposed approach?

ngxson (Collaborator) commented Feb 26, 2024:

> Thanks for the tests. I would suggest instead introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.

Yeah, maybe a good idea, but I'm not sure what the possible values for --api-prefix would be (obviously we can use whatever prefix we want, but as a normal user, what are the possible use cases?).

If we default --api-prefix to /v1, then what happens when OpenAI releases a new /v2? In that case, changing the default value of --api-prefix may introduce breaking changes.

phymbert (Collaborator) commented Feb 26, 2024:

> Thanks for the tests. I would suggest instead introducing an --api-prefix option, which would allow prefixing all endpoints according to what the client expects. This fix does not address /v1/embeddings.

> Yeah, maybe a good idea, but I'm not sure what the possible values for --api-prefix would be (obviously we can use whatever prefix we want, but as a normal user, what are the possible use cases?).

> If we default --api-prefix to /v1, then what happens when OpenAI releases a new /v2? In that case, changing the default value of --api-prefix may introduce breaking changes.

Good point, what about --api-prefix-v1? :/ IMHO this should not be done at the server level but with proxy rewrite rules.
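
For illustration, a proxy-level rewrite might look roughly like this hypothetical nginx snippet (sketch only; host, port, and routes are assumptions, not part of this PR):

    # Hypothetical nginx rule: map the unprefixed route onto /v1.
    location /chat/completions {
        rewrite ^/chat/completions$ /v1/chat/completions break;
        proxy_pass http://localhost:8080;
    }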

@jorgealias which clients expect /chat/completions? All clients I have seen can take a base_url.

For example, you can set openai.base_url='http://localhost:8080/v1/chat'.

@ggerganov What was the motivation to have different prefixes? URI versioning is the common REST standard.

ngxson (Collaborator) commented Feb 26, 2024:

Instead of --api-prefix, I would propose --openai-version "v1,v2" (defaulting to v1), with / pointing to the highest version in the list.

Still, it's a future-proofing measure, and I'm not even sure whether it's a good idea. Maybe we should wait until OAI releases a v2 API?

jorgealias (Contributor, Author):

> @jorgealias which clients expect /chat/completions?

> All clients I have seen can take a base_url. For example, you can set openai.base_url='http://localhost:8080/v1/chat'.

The change only provides a simple implementation for the comment at line 2683:

// TODO: add mount point without "/v1" prefix -- how?

It attempts to answer the question of how.

Using openai.base_url='http://localhost:8080/v1/' could indeed be a valid option when all APIs have the same version.

More can be done to remap/reuse existing handlers and to support multiple or different versions, if such reuse is feasible and necessary for various clients. That could also apply to other handlers, like /v1/embeddings. I am not aware of such clients, but my knowledge is limited.

Changing the version usually implies more than simple reuse. It might not be safe to make assumptions about the future and add options for handling such changes just to match other APIs. Indeed, it is better to wait.

Regarding this specific change, it might have been faster to post a message showing this implementation, or to ask why it would be necessary, but here we are. If there is no immediate need, please press the close button :)

Why this? Just trying to help, starting somewhere. It is one way to say thank you for the great work and effort put into this project.

phymbert (Collaborator):

Thanks, @jorgealias: contributions are very welcome. Maybe we can switch this one to draft for the moment, while we sort out a good approach to supporting versioning.

Since you provided a scenario in the test framework, it would be nice if you could also test missing features in the server. We are planning to refactor the code, and having good coverage will definitely help; typically, everything around multimodal.

ggerganov (Owner):

> @ggerganov What was the motivation to have different prefixes? URI versioning is the common REST standard.

It was a feature request in one of the discussions: #4160 (comment)

ngxson (Collaborator) commented Feb 27, 2024:

@jorgealias You are right about the point of fixing TODO: add mount point without "/v1" prefix -- how?. In fact, I think what is being asked in the TODO is "how to mount one handler on multiple routes?", and your PR does address the issue by using a lambda function.

So I think it's good to merge this one.

@phymbert Sorry, I misunderstood your initial question. In fact, the /v1 that we have is not for versioning; it exists because most code you can find on the internet that uses the OpenAI API points to /v1/chat/completions. This endpoint is actually shown in the OpenAI docs:

[screenshot from the OpenAI docs showing the /v1/chat/completions endpoint]

So that's why I mentioned that we should not try to be future-proof: the /v1 part in llama.cpp exists solely for compatibility with code already using the OpenAI API, and no one knows when they will release a v2 of their API.

For the same reason, the other endpoints supported by OpenAI, like /embeddings, /completions, or /models, should be accessible both with and without the /v1 part. However, since these changes require some modifications to the JSON-parsing code, we can do that in another PR.

For now, this PR seems OK to me to be merged.

phymbert (Collaborator) left a comment:


OK, thanks for the explanation.

jorgealias (Contributor, Author):

I am pushing now, but @phymbert did ask:

> This fix does not address /v1/embeddings.

I am not sure what needs to be addressed, but if it is about an extra /v1 alias, I did mention that things can be done a bit differently to support more similar changes easily.

Adding a helper function like:

// Pointer-to-member type matching httplib::Server's route-registration
// methods (Post, Get, ...), which take a pattern and a handler.
typedef httplib::Server& (httplib::Server::*ServerMemberFn)
    (const std::string &, const httplib::Server::Handler);

// Register the same handler under every given route pattern,
// e.g. both "/chat/completions" and "/v1/chat/completions".
static void server_handler(httplib::Server &svr, ServerMemberFn function,
    const std::vector<std::string> &patterns, const httplib::Server::Handler handler
) {
    for (const auto &pattern : patterns) {
        (svr.*function)(pattern, handler);
    }
}

will help eliminate the need for extra named lambdas.

If there is only one path for a POST, use the standard form:

    svr.Post("/infill", [&llama, &validate_api_key](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

For multiple paths sharing the same POST lambda, use the new helper:

    server_handler(svr, &httplib::Server::Post, { "/chat/completions", "/v1/chat/completions" },
        [&llama, &validate_api_key, &sparams](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

or

    server_handler(svr, &httplib::Server::Post, { "/embeddings", "/v1/embeddings" },
        [&llama](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );

As in, only replacing:

  • svr.Post("/v1/embeddings"
    with
  • server_handler(svr, &httplib::Server::Post, { "/embeddings", "/v1/embeddings" }

And that should also work for svr.Get(...).
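
For example, a sketch in the same style (assuming a models endpoint exists and its handler can be shared the same way):

    server_handler(svr, &httplib::Server::Get, { "/models", "/v1/models" },
        [&llama](const httplib::Request &req, httplib::Response &res)
         // { lambda block here }
    );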

ggerganov (Owner):

> Adding a helper function like:

Yup, good idea

ggerganov merged commit efc7225 into ggerganov:master on Feb 28, 2024 (57 of 58 checks passed).

ngxson mentioned this pull request on Feb 28, 2024.
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Mar 13, 2024:
* Add "/chat/completions" as alias for "/v1/chat/completions"

* merge to upstream master

* minor : fix trailing whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024.