Expose model argument in python clients #1978

Closed
anubhavrana opened this issue May 29, 2024 · 1 comment
@anubhavrana
Feature request

Exposing the model argument in the chat and completions methods (and potentially other methods like generate) would solve the issue described below. It would also let the clients follow the same pattern as the openai client. This should not affect or break the current usage pattern, since the argument can keep a default value matching the current implementation.
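A rough sketch of how the proposed usage could look (the model argument is the new piece; everything else follows the existing client pattern, and names/URLs here are illustrative rather than a finished API):

```python
from text_generation import Client

# Point the client at the gateway instead of a single TGI server (hypothetical URL).
client = Client(base_url="https://my-gateway.example.com")

# Proposed: forward `model` into the request payload instead of the hard-coded
# "tgi" value, mirroring the openai client. If omitted, the current default
# behaviour is kept, so existing code does not break.
response = client.chat(
    messages=[{"role": "user", "content": "Hello!"}],
    model="my-org/my-model",  # proposed new argument
)
```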

Motivation

When using the Python clients (Client and AsyncClient) we can point base_url at a custom deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our text-generation-inference servers, and the gateway applies additional plugins and routing based on the model_id, which is currently hard-coded to "tgi" when using the chat or completions methods.

Our plugins are currently only enabled on the openai v1/chat/completions route, so this issue does not occur on the native text-generation-inference routes. On those native routes we can work around it by setting headers on the client, since the plugin is not enabled there. On routes that do have the plugin, however, adding the appropriate headers still fails because the plugin injects the provided model_id (currently the placeholder string "tgi") and overrides the provided header.

Our plugin works by taking a base API URL and routing each request to the relevant model endpoint based on the model_id provided by the chat or completions call.

When using the InferenceClient with our gateway we run into 404 responses because the gateway cannot resolve a route for the request (since model was set to "tgi").
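To make the failure mode concrete, here is a minimal sketch of the gateway-side routing we rely on (hypothetical code and deployment names, not our actual plugin):

```python
# Hypothetical sketch of the gateway plugin: requests hitting the OpenAI-style
# v1/chat/completions route are dispatched by the "model" field of the body.
UPSTREAMS = {
    "my-org/model-a": "http://model-a.internal:8080",  # illustrative deployments
    "my-org/model-b": "http://model-b.internal:8080",
}

def resolve_upstream(request_body: dict) -> str:
    model_id = request_body.get("model")
    upstream = UPSTREAMS.get(model_id)
    if upstream is None:
        # The Python clients always send model="tgi", which matches no
        # registered deployment, so the gateway answers 404 for every request.
        raise LookupError(f"404: no upstream registered for model {model_id!r}")
    return upstream
```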

Your contribution

I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate.

That said, I know that Client and AsyncClient will be deprecated in favour of InferenceClient from huggingface_hub. I have created a similar issue there (see huggingface/huggingface_hub#2293) and completely understand if this feature is not needed given the deprecation.

@Wauplin (Contributor) commented May 30, 2024

Hi @anubhavrana, better to continue the discussion in huggingface/huggingface_hub#2293 indeed :)

@Wauplin closed this as not planned on May 30, 2024