Separate out `model` and `base_url` in `InferenceClient` #2293
Comments
Hi @anubhavrana, so if I understand correctly, you would like to provide a custom [...]? However, before moving forward on this I'd like to understand what your issue is with using [...].
Yes, that's exactly it. That makes sense, and I agree that it shouldn't be added to the [...].
Yes, our gateway reads the request headers and uses them to route its requests. What we have done with the gateway is to separate out the [...]. Our users use the [...]. I hope that helped clarify the problem. Open to other solutions too.
Ok, makes sense. I opened #2302 to support this :)
Thank you so much! @Wauplin
Closed by #2302. Will be available in next release :) |
Is your feature request related to a problem? Please describe.
When using the `InferenceClient` we can use the `model` argument to specify a model id on the Hugging Face Hub or a URL to a custom deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our `text-generation-inference` servers, and the gateway has additional plugins and routing based on the `model_id`.

Our plugins are currently only exposed for the `openai` `v1/chat/completions` route, and this issue is not present when using the native `text-generation-inference` routes; however, the change may be good to have for both.

The way our plugin works is that we provide a base API URL and, based on the `model_id` in the `chat_completions` method, we route requests to the relevant model endpoint. When trying to use the `InferenceClient` with our gateway we run into `404` responses because the gateway cannot find the routes (since `model` was set to `"tgi"`). A minimal reproduction is sketched below.
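For illustration, a rough sketch of the failing pattern; the gateway URL is hypothetical, and the `"tgi"` fallback is the behavior reported above:

```python
from huggingface_hub import InferenceClient

# Hypothetical gateway URL, for illustration only
client = InferenceClient(model="https://gateway.example.com")

# The gateway routes on the model id carried in the request, but because
# `model` above is a URL, the payload ends up with the placeholder "tgi"
# (as reported above), and the gateway answers 404 since no route matches it.
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
```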
Describe the solution you'd like

Separating `model` and `base_url` would help solve the issue. Furthermore, if we just use `base_url` when instantiating `InferenceClient` and use `model` in the different methods (like `chat_completions`), we can also follow a pattern similar to the `openai` client. This should not affect or break the current usage pattern, and we can keep default values similar to the current implementation. A sketch of the proposed usage follows.
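A minimal sketch of the proposed usage, assuming a hypothetical gateway URL and model id; the parameter split shown here follows the proposal and the `openai` client convention rather than a released `huggingface_hub` API:

```python
from huggingface_hub import InferenceClient

# Proposed: fix the gateway address once at construction time...
client = InferenceClient(base_url="https://gateway.example.com")

# ...and choose the model per request, so the gateway plugin can route
# "my-org/my-model" to the right text-generation-inference backend.
response = client.chat_completion(
    model="my-org/my-model",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```

This mirrors the `openai` Python client, where `base_url` lives on the client object and `model` is passed with each request.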
Describe alternatives you've considered

We are able to circumvent this issue when using the native `text-generation-inference` routes by specifying `headers` in `InferenceClient`, which works since our plugin is not enabled on those routes. For routes that have the plugin, however, adding the appropriate `headers` still fails, since the plugin injects the provided `model_id` (in the current case a random string, `"tgi"`) and overrides the provided header. The workaround is sketched below.
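A rough sketch of that header-based workaround; the gateway URL and the header name (`X-Model-Id`) are hypothetical placeholders:

```python
from huggingface_hub import InferenceClient

# Workaround: pass routing information through a custom header.
# Both the URL and the header name below are hypothetical.
client = InferenceClient(
    model="https://gateway.example.com",
    headers={"X-Model-Id": "my-org/my-model"},
)

# Works on the native TGI routes where the plugin is disabled; on the
# OpenAI-compatible chat route the plugin overrides the header with the
# payload's "tgi" model id, so the request still fails there.
client.text_generation("Hello", max_new_tokens=16)
```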
Additional context

I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate!