Separate out model and base_url in InferenceClient #2293

Closed
anubhavrana opened this issue May 29, 2024 · 5 comments

anubhavrana commented May 29, 2024

Is your feature request related to a problem? Please describe.

When using the InferenceClient we can use the model argument to specify either a model id on the Hugging Face Hub or a URL to a custom-deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our text-generation-inference servers, and the gateway applies additional plugins and routing based on the model_id.

Our plugins are currently only exposed for the openai v1/chat/completions route, and this issue is not present when using the native text-generation-inference routes; however, the change may be good to have for both.

The way our plugin works is that we provide a base API URL and, based on the model_id sent by the chat_completion method, route requests to the relevant model endpoint.

When trying to use the InferenceClient with our gateway, we run into 404 responses because the gateway cannot find the routes (since model was set to "tgi").
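To illustrate the failure mode (the gateway URL below is hypothetical): when InferenceClient is instantiated with a URL, the OpenAI-compatible payload built by chat_completion carries the placeholder model id "tgi" instead of a real model id, so the gateway's model-based routing cannot resolve it:

```python
from huggingface_hub import InferenceClient

# Hypothetical gateway URL fronting our text-generation-inference servers.
client = InferenceClient(model="https://gateway.example.com")

# The request body sent to the /v1/chat/completions route contains
# `"model": "tgi"`, which the gateway's routing plugin cannot resolve -> 404.
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
```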

Describe the solution you'd like

Separating model and base_url would solve the issue. Furthermore, if we use base_url when instantiating InferenceClient and pass model to the individual methods (like chat_completion), we would also follow a pattern similar to the openai client. This should not affect or break the current usage pattern, and we can keep default values similar to the current implementation.

Describe alternatives you've considered

We are able to work around this issue on the native text-generation-inference routes by specifying headers in InferenceClient, which works because our plugin is not enabled on those routes. For routes that have the plugin, however, adding the appropriate headers still fails: the plugin injects the model_id from the request body (in the current case the placeholder string "tgi") and overrides the provided header.
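A minimal sketch of this header-based workaround on the native routes, assuming a hypothetical gateway URL and routing header:

```python
from huggingface_hub import InferenceClient

# The plugin is not enabled on the native text-generation-inference routes,
# so the gateway honors a custom routing header here.
client = InferenceClient(
    model="https://gateway.example.com",           # hypothetical gateway URL
    headers={"X-Model-Id": "my-finetuned-model"},  # hypothetical routing header
)
client.text_generation("Hello", max_new_tokens=32)
```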

Additional context

I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate!

Wauplin (Contributor) commented May 30, 2024

Hi @anubhavrana, so if I understand correctly, you would like to provide a custom model_id specifically for the chat_completion method to override the dummy value sent to the server? To do that, I don't think we should use the model parameter from both InferenceClient and chat_completion to provide the URL + the model id, as it would be misleading. I would only add an optional model_id: Optional[str] = None parameter to chat_completion and use it.

However, before moving forward on this, I'd like to understand what issue you have with using headers to achieve your goal with the current implementation. Your router/gateway should be able to read the request headers to route the request. For example, you can do InferenceClient(model="TGI URL", headers={"X-Model-Id": <your_model_id>}).chat_completion(...). Is there a problem with this solution? I find it unusual to rely on the body payload to route requests.
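Spelled out as a runnable sketch (URL, header name, and model id are placeholders), the suggestion above amounts to:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://my-tgi-endpoint.example.com",   # TGI URL
    headers={"X-Model-Id": "my-finetuned-model"},  # read by the router/gateway
)
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
```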

anubhavrana (Author) commented May 31, 2024

> Hi @anubhavrana, so if I understand correctly, you would like to provide a custom model_id specifically for the chat_completion method to override the dummy value sent to the server? To do that, I don't think we should use the model parameter from both InferenceClient and chat_completion to provide the URL + the model id, as it would be misleading. I would only add an optional model_id: Optional[str] = None parameter to chat_completion and use it.

Yes, that's exactly it. That makes sense, and I agree that it shouldn't be added to the InferenceClient.

> However, before moving forward on this, I'd like to understand what issue you have with using headers to achieve your goal with the current implementation. Your router/gateway should be able to read the request headers to route the request. For example, you can do InferenceClient(model="TGI URL", headers={"X-Model-Id": <your_model_id>}).chat_completion(...). Is there a problem with this solution? I find it unusual to rely on the body payload to route requests.

Yes, our gateway reads the request headers and uses them to route requests. What we have done with the gateway is separate out the tgi and openai routes. For the tgi routes, doing what you proposed is how we make requests to the gateway, and it works well. When the gateway receives requests on the openai routes, a custom plugin retrieves the model parameter from the request body and adds it to the request as a header, in the same way as the tgi routes.

Our users access the openai routes with the openai client: when chat.completions.create is called, the model argument is provided and then used by the gateway plugin to add the headers. I would like to be able to do the same thing with the InferenceClient on the openai routes, to provide that flexibility as well.

I hope that helped clarify the problem. Open to other solutions too.
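For reference, the openai-client pattern described above looks roughly like this (gateway URL and model id are placeholders):

```python
from openai import OpenAI

# Hypothetical gateway exposing the OpenAI-compatible v1 routes.
client = OpenAI(base_url="https://gateway.example.com/v1", api_key="unused")

# The gateway plugin reads `model` from the request body and turns it into
# a routing header, just like on the native tgi routes.
client.chat.completions.create(
    model="my-finetuned-model",
    messages=[{"role": "user", "content": "Hello"}],
)
```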

Wauplin (Contributor) commented May 31, 2024

Ok, makes sense. I opened #2302 to support this :)

anubhavrana (Author) commented

Thank you so much! @Wauplin

Wauplin (Contributor) commented Jun 10, 2024

Closed by #2302. Will be available in next release :)
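Assuming the change landed as discussed in #2302, usage against a gateway should look roughly like this (URL and model id are placeholders):

```python
from huggingface_hub import InferenceClient

# Point the client at the gateway URL...
client = InferenceClient(model="https://gateway.example.com")

# ...and pass the real model id per call so the gateway can route on it.
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    model="my-finetuned-model",
    max_tokens=64,
)
```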

Wauplin closed this as completed Jun 10, 2024