Expose model argument in python clients #1978

Closed
anubhavrana opened this issue May 29, 2024 · 1 comment
@anubhavrana
Feature request

Exposing the model argument in the chat and completions methods (and potentially other methods like generate) would solve the issue described below. It would also let the clients follow the same pattern as the openai client. This should not affect or break the current usage pattern, since the argument can keep a default value matching the current implementation.
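A rough sketch of how the proposed usage could look (the model argument is the new piece; everything else follows the existing client pattern, and names/URLs here are illustrative rather than a finished API):

```python
from text_generation import Client

# Point the client at the gateway instead of a single TGI server (hypothetical URL).
client = Client(base_url="https://my-gateway.example.com")

# Proposed: forward `model` into the request payload instead of the hard-coded
# "tgi" value, mirroring the openai client. If omitted, the current default
# behaviour is kept, so existing code does not break.
response = client.chat(
    messages=[{"role": "user", "content": "Hello!"}],
    model="my-org/my-model",  # proposed new argument
)
```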

Motivation

When using the Python clients (Client and AsyncClient) we can point base_url at a custom deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our text-generation-inference servers, and the gateway applies additional plugins and routing based on the model_id, which is currently hard-coded to "tgi" when using the chat or completions methods.

Our plugins are currently only enabled on the openai v1/chat/completions route, so this issue does not occur on the native text-generation-inference routes. On those native routes we can work around it by setting headers on the client, since the plugin is not enabled there. On routes that do have the plugin, however, adding the appropriate headers still fails because the plugin injects the provided model_id (currently the placeholder string "tgi") and overrides the provided header.

Our plugin works by taking a base API URL and routing each request to the relevant model endpoint based on the model_id provided by the chat or completions call.

When using the InferenceClient with our gateway we run into 404 responses because the gateway cannot resolve a route for the request (since model was set to "tgi").
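To make the failure mode concrete, here is a minimal sketch of the gateway-side routing we rely on (hypothetical code and deployment names, not our actual plugin):

```python
# Hypothetical sketch of the gateway plugin: requests hitting the OpenAI-style
# v1/chat/completions route are dispatched by the "model" field of the body.
UPSTREAMS = {
    "my-org/model-a": "http://model-a.internal:8080",  # illustrative deployments
    "my-org/model-b": "http://model-b.internal:8080",
}

def resolve_upstream(request_body: dict) -> str:
    model_id = request_body.get("model")
    upstream = UPSTREAMS.get(model_id)
    if upstream is None:
        # The Python clients always send model="tgi", which matches no
        # registered deployment, so the gateway answers 404 for every request.
        raise LookupError(f"404: no upstream registered for model {model_id!r}")
    return upstream
```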

Your contribution

I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate.

That said, I know that Client and AsyncClient will be deprecated in favour of InferenceClient from huggingface_hub. I have created a similar issue there (see huggingface/huggingface_hub#2293) and completely understand if this feature is not needed given the deprecation.

@Wauplin (Contributor) commented May 30, 2024

Hi @anubhavrana, better to continue the discussion in huggingface/huggingface_hub#2293 indeed :)

@Wauplin closed this as not planned on May 30, 2024