Feature request
Exposing the model argument in the chat and completions methods (which could also be extended to other methods like generate) would help solve the issue described below. Exposing model across these methods would also let us follow a pattern similar to the openai client. This should not affect or break the current usage pattern, since the argument can keep a default value matching the current implementation.
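For illustration, this is roughly how the exposed argument could be used. The model keyword is the proposed addition and does not exist in the current client; defaulting it to "tgi" would preserve today's behaviour, so existing callers would be unaffected:

```python
from text_generation import Client

client = Client(base_url="https://my-gateway.example.com")  # placeholder gateway URL

# Proposed usage -- the `model` keyword argument is the new part and is
# assumed here for illustration only. If omitted, it would default to "tgi",
# matching the current behaviour.
response = client.chat(
    messages=[{"role": "user", "content": "Hello!"}],
    model="my-deployed-model",  # forwarded as `model` in the request body
)
```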
Motivation
When using the Python clients (Client and AsyncClient) we can specify the base_url of a custom-deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our text-generation-inference servers, and the gateway has additional plugins and routing based on the model_id, which currently gets set to "tgi" when using the chat or completion methods.
Our plugins are currently only exposed for the openai v1/chat/completions route, and this issue is not present when using the native text-generation-inference routes. We are able to circumvent the issue on the native text-generation-inference routes by specifying headers in the client, which works because our plugin is not enabled there. For routes that do have the plugin, adding the appropriate headers still fails, since the plugin injects the provided model_id (in the current case the string "tgi") and overrides the provided header.
The way our plugin works is that we provide a base API URL and, based on the model_id sent by the chat or completions method, route requests to the relevant model endpoint.
When trying to use the InferenceClient with our gateway we run into 404 responses because the gateway cannot find the routes (since model was set to "tgi").
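To make the failure mode concrete, here is a simplified, illustrative sketch of what our gateway plugin does; the model names and upstream URLs are placeholders, not our actual configuration:

```python
# Illustrative sketch of the gateway plugin's routing logic.
MODEL_ROUTES = {
    "my-deployed-model": "http://model-a.internal/v1/chat/completions",
    "another-model": "http://model-b.internal/v1/chat/completions",
}

def resolve_upstream(request_body: dict) -> str:
    model_id = request_body.get("model")  # the client currently always sends "tgi"
    upstream = MODEL_ROUTES.get(model_id)
    if upstream is None:
        # This is the 404 we hit: "tgi" is not a configured model route.
        raise LookupError(f"404: no route configured for model '{model_id}'")
    return upstream
```

With the model argument exposed, the client could send the real model id and the gateway would resolve the correct upstream without any header workarounds.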
Your contribution
I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate.
I do know that Client and AsyncClient will be deprecated and that the suggestion is to use InferenceClient from huggingface_hub. I have created a similar issue there (see huggingface/huggingface_hub#2293) and completely understand if this feature is not needed given the deprecation.