Separate out `model` and `base_url` in `InferenceClient` #2293
Comments
Hi @anubhavrana, so if I understand correctly, you would like to provide a custom [...]? However, before moving forward on this I'd like to understand what your issue is with using [...].
Yes, that's exactly it. That makes sense, and I agree that it shouldn't be added to the [...].
Yes, our gateway reads the request headers and uses them to route its requests. What we have done with the gateway is to separate out the [...]. Our users use the [...]. I hope that helped clarify the problem. Open to other solutions too.
Ok, makes sense. I opened #2302 to support this :)
Thank you so much! @Wauplin
Closed by #2302. Will be available in next release :) |
Is your feature request related to a problem? Please describe.
When using the `InferenceClient` we can use the `model` argument to specify a model id on the Hugging Face Hub or a URL to a custom deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our `text-generation-inference` servers, and the gateway has additional plugins and routing based on the `model_id`.

Our plugins are currently only exposed for the `openai` `v1/chat/completions` route, and this issue is not present when using the native `text-generation-inference` routes; however, the change may be good to have for both.

The way our plugin works is that we provide a base API URL and, based on the `model_id` in the `chat_completions` method, we route requests to the relevant model endpoint. When trying to use the `InferenceClient` with our gateway we run into `404` responses because the gateway cannot find the routes (since `model` was set to `"tgi"`). A minimal reproduction is sketched below.
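For illustration, a rough sketch of the failing pattern; the gateway URL is hypothetical, and the `"tgi"` fallback is the behavior reported above:

```python
from huggingface_hub import InferenceClient

# Hypothetical gateway URL, for illustration only
client = InferenceClient(model="https://gateway.example.com")

# The gateway routes on the model id carried in the request, but because
# `model` above is a URL, the payload ends up with the placeholder "tgi"
# (as reported above), and the gateway answers 404 since no route matches it.
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
```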
Describe the solution you'd like

Separating `model` and `base_url` would help solve the issue. Furthermore, if we just use `base_url` when instantiating `InferenceClient` and use `model` in the different methods (like `chat_completions`), we can also follow a pattern similar to the `openai` client. This should not affect or break the current usage pattern, and we can keep default values similar to the current implementation. A sketch of the proposed usage follows.
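A minimal sketch of the proposed usage, assuming a hypothetical gateway URL and model id; the parameter split shown here follows the proposal and the `openai` client convention rather than a released `huggingface_hub` API:

```python
from huggingface_hub import InferenceClient

# Proposed: fix the gateway address once at construction time...
client = InferenceClient(base_url="https://gateway.example.com")

# ...and choose the model per request, so the gateway plugin can route
# "my-org/my-model" to the right text-generation-inference backend.
response = client.chat_completion(
    model="my-org/my-model",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```

This mirrors the `openai` Python client, where `base_url` lives on the client object and `model` is passed with each request.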
Describe alternatives you've considered

We are able to circumvent this issue when using the native `text-generation-inference` routes by specifying `headers` in `InferenceClient`, which works since our plugin is not enabled on those routes. For routes that have the plugin, however, adding the appropriate `headers` still fails, since the plugin injects the provided `model_id` (in the current case a random string, `"tgi"`) and overrides the provided header. The workaround is sketched below.
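A rough sketch of that header-based workaround; the gateway URL and the header name (`X-Model-Id`) are hypothetical placeholders:

```python
from huggingface_hub import InferenceClient

# Workaround: pass routing information through a custom header.
# Both the URL and the header name below are hypothetical.
client = InferenceClient(
    model="https://gateway.example.com",
    headers={"X-Model-Id": "my-org/my-model"},
)

# Works on the native TGI routes where the plugin is disabled; on the
# OpenAI-compatible chat route the plugin overrides the header with the
# payload's "tgi" model id, so the request still fails there.
client.text_generation("Hello", max_new_tokens=16)
```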
Additional context

I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate!