Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resolve chat completion URL #2540

Merged
merged 2 commits into from
Sep 16, 2024
Merged

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Sep 13, 2024

Fix #2484.

With the introduction of base_url in #2384 and a refacto in #2410, some bugs have been introduced on the logic to resolve the chat completion URL. This PR fixes definitely this with a dedicated _resolve_chat_completion_url method. I added a bunch of parametrized tests to check every possible cases. It is now possible to:

  • pass a model id as InferenceClient(model=...)
  • pass a model id as InferenceClient().chat_completion(model=...)
  • pass a url as InferenceClient(model=...), InferenceClient(base_url=...) or InferenceClient().chat_completion(model=...)
    • the URL can have a trailing / or not
    • the URL can end with /v1 or not (for OpenAI compatibility)
    • the URL can end with /v1/chat/completions or not
    • the URL can be a local TGI instance or an Inference Endpoint url
  • pass a url to InferenceClient and a model_id in chat_completion (already the case before)

With all of these use cases tested and working, I think this is fixed for good :)


Two minor unrelated changes:

  1. some typing: ignore to fix CI (after Implemented https://github.com/huggingface/huggingface_hub/issues/2516 #2532)
  2. Better error in src/huggingface_hub/utils/_http.py (after oversight in Refacto error parsing (HfHubHttpError) #2474)

Failing test is unrelated.

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 13, 2024

@MoritzLaurer FYI, I've made a full example for Inference Endpoints => InferenceClient and it works well:

from huggingface_hub import get_inference_endpoint


endpoint = get_inference_endpoint("smollm-360m-instruct-hgl")  # need to start an IE and copy its name
client = endpoint.resume().wait().client


output = client.chat_completion(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ]
)

print(output.choices[0].message.content)

Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! thanks @Wauplin! Nice test coverage 😄

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 16, 2024

Thanks for the review! Failing tests are unrelated so I'll merge this one :)

@Wauplin Wauplin merged commit a49ca75 into main Sep 16, 2024
15 of 19 checks passed
@Wauplin Wauplin deleted the 2484-fix-passing-chat-completion-url branch September 16, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing a HF endpoint URL to client.chat_completion() doesn't seem to work anymore
2 participants