[hot-fix] Handle [DONE] signal from TGI + remove logic for "non-TGI servers" #2410

Merged
merged 8 commits into main from hot-fix-inference-client-chat-completion on Jul 23, 2024

Conversation

Wauplin
Contributor

@Wauplin Wauplin commented Jul 23, 2024

What's in this PR?

  1. Related to fix: append DONE message to chat stream text-generation-inference#2221. TGI now returns a b"data: [DONE]" stop signal when iterating through generated tokens (stream=True). This PR adds support for that stop signal (see the sketch below this description). Once this PR is merged, I'll make a hot-fix release, since InferenceClient is currently broken on the newest versions of TGI for text_generation and chat_completion.

  2. I also removed all the "non-TGI" logic in chat_completion, since every model is now served by TGI, even when it's a transformers-only model (e.g. "microsoft/DialoGPT-small"). This simplifies the logic a lot and avoids hiding relevant errors from users. If the model is transformers-served and not compatible with chat completion, a 422 Unprocessable Entity error ("Template error: template not found") is returned.

  3. I also took the opportunity to rename some private helpers for consistency.

It's better to hide whitespace changes when reviewing this PR.
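
For context, handling the new stop signal boils down to checking for the [DONE] payload while iterating over the raw SSE byte stream. The snippet below is only a minimal sketch of that idea; the helper name and the exact shape of the stream are assumptions, not the actual code added to _common.py:

```python
import json
from typing import Iterable, Iterator

def _iter_sse_chunks(byte_lines: Iterable[bytes]) -> Iterator[dict]:
    """Hypothetical helper: decode a TGI token stream (stream=True) line by line."""
    for byte_line in byte_lines:
        line = byte_line.strip()
        if not line or not line.startswith(b"data:"):
            continue  # skip empty keep-alive lines
        payload = line[len(b"data:"):].strip()
        if payload == b"[DONE]":
            return  # stop signal emitted by recent TGI versions (TGI #2221)
        yield json.loads(payload)
```

With something like this, the client stops cleanly when the server sends b"data: [DONE]" instead of failing to JSON-decode it.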

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -860,10 +860,10 @@ def chat_completion(
                 stream=stream,
             )
         except HTTPError as e:
-            if e.response.status_code in (400, 404, 500):
+            if e.response.status_code in (400, 500):
Contributor

There should really be only one error code here.
Can you check on a real deployment, or in the TGI source code, what happens for models with no existing template?

I think it should be a 4xx since it's not really a server error. No idea which one is more appropriate.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses

Contributor Author


Can you check on a real deployment, or in the TGI source code, what happens for models with no existing template?

It returns a

huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/gpt2/v1/chat/completions (Request ID: 3Y0mKxT7AmdMSZsfNjcQA)

Template error: template not found

But in fact this is not even the problem. Since client-side template rendering was completely removed in #2258, there is no point treating TGI-served and other models differently. I've updated the PR to simplify the logic, which finally avoids hiding errors from the user.
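
In practice, with the simplified logic a user calling chat_completion on a transformers-backed model without a chat template now sees the server error directly. A rough illustration (the model name and error text are just the example from this thread, not a guaranteed behavior of every deployment):

```python
from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient()
try:
    client.chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt2",  # transformers-backed model with no chat template
    )
except HfHubHTTPError as err:
    # The 422 "Template error: template not found" is surfaced as-is
    # instead of being hidden behind client-side fallback logic.
    print(err)
```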

Contributor

@Narsil Narsil left a comment

Some nits.

Review threads (resolved):
src/huggingface_hub/inference/_common.py (outdated)
src/huggingface_hub/inference/_common.py
tests/test_inference_client.py (outdated)
src/huggingface_hub/inference/_common.py (outdated)
Member

@LysandreJik LysandreJik left a comment

Thanks for the PR and the swift fix, @Wauplin

@Wauplin Wauplin changed the title from "[hot-fix] Handle [DONE] signal from TGI" to "[hot-fix] Handle [DONE] signal from TGI + remove logic for "non-TGI servers"" on Jul 23, 2024
@Wauplin Wauplin requested review from LysandreJik and Narsil July 23, 2024 14:23
@Wauplin
Contributor Author

Wauplin commented Jul 23, 2024

Sorry for the mess; I realized I should have opened two separate PRs for these fixes. @Narsil's comment #2410 (comment) made me realize that we don't even need two different behaviors in chat_completion anymore (depending on whether the model is TGI-served or not), since all model templates are rendered server-side anyway. This makes the implementation much cleaner.

Member

@LysandreJik LysandreJik left a comment

I didn't dive into the changes given the time sensitivity, but from a quick look it looks good to me.

@Wauplin Wauplin merged commit 91fe78e into main Jul 23, 2024
17 checks passed
@Wauplin Wauplin deleted the hot-fix-inference-client-chat-completion branch July 23, 2024 14:37
Wauplin added a commit that referenced this pull request Jul 23, 2024
…ervers" (#2410)

* Handle [DONE] signal from TGI

* fix text_generation as well

* Handle error 404 correctly

* consistency + stop treating transformers-backed models differently

* fix test

* fix broken test on main

* fix test

* cleaner