Add wait-for-model header when sending request to Inference API #2318
Conversation
Oh, by the way, regarding the "stuck" feeling you were mentioning: what about doing one query without the wait, and only adding the wait after the first retry? Two queries, but at least with a good error message.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Nice idea! Implemented it in 8ea3f1d
Sounds good!
Thanks for the review!
Should fix #2175.
In the current implementation, InferenceClient sends a request every 1s as long as the model is unavailable (HTTP 503). This can lead users to be rate limited even though they don't actually consume the API (reported here). This PR adds "X-wait-for-model": "1" as a header, which tells the server to wait for the model to be loaded before returning a response. This way the client doesn't make calls every X seconds for nothing. The X-wait-for-model header is added only when requesting the serverless Inference API.

EDIT: based on @Narsil's comment, the header is added to the request only on the second call. This way, users don't hit the rate limit, but we are still able to log a message telling them the model is not loaded yet.
cc @Narsil (from private slack thread)
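For context, here is a rough sketch of the retry flow described above. This is not the actual InferenceClient code; the `query` helper, URL template, and retry count are hypothetical, but it shows the intended behavior: the first request goes out without the header, and `X-wait-for-model` is only added once the server answers 503.

```python
import time
import requests

# Hypothetical serverless Inference API endpoint template (illustration only).
INFERENCE_API_URL = "https://api-inference.huggingface.co/models/{model_id}"


def query(model_id: str, payload: dict, token: str, max_retries: int = 10) -> dict:
    """Illustrative helper: retry on 503, adding X-wait-for-model after the first failure."""
    url = INFERENCE_API_URL.format(model_id=model_id)
    headers = {"Authorization": f"Bearer {token}"}

    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 503:
            response.raise_for_status()
            return response.json()

        # First 503: log a message so the user knows what is happening, then
        # retry with the wait-for-model header so the server holds the request
        # open until the model is loaded, instead of the client polling every
        # second and burning through its rate limit.
        print(f"Model {model_id} is not loaded yet, waiting for it to load...")
        headers["X-wait-for-model"] = "1"
        time.sleep(1)

    raise TimeoutError(f"Model {model_id} did not load in time.")
```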