
Add wait-for-model header when sending request to Inference API #2318

Merged: 5 commits into main on Jun 13, 2024

Conversation

@Wauplin (Contributor) commented on Jun 7, 2024

Should fix #2175.

In the current implementation, InferenceClient sends a request every 1s as long as the model is unavailable (HTTP 503). This can lead to users being rate limited even though they don't consume the API (reported here). This PR adds "X-wait-for-model": "1" as a header, which tells the server to wait for the model to be loaded before returning a response. This way the client doesn't make a call every second for nothing. This X-wait-for-model header is added only when requesting the serverless Inference API.

EDIT: based on @Narsil's comment, the header is added to the request only on the second call. This way, users don't reach the rate limit, but we are still able to log a message telling the user that the model is not loaded yet.
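
For illustration, here is a minimal sketch of that retry behavior using plain `requests` rather than the actual `InferenceClient` internals; the function name, URL, and payload are placeholders, not code from this PR:

```python
# Minimal sketch of the behavior described above (not the actual InferenceClient code).
# The endpoint URL, payload, and function name are illustrative placeholders.
import logging

import requests

logger = logging.getLogger(__name__)


def post_with_wait_for_model(url: str, payload: dict, token: str) -> requests.Response:
    headers = {"Authorization": f"Bearer {token}"}

    # First call: sent without the wait header, so an unloaded model
    # returns HTTP 503 immediately instead of blocking.
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code != 503:
        return response

    # Second call: the model is not loaded yet. Log a message so the user
    # knows why the call takes longer, then ask the server to wait for the
    # model to load instead of polling every second (and risking a 429).
    logger.info("Waiting for model to be loaded on the Inference API...")
    headers["X-wait-for-model"] = "1"
    return requests.post(url, json=payload, headers=headers)
```

With this flow the client makes at most two requests per call instead of one per second while the model loads.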

cc @Narsil (from private slack thread)

@Wauplin requested review from @Narsil and @LysandreJik on Jun 7, 2024 at 13:36
@Wauplin mentioned this pull request on Jun 7, 2024
@Narsil (Contributor) left a comment


Oh, by the way, for the "stuck" feeling you were mentioning:

What about doing 1 query without the wait, and only adding the wait after the first retry?

2 queries, but at least with a good error message.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin (Contributor, Author) commented on Jun 7, 2024

> What about doing 1 query without the wait, and only adding the wait after the first retry?

Nice idea! Implemented it in 8ea3f1d.

@LysandreJik (Member) left a comment


Sounds good!

@Wauplin (Contributor, Author) commented on Jun 13, 2024

Thanks for the review!

@Wauplin merged commit 3375448 into main on Jun 13, 2024
16 of 17 checks passed
@Wauplin deleted the 2175-wait-for-model-in-inference-client branch on Jun 13, 2024 at 07:38
Development

Successfully merging this pull request may close these issues: 429 error in InferenceClient
4 participants