
Implement API for Inference Endpoints #1779

Merged: 23 commits into main, Oct 30, 2023

Conversation

@Wauplin (Contributor) commented Oct 25, 2023

Implement #1541 (+fix #1605).

Ping @philschmid @jeffboudier. Feedback is very welcome if you see anything that can be improved product-wise :)

EDIT / TL;DR: here is the guide written in this PR.


This PR adds support for Inference Endpoints, following the Swagger API docs.

I intentionally did not implement the metrics and logs endpoints yet, as I don't think they should be a priority.

Listing and getting inference endpoint information is quite straightforward, given the user namespace or endpoint name. The same goes for the resume/pause/scale_to_zero features, which AFAIK are the most useful ones in scripts. Creating and updating endpoints is more difficult, as the user needs to know exactly which configuration they want to use.
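
For illustration, here is a minimal sketch of the listing and management helpers (using the functions added in this PR; the endpoint name is just an example):

>>> from huggingface_hub import list_inference_endpoints, get_inference_endpoint

# List all endpoints in the authenticated user's namespace
>>> list_inference_endpoints()
[InferenceEndpoint(name='my-endpoint-name6', ...), ...]

# Fetch a single endpoint by name, then pause it
>>> endpoint = get_inference_endpoint("my-endpoint-name6")
>>> endpoint.pause()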

The main object returned by most methods is an InferenceEndpoint dataclass with useful information such as name, status, url, model, framework, task, created_at/updated_at, etc. It also has two properties, .client and .async_client, to run inference.
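
As a quick sketch of the async flow (assuming .async_client mirrors .client with coroutine methods):

>>> import asyncio
>>> from huggingface_hub import get_inference_endpoint

>>> endpoint = get_inference_endpoint("my-endpoint-name6")
# .async_client is an AsyncInferenceClient; its methods are coroutines
>>> asyncio.run(endpoint.async_client.text_generation("I am"))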

Regarding testing the API, I am not sure we can or want to do that in the CI. As with the Spaces API, it's quite hard (and costly) to run end-to-end tests against production, and for little benefit IMO (as we can assume the v2 endpoints will not be updated). I ran some tests locally to make sure everything runs as expected, though.


Example:

>>> from huggingface_hub import create_inference_endpoint

# Create endpoint
>>> endpoint = create_inference_endpoint(
...     "my-endpoint-name6",
...     repository="gpt2",
...     framework="pytorch",
...     task="text-generation",
...     accelerator="cpu",
...     vendor="aws",
...     region="us-east-1",
...     type="protected",
...     instance_size="medium",
...     instance_type="c6i"
... )

# pending creation => no url => cannot get client
>>> endpoint
InferenceEndpoint(name='my-endpoint-name6', namespace='Wauplin', repository='gpt2', status='pending', url=None)
>>> endpoint.client 
*** huggingface_hub._inference_endpoints.InferenceEndpointException: Cannot create a client for this endpoint as it is not yet deployed. Please wait for the endpoint to be deployed and try again.

# ... wait until it's initialized
>>> endpoint.wait()
InferenceEndpoint(name='my-endpoint-name6', namespace='Wauplin', repository='gpt2', status='running', url='https://kqehm5t0lfe628b2.us-east-1.aws.endpoints.huggingface.cloud')

# endpoint running => url => client available
>>> endpoint.client 
<InferenceClient(model='https://kqehm5t0lfe628b2.us-east-1.aws.endpoints.huggingface.cloud', timeout=None)>
>>> endpoint.client.text_generation("I am")
' not a fan of the idea of a "big-budget" movie. I think it\'s a'

# pause endpoint => no more url
>>> endpoint.pause()
InferenceEndpoint(name='my-endpoint-name6', namespace='Wauplin', repository='gpt2', status='paused', url=None)
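
To round off the lifecycle, a sketch of resuming and cleaning up (using the other methods added in this PR):

# resume the paused endpoint, or delete it entirely once no longer needed
>>> endpoint.resume()
>>> endpoint.wait()   # block until it is deployed again
>>> endpoint.delete()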


@HuggingFaceDocBuilderDev commented Oct 25, 2023

The documentation is not available anymore as the PR was closed or merged.

@stevhliu (Member) left a comment:

Super nice, the huggingface_hub library is becoming so versatile! 🚀

Review comments (now resolved) on:
- docs/source/en/package_reference/inference_endpoints.md
- src/huggingface_hub/_inference_endpoints.py
- src/huggingface_hub/hf_api.py
@julien-c (Member) left a comment:

I should know this for sure but what's the ≠ between paused & scaled to zero, again?

Another question: do you have an example of how to chain .wait and an actual inference call?

would you do something like this?

create_inference_endpoint(
    "my-endpoint-name6",
    repository="gpt2",
).wait().client.text_generation("I am").scale_to_zero()

@Wauplin (Author) commented Oct 26, 2023

I should know this for sure but what's the ≠ between paused & scaled to zero, again?

I'll let @philschmid confirm, as I'm not sure myself what the difference is.
What I understood is that a paused endpoint must be resumed manually, while a scaled-to-zero endpoint is automatically restarted on the next call (with a cold start).
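
In code, the difference would look something like this (a sketch based on my understanding above, not confirmed behavior):

# paused: stays down until explicitly resumed
>>> endpoint.pause()
>>> endpoint.resume()   # required before it serves requests again

# scaled to zero: wakes up by itself on the next request (after a cold start)
>>> endpoint.scale_to_zero()
>>> endpoint.client.text_generation("I am")   # may have to wait through the cold start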

would you do something like this?

Not exactly, no, since InferenceEndpoint and InferenceClient are different objects.
What you can do:

endpoint = create_inference_endpoint("my-endpoint-name6", repository="gpt2",...).wait()
endpoint.client.text_generation("I am")
endpoint.scale_to_zero()

@Wauplin (Author) commented Oct 26, 2023

@philschmid @stevhliu I have added a guide about how to manage Inference Endpoints from huggingface_hub. Would you mind having a look at it? 🙏

@stevhliu (Member) left a comment:

Nice guide, especially like the end-to-end example at the end that puts everything together in context :)

Review comments (now resolved) on docs/source/en/guides/inference_endpoints.md
@McPatate (Member) left a comment:

A few nits, but LGTM overall!

Nice job!

Review comments on docs/source/en/guides/inference_endpoints.md (some resolved)
@@ -0,0 +1,48 @@
# Inference Endpoints

Inference Endpoints offers a secure production solution to easily deploy any `transformers`, `sentence-transformers`, and `diffusers` models from the Hub on a dedicated and autoscaling infrastructure managed by Hugging Face.
A Member commented:

To deploy any kind of model, I don't think we're limited to the libraries you mention here.
cc @philschmid

@Wauplin (Author) replied:

I took that part from the official documentation there: https://huggingface.co/docs/inference-endpoints/main/en/index

A Member replied:

Those models are what we support by default, without a custom handler or container. But it's not limiting.

Another Member commented:

Maybe the wording could be improved; when I read this at first, it wasn't clear to me that you can run any model.

@Wauplin (Author) replied:

I've changed the wording in 1688851 to be more generic:

Inference Endpoints provides a secure production solution to easily deploy models on a dedicated and autoscaling infrastructure managed by Hugging Face. An Inference Endpoint is built from a model from the Hub. This page is a reference for huggingface_hub's integration with Inference Endpoints. For more information about the Inference Endpoints product, check out its official documentation.

I'm keeping it vague on purpose. If the user wants more details about supported models, the official Inference Endpoints documentation is the appropriate place to list them. Those docs are referenced twice from huggingface_hub's documentation (in this PR), so it should be fine.

Review comments (now resolved) on docs/source/en/package_reference/inference_endpoints.md
Wauplin and others added 2 commits October 27, 2023 11:49
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
@Wauplin (Author) commented Oct 27, 2023

Thanks @stevhliu and @McPatate for your feedback on the documentation, it helped a lot! I've made the requested changes :)

@LysandreJik (Member) left a comment:

Looks awesome! Only left a few nits. The API is intuitive.

Review comments on:
- docs/source/en/guides/inference_endpoints.md
- docs/source/en/package_reference/inference_endpoints.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
@Wauplin (Author) commented Oct 30, 2023

Thanks @LysandreJik for the review! I think we are good to merge when CI is green then :)
Thanks everyone here for the feedback on this PR! 🤗

@Wauplin (Author) commented Oct 30, 2023

Failing tests are unrelated. Merging this!

@Wauplin merged commit 91d38dd into main on Oct 30, 2023 (12 of 16 checks passed)
@Wauplin deleted the 1541-inference-endpoints-api branch on October 30, 2023 at 09:21
Successfully merging this pull request may close these issues:
- Wrong error to handle Paused or Scaled to Zero endpoints.