
Implement API for Inference Endpoint #1541

Closed
Wauplin opened this issue Jul 5, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@Wauplin
Contributor

Wauplin commented Jul 5, 2023

Inference Endpoint is a product from Hugging Face offering a secure production solution to easily deploy any 🤗 transformers, sentence-transformers and diffusers model from the Hub on dedicated, autoscaling infrastructure managed by Hugging Face. See the documentation for more details.

It would be cool to integrate helpers in huggingface_hub to deal with IE, similarly to Spaces. IE has a nice Swagger API, which should make it easy to implement. Let's use this issue to discuss the implementation details. Here is how I'd see it:

Have an InferenceEndpoint class containing information about 1 deployed endpoint. It contains generic information about the endpoint plus convenient methods to pause/update/resume/delete. It also has a client property that returns an InferenceClient object to make predictions.

class InferenceEndpoint:
    # List of parameters:
    name: str
    namespace: str
    token: Optional[str]
    model: ...
    compute: ...
    provider: ...
    type: ...
    ...

    def __init__(self, ...) -> None:
        # Most probably initialized with an endpoint name / endpoint id.
        # Every parameter listed above would be a property fetched in real time (especially stuff like `status`)
        pass

    def update(self, ...) -> None:
        pass

    def delete(self) -> None:
        pass

    def pause(self) -> None:
        pass

    def resume(self) -> None:
        pass

    def get_logs(self) -> str:
        pass

    @property
    def client(self) -> InferenceClient:
        pass

From there, we can add methods to HfApi:

class HfApi:

    def list_inference_endpoints(self, namespace: Optional[str] = None, token: Optional[str] = None) -> List[InferenceEndpoint]:
        # If namespace is None, use self.whoami
        ...

    def create_inference_endpoint(self, ...params, namespace: Optional[str] = None, token: Optional[str] = None) -> InferenceEndpoint:
        ...
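To make the intent concrete, a hypothetical user flow could look like this (names and parameters are purely illustrative, nothing is final):

from huggingface_hub import HfApi

api = HfApi()

# List all endpoints in my namespace and inspect them
for endpoint in api.list_inference_endpoints():
    print(endpoint.name, endpoint.status)

# Pause an endpoint to stop paying for it, resume it later
endpoint = api.list_inference_endpoints()[0]
endpoint.pause()
endpoint.resume()

# Make a prediction through the bound client
endpoint.client.text_generation("The huggingface_hub library is ")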

cc @philschmid who extensively worked on this product. cc @julien-c @jeffboudier as well.
No clear roadmap yet; this is to trigger a discussion now that we have a reliable InferenceClient.

@sifisKoen
Contributor

Hey @Wauplin, this issue seems nice. I would like to try it. First, let me know if I understood correctly: do you want to create these functions? I created a blueprint of what I have in mind below. Let me know if it's right:

class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    def __init__(self, name: str, namespace: str, model: str, token: Optional[str] = None) -> None:
        self.name = name
        self.namespace = namespace
        self.model = model
        self.token = token

    def update(self, **kwargs) -> None:
        """Updates the endpoint settings."""
        pass

    def delete(self) -> None:
        """Deletes the endpoint."""
        pass

    def pause(self) -> None:
        """Pauses the endpoint."""
        pass

    def resume(self) -> None:
        """Resumes the endpoint."""
        pass

    def get_logs(self) -> str:
        """Fetches logs associated with the endpoint."""
        pass

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        return InferenceClient(endpoint=self.name)

class HfApi:
    """Main API client for Hugging Face Hub."""

    def __init__(self, token: Optional[str] = None):
        self.token = token

    def list_inference_endpoints(self, namespace: Optional[str] = None) -> List[InferenceEndpoint]:
        """Lists all inference endpoints for the given namespace."""
        pass

    def create_inference_endpoint(self, model: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Creates and returns a new inference endpoint."""
        pass

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass

    def delete_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> None:
        """Deletes a specific endpoint."""
        pass

You want to implement each and every function from here. Am I right? 😄

@Wauplin
Contributor Author

Wauplin commented Oct 2, 2023

Hi @sifisKoen , thanks a lot for proposing your help on this one!
I thought twice about it and I think we should go for a lightweight version of what you suggested there (and what I initially suggested). In particular, having a dataclass InferenceEndpoint with the information returned by the server when creating/listing endpoints. The class should mostly be used as an output with a limited set of methods.

Here is an example of what the server returns for an InferenceEndpoint:

{
  "accountId": "string",
  "compute": {
    "accelerator": "cpu",
    "instanceSize": "large",
    "instanceType": "c6i",
    "scaling": {
      "maxReplica": 8,
      "minReplica": 2
    }
  },
  "model": {
    "framework": "custom",
    "image": {
      "huggingface": {}
    },
    "repository": "gpt2",
    "revision": "6c0e6080953db56375760c0471a8c5f2929baf11",
    "task": "text-classification"
  },
  "name": "my-endpoint",
  "provider": {
    "region": "us-east-1",
    "vendor": "aws"
  },
  "status": {
    "createdAt": "2023-10-02T13:51:08.856Z",
    "createdBy": {
      "id": "string",
      "name": "string"
    },
    "message": "Endpoint is ready",
    "private": {
      "serviceName": "string"
    },
    "readyReplica": 2,
    "state": "pending",
    "targetReplica": 4,
    "updatedAt": "2023-10-02T13:51:08.856Z",
    "updatedBy": {
      "id": "string",
      "name": "string"
    },
    "url": "https://endpoint-id.region.vendor.endpoints.huggingface.cloud"
  },
  "type": "public"
}

Given how detailed this is, here is how I would proceed:

  1. Have high-level parameters at the root level of the object: name, url, model_framework, model_repository, model_revision, model_task, create_at, update_at, endpoint_type,... These attributes will be the ones most used by users, so it's preferable to make them directly accessible.
  2. Have a raw attribute that contains the full dictionary. Power users might want it for some specific keys, so keeping the raw data from the server is important.
  3. Have a single property method client that returns an InferenceClient object bound to this InferenceEndpoint specifically.

Here is what it would look like:

@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    name: str
    namespace: str
    model_repository: str
    ...
    raw: Dict

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        # Note: InferenceClient takes the endpoint URL through its `model` argument
        return InferenceClient(model=self.url)

Finally, the best way to start is with only the get_inference_endpoint method in HfApi. That would already be a good start, given the work needed to set everything up.

class HfApi:
    (...)

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass
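To illustrate the target user experience (purely illustrative sketch, assuming a text-generation model is deployed on the endpoint):

from huggingface_hub import HfApi

api = HfApi()
endpoint = api.get_inference_endpoint("my-endpoint")

# High-level attributes are directly accessible...
print(endpoint.name, endpoint.url, endpoint.model_repository)

# ...while power users can still reach any key in the raw server payload
print(endpoint.raw["status"]["state"])

# The bound client makes predictions against this specific endpoint
endpoint.client.text_generation("Once upon a time")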

Please let me know if that works for you or if you have any suggestion/feedback regarding the suggested API :)

@sifisKoen
Contributor

Hey @Wauplin, your suggestion sounds very good and definitely doable. So, to make it clear: to begin with, you want only one method in HfApi and the client property in InferenceEndpoint. Am I right?
I will start the development as soon as possible 😄

@Wauplin
Contributor Author

Wauplin commented Oct 4, 2023

Exactly! Start small, then build on top of it :)
Btw do you know Hacktoberfest? If you register there, your contributions on huggingface_hub will count as well!

@sifisKoen
Contributor

Perfect, I will start the development of those two functions 😄
No, I didn't know about this event. But I just read about it and it seems very interesting and very lovely. I will register for sure ❤️

@sifisKoen
Contributor

Hey @Wauplin, I think I implemented what you asked. Please let me know if what I have done is OK for you.
This is just a blueprint, but I want to know if I am on the right track.

import requests

BASE_URL = "https://api.huggingface.co"

def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
    """Fetches details of a specific endpoint by its name."""
    endpoint_url = f"{BASE_URL}/path_to_endpoints/{name}"

    # Authenticate the request (assuming the route expects the usual Bearer token)
    response = requests.get(endpoint_url, headers={"Authorization": f"Bearer {self.token}"})

    # Check for a successful response
    if response.status_code != 200:
        raise Exception(
            f"Failed to fetch endpoint details. Server responded with {response.status_code}: {response.text}"
        )

    data = response.json()  # Convert the response to a Python dictionary

    # Now create an instance of InferenceEndpoint based on the response data
    endpoint = InferenceEndpoint(
        name=data["name"],
        namespace=data["accountId"],
        model_repository=data["model"]["repository"],
        model_framework=data["model"]["framework"],
        model_revision=data["model"]["revision"],
        model_task=data["model"]["task"],
        create_at=data["status"]["createdAt"],
        update_at=data["status"]["updatedAt"],
        endpoint_type=data["type"],
        url=data["status"]["url"],
        raw=data,
    )

    return endpoint

Additionally, for the client, you want this:

@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""
    
    name: str
    namespace: str
    model_repository: str
    model_framework: str
    model_revision: str
    model_task: str
    create_at: str
    update_at: str
    endpoint_type: str
    url: str
    raw: Dict

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        # Note: InferenceClient takes the endpoint URL through its `model` argument
        return InferenceClient(model=self.url)
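As a possible refinement (just an idea on my side, not something discussed above): the field mapping could live on the dataclass itself, so that get_inference_endpoint and a future list_inference_endpoints don't have to duplicate it. A hypothetical from_raw constructor, using the same field names as above:

    @classmethod
    def from_raw(cls, data: Dict) -> 'InferenceEndpoint':
        """Builds an InferenceEndpoint from the raw server payload."""
        return cls(
            name=data["name"],
            namespace=data["accountId"],
            model_repository=data["model"]["repository"],
            model_framework=data["model"]["framework"],
            model_revision=data["model"]["revision"],
            model_task=data["model"]["task"],
            create_at=data["status"]["createdAt"],
            update_at=data["status"]["updatedAt"],
            endpoint_type=data["type"],
            url=data["status"]["url"],
            raw=data,
        )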

Let me know if what I have done is correct and I will open a PR 😄

@Wauplin
Contributor Author

Wauplin commented Oct 5, 2023

Hey @sifisKoen, thanks for getting started on this issue!
I have some comments to share about the snippets you've posted above, but a PR would be a more appropriate place to discuss them. In general, PRs are better as I can pinpoint exact lines in the code and we can have multiple threads to discuss different topics (instead of a single thread in this issue). If you don't feel your code is entirely ready, it's fine to open a draft PR and ping me on it so we can discuss further. Does that work for you?

@Wauplin
Contributor Author

Wauplin commented Oct 25, 2023

Hey @sifisKoen! As discussed offline, I started to work on this issue. Here is the draft PR I started if you want to have a look: #1779. Will gather some feedback, iterate a bit and then merge this first version :) Thanks for the initial work that started the project!

@Wauplin
Contributor Author

Wauplin commented Oct 30, 2023

PR #1779 is merged! Check out the Inference Endpoints guide here. The API is already available when installing from source; otherwise, a v0.19.0 release should be shipped soon! Therefore closing this issue :)
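For reference, basic usage of the merged API looks roughly like this (see the linked guide for the authoritative documentation):

from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.client.text_generation("The huggingface_hub library is ")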

Wauplin closed this as completed Oct 30, 2023