
Implement API for Inference Endpoint #1541

Closed
Wauplin opened this issue Jul 5, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@Wauplin
Contributor

Wauplin commented Jul 5, 2023

Inference Endpoint is a product from Hugging Face offering a secure production solution to easily deploy any 🤗 transformers, sentence-transformers and diffusers model from the Hub on dedicated, autoscaling infrastructure managed by Hugging Face. See the documentation for more details.

It would be cool to integrate helpers in huggingface_hub to deal with IE, similarly to Spaces. IE has a nice Swagger API, which should make it easy to implement. Let's use this issue to discuss the implementation details. Here is how I'd see it:

Have an InferenceEndpoint class containing information about 1 deployed endpoint. It contains generic information about the endpoint plus convenient methods to pause/update/resume/delete. It also has a client property that returns an InferenceClient object to make predictions.

class InferenceEndpoint:
    # List of parameters:
    name: str
    namespace: str
    token: Optional[str]
    model: ...
    compute: ...
    provider: ...
    type: ...
    ...

    def __init__(self, ...) -> None:
        # Most probably initialized with an endpoint name / endpoint id.
        # Every parameter listed above would be a property fetched in real time (especially stuff like `status`)
        pass

    def update(self, ...) -> None:
        pass

    def delete(self) -> None:
        pass

    def pause(self) -> None:
        pass

    def resume(self) -> None:
        pass

    def get_logs(self) -> str:
        pass

    @property
    def client(self) -> InferenceClient:
        pass

From there, we can add methods to HfApi:

class HfApi:

    def list_inference_endpoints(self, namespace: Optional[str] = None, token: Optional[str] = None) -> List[InferenceEndpoint]:
        # If namespace is None, use self.whoami
        ...

    def create_inference_endpoint(self, ...params, namespace: Optional[str] = None, token: Optional[str] = None) -> InferenceEndpoint:
        ...
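To make the intent concrete, a hypothetical user flow could look like this (names and parameters are purely illustrative, nothing is final):

from huggingface_hub import HfApi

api = HfApi()

# List all endpoints in my namespace and inspect them
for endpoint in api.list_inference_endpoints():
    print(endpoint.name, endpoint.status)

# Pause an endpoint to stop paying for it, resume it later
endpoint = api.list_inference_endpoints()[0]
endpoint.pause()
endpoint.resume()

# Make a prediction through the bound client
endpoint.client.text_generation("The huggingface_hub library is ")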

cc @philschmid who extensively worked on this product. cc @julien-c @jeffboudier as well.
No clear roadmap yet; this is to trigger a discussion now that we have a reliable InferenceClient.

@sifisKoen
Contributor

Hey @Wauplin, this issue seems nice. I would like to try it. First, let me know if I understood correctly: do you want to create these functions? I created a blueprint of what I have in mind below. Let me know if it's right:

class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    def __init__(self, name: str, namespace: str, model: str, token: Optional[str] = None) -> None:
        self.name = name
        self.namespace = namespace
        self.model = model
        self.token = token

    def update(self, **kwargs) -> None:
        """Updates the endpoint settings."""
        pass

    def delete(self) -> None:
        """Deletes the endpoint."""
        pass

    def pause(self) -> None:
        """Pauses the endpoint."""
        pass

    def resume(self) -> None:
        """Resumes the endpoint."""
        pass

    def get_logs(self) -> str:
        """Fetches logs associated with the endpoint."""
        pass

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        return InferenceClient(endpoint=self.name)

class HfApi:
    """Main API client for Hugging Face Hub."""

    def __init__(self, token: Optional[str] = None):
        self.token = token

    def list_inference_endpoints(self, namespace: Optional[str] = None) -> List[InferenceEndpoint]:
        """Lists all inference endpoints for the given namespace."""
        pass

    def create_inference_endpoint(self, model: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Creates and returns a new inference endpoint."""
        pass

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass

    def delete_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> None:
        """Deletes a specific endpoint."""
        pass

You want to implement each and every function from here. Am I right? 😄

@Wauplin
Contributor Author

Wauplin commented Oct 2, 2023

Hi @sifisKoen , thanks a lot for proposing your help on this one!
I thought twice about it and I think we should go for a lightweight version of what you suggested there (and what I initially suggested). In particular, having a dataclass InferenceEndpoint with the information returned by the server when creating/listing endpoints. The class should mostly be used as an output with a limited set of methods.

Here is an example of what the server returns for an InferenceEndpoint:

{
  "accountId": "string",
  "compute": {
    "accelerator": "cpu",
    "instanceSize": "large",
    "instanceType": "c6i",
    "scaling": {
      "maxReplica": 8,
      "minReplica": 2
    }
  },
  "model": {
    "framework": "custom",
    "image": {
      "huggingface": {}
    },
    "repository": "gpt2",
    "revision": "6c0e6080953db56375760c0471a8c5f2929baf11",
    "task": "text-classification"
  },
  "name": "my-endpoint",
  "provider": {
    "region": "us-east-1",
    "vendor": "aws"
  },
  "status": {
    "createdAt": "2023-10-02T13:51:08.856Z",
    "createdBy": {
      "id": "string",
      "name": "string"
    },
    "message": "Endpoint is ready",
    "private": {
      "serviceName": "string"
    },
    "readyReplica": 2,
    "state": "pending",
    "targetReplica": 4,
    "updatedAt": "2023-10-02T13:51:08.856Z",
    "updatedBy": {
      "id": "string",
      "name": "string"
    },
    "url": "https://endpoint-id.region.vendor.endpoints.huggingface.cloud"
  },
  "type": "public"
}

Given how detailed this is, here is how I would proceed:

  1. Have high-level parameters at the root level of the object: name, url, model_framework, model_repository, model_revision, model_task, create_at, update_at, endpoint_type,... These attributes will be the ones most used by users, so it's preferable to make them directly accessible.
  2. Have a raw attribute that contains the full dictionary. Power users might want it for some specific keys, so keeping the raw data from the server is important.
  3. Have a single property method client that returns an InferenceClient object bound to this InferenceEndpoint specifically.

Here is what it would look like:

@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    name: str
    namespace: str
    model_repository: str
    ...
    raw: Dict

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        # Note: InferenceClient takes the endpoint URL through its `model` argument
        return InferenceClient(model=self.url)

Finally, the best way to start is with only the get_inference_endpoint method in HfApi. That would already be a good start, given the work needed to set everything up.

class HfApi:
    (...)

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass
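To illustrate the target user experience (purely illustrative sketch, assuming a text-generation model is deployed on the endpoint):

from huggingface_hub import HfApi

api = HfApi()
endpoint = api.get_inference_endpoint("my-endpoint")

# High-level attributes are directly accessible...
print(endpoint.name, endpoint.url, endpoint.model_repository)

# ...while power users can still reach any key in the raw server payload
print(endpoint.raw["status"]["state"])

# The bound client makes predictions against this specific endpoint
endpoint.client.text_generation("Once upon a time")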

Please let me know if that works for you or if you have any suggestion/feedback regarding the suggested API :)

@sifisKoen
Contributor

Hey @Wauplin, your suggestion sounds very good and definitely doable. So, to make it clear: to begin with, you want only one method in HfApi and the client property in InferenceEndpoint. Am I right?
I will start the development as soon as possible 😄

@Wauplin
Contributor Author

Wauplin commented Oct 4, 2023

Exactly! Start small, then build on top of it :)
Btw do you know Hacktoberfest? If you register there, your contributions on huggingface_hub will count as well!

@sifisKoen
Contributor

Perfect, I will start the development of those two functions 😄
No, I didn't know about this event. But I just read about it and it seems very interesting and very lovely. I will register for sure ❤️

@sifisKoen
Contributor

Hey @Wauplin, I think I implemented what you asked. Please let me know if what I have done is OK for you.
This is just a blueprint, but I want to know if I am on the right track.

import requests

BASE_URL = "https://api.huggingface.co"

def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
    """Fetches details of a specific endpoint by its name."""
    endpoint_url = f"{BASE_URL}/path_to_endpoints/{name}"

    # Authenticate the request (assuming the route expects the usual Bearer token)
    response = requests.get(endpoint_url, headers={"Authorization": f"Bearer {self.token}"})

    # Check for a successful response
    if response.status_code != 200:
        raise Exception(
            f"Failed to fetch endpoint details. Server responded with {response.status_code}: {response.text}"
        )

    data = response.json()  # Convert the response to a Python dictionary

    # Now create an instance of InferenceEndpoint based on the response data
    endpoint = InferenceEndpoint(
        name=data["name"],
        namespace=data["accountId"],
        model_repository=data["model"]["repository"],
        model_framework=data["model"]["framework"],
        model_revision=data["model"]["revision"],
        model_task=data["model"]["task"],
        create_at=data["status"]["createdAt"],
        update_at=data["status"]["updatedAt"],
        endpoint_type=data["type"],
        url=data["status"]["url"],
        raw=data,
    )

    return endpoint

Additionally, for the client, you want this:

@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""
    
    name: str
    namespace: str
    model_repository: str
    model_framework: str
    model_revision: str
    model_task: str
    create_at: str
    update_at: str
    endpoint_type: str
    url: str
    raw: Dict

    @property
    def client(self) -> 'InferenceClient':
        """Returns a client to make predictions on this endpoint."""
        # Note: InferenceClient takes the endpoint URL through its `model` argument
        return InferenceClient(model=self.url)
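As a possible refinement (just an idea on my side, not something discussed above): the field mapping could live on the dataclass itself, so that get_inference_endpoint and a future list_inference_endpoints don't have to duplicate it. A hypothetical from_raw constructor, using the same field names as above:

    @classmethod
    def from_raw(cls, data: Dict) -> 'InferenceEndpoint':
        """Builds an InferenceEndpoint from the raw server payload."""
        return cls(
            name=data["name"],
            namespace=data["accountId"],
            model_repository=data["model"]["repository"],
            model_framework=data["model"]["framework"],
            model_revision=data["model"]["revision"],
            model_task=data["model"]["task"],
            create_at=data["status"]["createdAt"],
            update_at=data["status"]["updatedAt"],
            endpoint_type=data["type"],
            url=data["status"]["url"],
            raw=data,
        )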

Let me know if what I have done is correct and I will open a PR 😄

@Wauplin
Contributor Author

Wauplin commented Oct 5, 2023

Hey @sifisKoen, thanks for getting started on this issue!
I have some comments to share about the snippets you've posted above, but a PR would be a more appropriate place to discuss them. In general, PRs are better as I can pinpoint exact lines in the code and we can have multiple threads to discuss different topics (instead of a single thread in this issue). If you don't feel your code is entirely ready, it's fine to open a draft PR and ping me on it so we can discuss further. Does that work for you?

@Wauplin
Contributor Author

Wauplin commented Oct 25, 2023

Hey @sifisKoen! As discussed offline, I started to work on this issue. Here is the draft PR I started if you want to have a look: #1779. Will gather some feedback, iterate a bit and then merge this first version :) Thanks for the initial work that started the project!

@Wauplin
Contributor Author

Wauplin commented Oct 30, 2023

PR #1779 is merged! Check out the Inference Endpoints guide here. The API is already available when installing from source; otherwise, a v0.19.0 release should be shipped soon! Therefore closing this issue :)
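For reference, basic usage of the merged API looks roughly like this (see the linked guide for the authoritative documentation):

from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.client.text_generation("The huggingface_hub library is ")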

Wauplin closed this as completed Oct 30, 2023