# Implement API for Inference Endpoint #1541
Inference Endpoint is a product from Hugging Face that offers a secure production solution to easily deploy any 🤗 `transformers`, `sentence-transformers`, and `diffusers` model from the Hub on dedicated and autoscaling infrastructure managed by Hugging Face. See the documentation for more details.

Would be cool to integrate helpers in `huggingface_hub` to deal with IE, similarly to Spaces. IE has a nice Swagger API, which should make it easy to implement. Let's use this issue to discuss the implementation details. Here is how I'd see it:

- Have an `InferenceEndpoint` class containing information about one deployed endpoint: generic information about it, plus convenient methods to pause/update/resume/delete. Also have a `client` property that returns an `InferenceClient` object to make predictions.
- From there, we can add methods to `HfApi`.

cc @philschmid who extensively worked on this product. cc @julien-c @jeffboudier as well.

No clear roadmap yet; this is to trigger a discussion now that we have a reliable `InferenceClient`.

## Comments
Hey @Wauplin, this issue seems nice. I would like to try it. First, let me know if I understood correctly: do you want to create those functions? I created a blueprint of what I have in mind here. Let me know if it's right:

```python
from typing import List, Optional

from huggingface_hub import InferenceClient


class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    def __init__(self, name: str, namespace: str, model: str, token: Optional[str] = None) -> None:
        self.name = name
        self.namespace = namespace
        self.model = model
        self.token = token

    def update(self, **kwargs) -> None:
        """Updates the endpoint settings."""
        pass

    def delete(self) -> None:
        """Deletes the endpoint."""
        pass

    def pause(self) -> None:
        """Pauses the endpoint."""
        pass

    def resume(self) -> None:
        """Resumes the endpoint."""
        pass

    def get_logs(self) -> str:
        """Fetches logs associated with the endpoint."""
        pass

    @property
    def client(self) -> "InferenceClient":
        """Returns a client to make predictions on this endpoint."""
        return InferenceClient(endpoint=self.name)


class HfApi:
    """Main API client for Hugging Face Hub."""

    def __init__(self, token: Optional[str] = None):
        self.token = token

    def list_inference_endpoints(self, namespace: Optional[str] = None) -> List[InferenceEndpoint]:
        """Lists all inference endpoints for the given namespace."""
        pass

    def create_inference_endpoint(self, model: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Creates and returns a new inference endpoint."""
        pass

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass

    def delete_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> None:
        """Deletes a specific endpoint."""
        pass
```

You want to implement each and every function from here. Am I right? 😄
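For illustration, a hypothetical call sequence against this blueprint might look like the following (every name comes from the sketch above; none of this is a released API, and the methods are still stubs):

```python
# Illustrative only: exercises the blueprint classes sketched above.
api = HfApi(token="hf_xxx")

# Discover and fetch endpoints in a namespace.
endpoints = api.list_inference_endpoints(namespace="my-org")
endpoint = api.get_inference_endpoint("my-endpoint", namespace="my-org")

# Lifecycle operations.
endpoint.pause()
endpoint.resume()

# Ready-to-use client for predictions.
client = endpoint.client
```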
---

Hi @sifisKoen, thanks a lot for proposing your help on this one! Here is an example of what the server returns for an InferenceEndpoint:

```json
{
  "accountId": "string",
  "compute": {
    "accelerator": "cpu",
    "instanceSize": "large",
    "instanceType": "c6i",
    "scaling": {
      "maxReplica": 8,
      "minReplica": 2
    }
  },
  "model": {
    "framework": "custom",
    "image": {
      "huggingface": {}
    },
    "repository": "gpt2",
    "revision": "6c0e6080953db56375760c0471a8c5f2929baf11",
    "task": "text-classification"
  },
  "name": "my-endpoint",
  "provider": {
    "region": "us-east-1",
    "vendor": "aws"
  },
  "status": {
    "createdAt": "2023-10-02T13:51:08.856Z",
    "createdBy": {
      "id": "string",
      "name": "string"
    },
    "message": "Endpoint is ready",
    "private": {
      "serviceName": "string"
    },
    "readyReplica": 2,
    "state": "pending",
    "targetReplica": 4,
    "updatedAt": "2023-10-02T13:51:08.856Z",
    "updatedBy": {
      "id": "string",
      "name": "string"
    },
    "url": "https://endpoint-id.region.vendor.endpoints.huggingface.cloud"
  },
  "type": "public"
}
```

Given how detailed this is, here is how I would proceed. Roughly, it would look like this:

```python
from dataclasses import dataclass
from typing import Dict

from huggingface_hub import InferenceClient


@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    name: str
    namespace: str
    model_repository: str
    ...
    raw: Dict

    @property
    def client(self) -> "InferenceClient":
        """Returns a client to make predictions on this endpoint."""
        return InferenceClient(endpoint=self.url)
```

Finally, the best way to start is with only:

```python
class HfApi:
    (...)

    def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
        """Fetches details of a specific endpoint by its name."""
        pass
```

Please let me know if that works for you or if you have any suggestion/feedback regarding the suggested API :)
---

Hey @Wauplin, your suggestion sounds very good and doable for sure. So, to make it clear: for the beginning you want only one method in `HfApi`?
---

Exactly! Start small, then build on top of it :)
---

Perfect, I will start the development of those two functions 😄
---

Hey @Wauplin, I think that I implemented what you asked. Please let me know if what I have done is OK for you:

```python
import requests

BASE_URL = "https://api.huggingface.co"  # placeholder; the real base URL and route may differ


# Method on `HfApi` (shown standalone here).
def get_inference_endpoint(self, name: str, namespace: Optional[str] = None) -> InferenceEndpoint:
    """Fetches details of a specific endpoint by its name."""
    # Note: `namespace` is not used in the URL yet; the route is a placeholder.
    endpoint_url = f"{BASE_URL}/path_to_endpoints/{name}"
    response = requests.get(endpoint_url)

    # Check for a successful response.
    if response.status_code != 200:
        raise Exception(
            f"Failed to fetch endpoint details. Server responded with {response.status_code}: {response.text}"
        )

    data = response.json()  # Convert the response to a Python dictionary.

    # Now create an instance of InferenceEndpoint based on the response data.
    endpoint = InferenceEndpoint(
        name=data["name"],
        namespace=data["accountId"],
        model_repository=data["model"]["repository"],
        model_framework=data["model"]["framework"],
        model_revision=data["model"]["revision"],
        model_task=data["model"]["task"],
        created_at=data["status"]["createdAt"],
        updated_at=data["status"]["updatedAt"],
        endpoint_type=data["type"],
        url=data["status"]["url"],
        raw=data,
    )
    return endpoint
```

Additionally, about the `InferenceEndpoint` dataclass:

```python
from dataclasses import dataclass
from typing import Dict

from huggingface_hub import InferenceClient


@dataclass
class InferenceEndpoint:
    """Represents a deployed model on Hugging Face's Inference Endpoint."""

    name: str
    namespace: str
    model_repository: str
    model_framework: str
    model_revision: str
    model_task: str
    created_at: str
    updated_at: str
    endpoint_type: str
    url: str
    raw: Dict

    @property
    def client(self) -> "InferenceClient":
        """Returns a client to make predictions on this endpoint."""
        return InferenceClient(endpoint=self.url)
```

Let me know if what I have done is correct and I will open a PR 😄
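For completeness, a call against this draft would look roughly like this (`BASE_URL` and the route are placeholders carried over from the snippet above):

```python
api = HfApi(token="hf_xxx")
endpoint = api.get_inference_endpoint("my-endpoint")

print(endpoint.url)         # e.g. "https://endpoint-id.region.vendor.endpoints.huggingface.cloud"
print(endpoint.model_task)  # e.g. "text-classification"

client = endpoint.client    # InferenceClient pointed at endpoint.url
```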
---

Hey @sifisKoen, thanks for starting this issue!
---

Hey @sifisKoen! As discussed offline, I started to work on this issue. Here is the draft PR I started if you want to have a look: #1779. I will gather some feedback, iterate a bit, and then merge this first version :) Thanks for the initial work that started the project!
---

PR #1779 is merged! Check out the Inference Endpoints guide here. The API is already available when installing from source; otherwise a new release will follow.
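For reference, here is roughly what the merged API looks like in practice (the endpoint name is made up; see the linked guide for exact signatures):

```python
from huggingface_hub import get_inference_endpoint

# Fetch an existing endpoint by name (namespace defaults to the authenticated user).
endpoint = get_inference_endpoint("my-endpoint")

# Lifecycle helpers mirror the methods discussed above.
endpoint.pause()
endpoint.resume()
endpoint.wait()  # block until the endpoint is up again

# `client` is an InferenceClient pointed at the deployed URL.
output = endpoint.client.text_generation("The huggingface_hub library is ")
```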