
Add list_deployed_models to inference client #1622

Merged

Conversation

Contributor

@martinbrose commented Aug 24, 2023

Add list_deployed_models to HuggingFace🤗 Hub

References #1557

Key Features

  • Modifies _client.py to add list_deployed_models functionality. For a provided list of tasks, it returns each deployed model_id along with its framework.


HuggingFaceDocBuilderDev commented Sep 7, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@Wauplin left a comment


Hey @martinbrose, thanks for working on this! I started to work locally on it and pushed my changes. Sorry again for doing it this way; I wanted to get it done ASAP for the next release and did not expect to change many things at first. In the end, I ended up changing the API a bit (I should have been more careful in the initial issue description).

Here are the changes:

  • I made a list of "main" frameworks that is a subset of all frameworks. By default we will only check those 4. This will improve usability ("only" 4 HTTP calls) and still covers 98% of the models.
  • The user can now list deployed models for the 4 main frameworks (transformers, diffusers, sentence-transformers and text-generation-inference), which is the default. Otherwise they can specify "all" (which triggers ~30 calls) or a single framework (e.g. "text-generation-inference"), depending on the use case. This should help discoverability while staying easy to use.
  • There is no need to filter by tasks inside the method. Filtering by framework is useful since it reduces the number of HTTP calls, hence improving the user experience. Filtering by tasks, on the other hand, would require the same number of calls but return only a subset of the output, which does not really improve UX (it would take exactly the same amount of time as having users filter the output themselves).
  • feature-extraction and sentence-similarity are handled as a special case. These two tasks are the only ones that can run on the same deployed model, as long as it is a sentence-transformers model. The API only reports one task per model, but we tweak the output so such models are listed under both tasks.
  • I simplified the output a bit to optimize for readability (at the cost of less information). Output now looks like this: {'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...} (keys are tasks, values are a sorted list of models); see the usage sketch after this list.
  • I adapted the docstring, examples and tests accordingly.
  • (nit) I also adapted the asyncio version so that HTTP calls are made concurrently to save some time (a rough sketch of the pattern follows after this comment).
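
For reference, a minimal usage sketch of the API described above (assuming the method is exposed as InferenceClient.list_deployed_models and behaves as summarized in this list):

from huggingface_hub import InferenceClient

client = InferenceClient()

# Default: scan only the 4 main frameworks (4 HTTP calls).
deployed = client.list_deployed_models()
print(deployed.get("text-generation", [])[:3])

# Scan every supported framework (~30 HTTP calls) ...
all_deployed = client.list_deployed_models("all")

# ... or a single framework, depending on the use case.
tgi_only = client.list_deployed_models("text-generation-inference")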

@martinbrose Feel free to let me know if some parts are unclear or if you have any feedback to make :)
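
As an illustration of the concurrency mentioned in the last bullet, here is a rough sketch of the general pattern (not the actual implementation), assuming aiohttp as the HTTP backend and the per-framework route mentioned later in this thread:

import asyncio

import aiohttp

FRAMEWORKS = ["transformers", "diffusers", "sentence-transformers", "text-generation-inference"]

async def fetch_framework(session: aiohttp.ClientSession, framework: str):
    # Each framework route lists the models currently deployed for that framework.
    url = f"https://api-inference.huggingface.co/framework/{framework}"
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.json()

async def fetch_all():
    async with aiohttp.ClientSession() as session:
        # Issue all HTTP calls concurrently instead of sequentially.
        return await asyncio.gather(*(fetch_framework(session, fw) for fw in FRAMEWORKS))

results = asyncio.run(fetch_all())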

Contributor

@osanseviero left a comment


Overall, LGTM. Thanks a lot! One concern I have is that I see text-generation-inference here. Technically, it's not a framework; it's a tag we add. I can see how having it could be useful, as users can then easily find models deployed with super fast inference, but I'm not 100% convinced about it.

cc @SBrandeis @julien-c in case you have other thoughts

MAIN_INFERENCE_API_FRAMEWORKS = [
"diffusers",
"sentence-transformers",
"text-generation-inference",
Contributor


text-generation-inference is not a framework. It's another type of tag; it is always applied to transformers models, based on the architecture. See internal PR https://github.com/huggingface/moon-landing/pull/6894

Contributor


I consider it a framework because that's how the InferenceAPI route calls it: https://api-inference.huggingface.co/framework/text-generation-inference . I don't think it's worth changing the naming here (arguably, TGI is not a library like transformers, but it is still a framework to power inference on the server, isn't it?).
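
For illustration, a minimal sketch of querying that route directly (whether a token is required, and the exact response fields, are assumptions here rather than details from this thread):

import requests

# Per-framework route used by the InferenceAPI (see URL above).
url = "https://api-inference.huggingface.co/framework/text-generation-inference"
response = requests.get(url, timeout=10)
response.raise_for_status()

# Assumed shape: a JSON list of currently deployed models (model id, task, ...).
for entry in response.json():
    print(entry)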

# List frameworks that are handled by the InferenceAPI service. Useful to scan endpoints and check which models are
# deployed and running. Since 95% of the models are using the top 4 frameworks listed below, we scan only those by
# default. We still keep the full list of supported frameworks in case we want to scan all of them.
MAIN_INFERENCE_API_FRAMEWORKS = [
Contributor


Current distribution of libraries (a bit outdated but good proxy)

  • transformers 108k repos
  • SB3 8.6k repos (no API)
  • diffusers 8k repos
  • ml-agents 4k repos (no API)
  • ST 3k repos
  • cleanRL 1.5k repos (no API)
  • timm 1.2k repos (we have API, but quite limited)

I think going with transformers, ST and diffusers support is good for now!

cc @LysandreJik and @julien-c for vis

Contributor

@Wauplin Sep 8, 2023


I think there is some confusion between what we consider a "framework" on the Hub and what is considered a "framework" under the hood for InferenceAPI. As I see it, list_deployed_models should be a helper to improve discoverability of models that are already deployed, so that the user doesn't have to wait for the model to load. I don't think we should choose the MAIN_INFERENCE_API_FRAMEWORKS list based on the Hub distribution but rather on the InferenceAPI distribution.

I know it is only a proxy but if I check every possible inference framework currently deployed, I get this:

diffusers 112
sentence-transformers 13
text-generation-inference 20
transformers 298
adapter-transformers 0
allennlp 0
asteroid 1
bertopic 0
doctr 0
espnet 2
fairseq 4
fastai 0
fasttext 0
flair 2
generic 2
k2 0
keras 0
mindspore 0
nemo 1
open_clip 0
paddlenlp 0
peft 0
pyannote-audio 1
sklearn 0
spacy 0
span-marker 0
speechbrain 3
stanza 0
timm 1

Which is why I chose those 4 frameworks. Also, AFAIK the TGI models deployed on InferenceAPI are in fact a curated list of models that we consider worth running, which makes it even more important to list them by default IMO (since users cannot spin up LLMs themselves the same way as any small-enough diffusers/transformers model).
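
As a rough way to reproduce such per-framework counts (a sketch, assuming the full framework list is exposed as huggingface_hub.constants.ALL_INFERENCE_API_FRAMEWORKS and reusing the per-framework route shown earlier):

import requests
from huggingface_hub import constants

counts = {}
for framework in constants.ALL_INFERENCE_API_FRAMEWORKS:
    url = f"https://api-inference.huggingface.co/framework/{framework}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    counts[framework] = len(response.json())

# Print frameworks sorted by number of currently deployed models.
for framework, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(framework, count)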

Contributor

@Wauplin Sep 8, 2023


Discussed offline.

The current solution is not ideal (it might lead to confusion between Hub frameworks vs InferenceAPI frameworks), but hopefully it shouldn't be too bad as we (I) expect most users to not use the frameworks parameter and just take the default output. I added two Tip sections in the docstring to remind users that list_deployed_models and get_model_status are meant to be complementary (the first one for discoverability, the second one for users who know what they want).
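
A small sketch of the complementary workflow described in those Tips (assuming get_model_status is available on InferenceClient, as mentioned above):

from huggingface_hub import InferenceClient

client = InferenceClient()

# Discoverability: which models are already deployed and ready to use?
deployed = client.list_deployed_models()
model_id = deployed["text-generation"][0]

# Targeted check: is this specific model loaded right now?
status = client.get_model_status(model_id)
print(status)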

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

codecov bot commented Sep 8, 2023

Codecov Report

Patch coverage: 88.23% and project coverage change: -0.36% ⚠️

Comparison is base (a86cb6f) 82.06% compared to head (0511699) 81.71%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   82.06%   81.71%   -0.36%     
==========================================
  Files          62       62              
  Lines        7065     7114      +49     
==========================================
+ Hits         5798     5813      +15     
- Misses       1267     1301      +34     
Files Changed Coverage Δ
...gingface_hub/inference/_generated/_async_client.py 56.73% <84.61%> (+2.65%) ⬆️
src/huggingface_hub/inference/_client.py 81.36% <91.30%> (+0.87%) ⬆️
src/huggingface_hub/constants.py 98.21% <100.00%> (+0.06%) ⬆️

... and 5 files with indirect coverage changes
