
Add list_deployed_models to inference client #1622

Merged

Conversation

Contributor

@martinbrose commented Aug 24, 2023

Add list_deployed_models to HuggingFace🤗 Hub

References #1557

Key Features

  • Modifies _client.py to add list_deployed_models functionality. For a provided list of tasks, it returns each deployed model_id along with its framework.


HuggingFaceDocBuilderDev commented Sep 7, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@Wauplin left a comment


Hey @martinbrose, thanks for working on this! I started to work locally on it and pushed my changes. Sorry again for doing it this way; I wanted to get it done ASAP for the next release and did not expect to change many things at first. In the end, I ended up changing the API a bit (I should have been more careful in the initial issue description).

Here are the changes:

  • I made a list of "main" frameworks that is a subset of all frameworks. By default we will only check those 4. This will improve usability ("only" 4 HTTP calls) and still covers 98% of the models.
  • The user can now list deployed models for the 4 main frameworks (transformers, diffusers, sentence-transformers and text-generation-inference), which is the default. Otherwise they can specify "all" (which triggers ~30 calls) or a single framework (e.g. "text-generation-inference"), depending on the use case. This should help discoverability while staying easy to use.
  • There is no need to filter by tasks inside the method. Filtering by framework is useful since it reduces the number of HTTP calls, hence improving the user experience. Filtering by tasks, on the other hand, would require the same number of calls but return only a subset of the output, which does not really improve UX (it would take exactly the same amount of time as having users filter the output themselves).
  • feature-extraction and sentence-similarity are handled as a special case. These two tasks are the only ones that can run on the same deployed model, as long as it is a sentence-transformers model. The API only reports one task per model, but we tweak the output so such models are listed under both tasks.
  • I simplified the output a bit to optimize for readability (at the cost of less information). Output now looks like this: {'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...} (keys are tasks, values are a sorted list of models); see the usage sketch after this list.
  • I adapted the docstring, examples and tests accordingly.
  • (nit) I also adapted the asyncio version so that HTTP calls are made concurrently to save some time (a rough sketch of the pattern follows after this comment).
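
For reference, a minimal usage sketch of the API described above (assuming the method is exposed as InferenceClient.list_deployed_models and behaves as summarized in this list):

from huggingface_hub import InferenceClient

client = InferenceClient()

# Default: scan only the 4 main frameworks (4 HTTP calls).
deployed = client.list_deployed_models()
print(deployed.get("text-generation", [])[:3])

# Scan every supported framework (~30 HTTP calls) ...
all_deployed = client.list_deployed_models("all")

# ... or a single framework, depending on the use case.
tgi_only = client.list_deployed_models("text-generation-inference")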

@martinbrose Feel free to let me know if some parts are unclear or if you have any feedback to make :)
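
As an illustration of the concurrency mentioned in the last bullet, here is a rough sketch of the general pattern (not the actual implementation), assuming aiohttp as the HTTP backend and the per-framework route mentioned later in this thread:

import asyncio

import aiohttp

FRAMEWORKS = ["transformers", "diffusers", "sentence-transformers", "text-generation-inference"]

async def fetch_framework(session: aiohttp.ClientSession, framework: str):
    # Each framework route lists the models currently deployed for that framework.
    url = f"https://api-inference.huggingface.co/framework/{framework}"
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.json()

async def fetch_all():
    async with aiohttp.ClientSession() as session:
        # Issue all HTTP calls concurrently instead of sequentially.
        return await asyncio.gather(*(fetch_framework(session, fw) for fw in FRAMEWORKS))

results = asyncio.run(fetch_all())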

Contributor

@osanseviero left a comment


Overall, LGTM. Thanks a lot! One concern I have is that I see text-generation-inference here. Technically, it's not a framework; it's a tag we add. I can see how having it could be useful, as users can then easily find models deployed with super fast inference, but I'm not 100% convinced about it.

cc @SBrandeis @julien-c in case you have other thoughts

MAIN_INFERENCE_API_FRAMEWORKS = [
"diffusers",
"sentence-transformers",
"text-generation-inference",
Contributor


text-generation-inference is not a framework. It's another type of tag; it is always applied to transformers models, based on the architecture. See internal PR https://github.com/huggingface/moon-landing/pull/6894

Contributor


I consider it a framework because that's how the InferenceAPI route calls it: https://api-inference.huggingface.co/framework/text-generation-inference . I don't think it's worth changing the naming here (arguably, TGI is not a library like transformers, but it is still a framework to power inference on the server, isn't it?).
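
For illustration, a minimal sketch of querying that route directly (whether a token is required, and the exact response fields, are assumptions here rather than details from this thread):

import requests

# Per-framework route used by the InferenceAPI (see URL above).
url = "https://api-inference.huggingface.co/framework/text-generation-inference"
response = requests.get(url, timeout=10)
response.raise_for_status()

# Assumed shape: a JSON list of currently deployed models (model id, task, ...).
for entry in response.json():
    print(entry)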

# List frameworks that are handled by the InferenceAPI service. Useful to scan endpoints and check which models are
# deployed and running. Since 95% of the models are using the top 4 frameworks listed below, we scan only those by
# default. We still keep the full list of supported frameworks in case we want to scan all of them.
MAIN_INFERENCE_API_FRAMEWORKS = [
Contributor


Current distribution of libraries (a bit outdated but good proxy)

  • transformers 108k repos
  • SB3 8.6k repos (no API)
  • diffusers 8k repos
  • ml-agents 4k repos (no API)
  • ST 3k repos
  • cleanRL 1.5k repos (no API)
  • timm 1.2k repos (we have API, but quite limited)

I think going with transformers, ST and diffusers support is good for now!

cc @LysandreJik and @julien-c for vis

Contributor

@Wauplin Sep 8, 2023


I think there is some confusion between what we consider a "framework" on the Hub and what is considered a "framework" under the hood for InferenceAPI. As I see it, list_deployed_models should be a helper to improve discoverability of models that are already deployed, so that the user doesn't have to wait for the model to load. I don't think we should choose the MAIN_INFERENCE_API_FRAMEWORKS list based on the Hub distribution but rather on the InferenceAPI distribution.

I know it is only a proxy but if I check every possible inference framework currently deployed, I get this:

diffusers 112
sentence-transformers 13
text-generation-inference 20
transformers 298
adapter-transformers 0
allennlp 0
asteroid 1
bertopic 0
doctr 0
espnet 2
fairseq 4
fastai 0
fasttext 0
flair 2
generic 2
k2 0
keras 0
mindspore 0
nemo 1
open_clip 0
paddlenlp 0
peft 0
pyannote-audio 1
sklearn 0
spacy 0
span-marker 0
speechbrain 3
stanza 0
timm 1

Which is why I chose those 4 frameworks. Also, AFAIK the TGI models deployed on InferenceAPI are in fact a curated list of models that we consider worth running, which makes it even more important to list them by default IMO (since users cannot spin up LLMs themselves the same way as any small-enough diffusers/transformers model).
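
As a rough way to reproduce such per-framework counts (a sketch, assuming the full framework list is exposed as huggingface_hub.constants.ALL_INFERENCE_API_FRAMEWORKS and reusing the per-framework route shown earlier):

import requests
from huggingface_hub import constants

counts = {}
for framework in constants.ALL_INFERENCE_API_FRAMEWORKS:
    url = f"https://api-inference.huggingface.co/framework/{framework}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    counts[framework] = len(response.json())

# Print frameworks sorted by number of currently deployed models.
for framework, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(framework, count)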

Contributor

@Wauplin Sep 8, 2023


Discussed offline.

The current solution is not ideal (it might lead to confusion between Hub frameworks vs InferenceAPI frameworks), but hopefully it shouldn't be too bad as we (I) expect most users to not use the frameworks parameter and just take the default output. I added two Tip sections in the docstring to remind users that list_deployed_models and get_model_status are meant to be complementary (the first one for discoverability, the second one for users who know what they want).
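
A small sketch of the complementary workflow described in those Tips (assuming get_model_status is available on InferenceClient, as mentioned above):

from huggingface_hub import InferenceClient

client = InferenceClient()

# Discoverability: which models are already deployed and ready to use?
deployed = client.list_deployed_models()
model_id = deployed["text-generation"][0]

# Targeted check: is this specific model loaded right now?
status = client.get_model_status(model_id)
print(status)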

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

codecov bot commented Sep 8, 2023

Codecov Report

Patch coverage: 88.23% and project coverage change: -0.36% ⚠️

Comparison is base (a86cb6f) 82.06% compared to head (0511699) 81.71%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   82.06%   81.71%   -0.36%     
==========================================
  Files          62       62              
  Lines        7065     7114      +49     
==========================================
+ Hits         5798     5813      +15     
- Misses       1267     1301      +34     
Files Changed Coverage Δ
...gingface_hub/inference/_generated/_async_client.py 56.73% <84.61%> (+2.65%) ⬆️
src/huggingface_hub/inference/_client.py 81.36% <91.30%> (+0.87%) ⬆️
src/huggingface_hub/constants.py 98.21% <100.00%> (+0.06%) ⬆️

... and 5 files with indirect coverage changes
