Introduce Tag Listing #537

Merged
merged 26 commits into from
Dec 20, 2021

Conversation

muellerzr
Contributor

@muellerzr muellerzr commented Dec 14, 2021

Still a draft

This PR introduces the ability to fetch all available tags for models or datasets, and returns them as a nested namespace object.

Rather than taking the approach of #263, where we introduce a few Enum classes, I'm opting to copy fastcore's AttrDict class, which lets us build nested objects from dictionaries quickly and lets a user write both x.y.z and x[y][z]. The added benefit is that it comes with tab completion.

The repr for this class shows the available attributes that users can search by.
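
As a rough illustration of the idea (not the code in this PR), a dict subclass along the following lines supports both attribute and item access and lists its keys in the repr. The class body is an assumption modelled on the description above; fastcore's actual AttrDict may differ in its details.

```python
class AttributeDictionary(dict):
    """Sketch: a dict whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment stores the value as a dictionary item.
        self[key] = value

    def __dir__(self):
        # Exposing the keys here is what lets tab completion suggest them.
        return list(super().__dir__()) + [k for k in self if isinstance(k, str)]

    def __repr__(self):
        lines = ["Available Attributes:"]
        lines += [f" * {key}" for key in sorted(self)]
        return "\n".join(lines)


tags = AttributeDictionary(
    benchmark=AttributeDictionary(raft="benchmark:raft", superb="benchmark:superb")
)
assert tags.benchmark.raft == tags["benchmark"]["raft"] == "benchmark:raft"
print(tags.benchmark)
```

Subclassing dict keeps keys(), items() and the other dictionary methods available, which is why both access styles can coexist.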

For a quick example, here's what this could look like:
from huggingface_hub import HfApi
api = HfApi()
tags = api.get_model_tags()

If we just look at the repr of tags it looks like so:

Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

We can then look in tags.benchmark to see:

Available Attributes:
 * raft
 * superb
 * test

Finally, we look in tags.benchmark.raft to see:

benchmark:raft

This gives users a friendly interface without having to remember the raw tag string benchmark:raft.
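
One way such a nested object could be built from flat tag ids is sketched below. The `nest_tags` helper and its input list are hypothetical; the real Hub response format and the PR's parsing code are not shown in this thread, and `SimpleNamespace` only stands in for the class described above.

```python
from types import SimpleNamespace


def nest_tags(tag_ids):
    """Group ids such as "benchmark:raft" into category -> name -> full id."""
    nested = {}
    for tag_id in tag_ids:
        # Split only on the first colon: "benchmark:raft" -> ("benchmark", "raft").
        category, _, name = tag_id.partition(":")
        nested.setdefault(category, {})[name] = tag_id
    # SimpleNamespace stands in for the PR's attribute dictionary here.
    return SimpleNamespace(
        **{cat: SimpleNamespace(**names) for cat, names in nested.items()}
    )


tags = nest_tags(["benchmark:raft", "benchmark:superb", "benchmark:test"])
print(tags.benchmark.raft)  # benchmark:raft
```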

Testing still needs to be done, but I'd like your opinion on this approach @julien-c @nateraw @LysandreJik

@muellerzr
Contributor Author

muellerzr commented Dec 14, 2021

We could then use this AttrDict anywhere we want from the api calls as well later on, so it wouldn't necessarily be just for this thing.

(We could also call this AttributeDictionary so that the name isn't abbreviated)

@muellerzr
Contributor Author

Decided to go ahead with AttributeDictionary, and added some tests in.

One small bit of feedback I'd like: since the tests for the API function and for ModelTags itself would look exactly the same, should we duplicate them, or include them for only one of the two?

@muellerzr muellerzr marked this pull request as ready for review December 15, 2021 20:12
@julien-c
Member

julien-c commented Dec 16, 2021

I wouldn't duplicate tests given that the test suite is already super slow. (we should aim to make it faster)

@muellerzr
Contributor Author

Makes sense. (Refactoring the tests and making them faster is also on my radar.) I'll look into why the tests don't seem to like being run through GitHub Actions; they pass locally for me, which is problematic.

Member

@LysandreJik LysandreJik left a comment

Great, I like the idea and implementation! As you know, I'm particularly fond of the autocompletion it offers which makes it easy to never leave the IDE/console/notebook and still know the exact wording of the different tags.

There's a small issue with it right now, however:

>>> tags.license
Available Attributes:
 * ???
 * academicfreelicensev3.0
 * agpl_3.0
 * apache2.0
 * apache_2
 * apache_2.0
 * bsd_3
 * bsd_3_clause
 * cc
 * cc0.0
 * cc0_1.0
 * cc_by_4.0
 * cc_by_nc_4.0
 * cc_by_nc_nd_4.0
 * cc_by_nc_sa
 * cc_by_nc_sa_3.0
 * cc_by_nc_sa_4.0
 * cc_by_sa_4.0
 * gpl
 * gpl_2.0
 * gpl_3.0
 * lgpl_lr
 * mit
 * mitlicense
 * pddl

>>> tags.license.cc_by_4.0
  File "<input>", line 1
    tags.license.cc_by_4.0
                         ^
SyntaxError: invalid syntax
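
The failure above comes from Python's grammar rather than from the class itself: `cc_by_4.0` is not a valid identifier, so it cannot follow a dot. A minimal sketch, using a simplified stand-in for the PR's class and made-up tag values, shows that such keys stay reachable through item access or `getattr`:

```python
class Tags(dict):
    """Stand-in for the PR's attribute dictionary (hypothetical, simplified)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


# The values are illustrative only; the real tag ids are not shown in this thread.
license_tags = Tags({"mit": "license:mit", "cc_by_4.0": "license:cc-by-4.0"})

print("mit".isidentifier())        # True
print("cc_by_4.0".isidentifier())  # False

# Dot access works only for identifier-like keys; the others need [] or getattr.
assert license_tags.mit == "license:mit"
assert license_tags["cc_by_4.0"] == "license:cc-by-4.0"
assert getattr(license_tags, "cc_by_4.0") == "license:cc-by-4.0"
```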

Overall super excited about this!

@muellerzr
Contributor Author

muellerzr commented Dec 16, 2021

@osanseviero would like your review on the endpoints.md + README please :)

Contributor

@osanseviero osanseviero left a comment

Great work! I'm also very excited about this and I can already see cool things we could have apart from the search api. 🔥

Leaving some notes while playing around (not action items here, just thoughts as they come)

  • I'm not a fan of the multilinguality tag since it's misleading (internal issue here). On the other hand, this is consistent with the UI, so it seems fine as is!

  • I've always wondered which values exist for language_creators, so it's great to have this!

  • These APIs will really be useful for explorations as well (cc @severo)

  • Is tags.size_categories working as intended? The repr returns this:

Available Attributes:
 * n<1K
 * n>1M
 * unknown

but doing tags.size_categories.keys() gives me many other options, so something is off. My guess is that this is caused by the if not key[0].isdigit() check here
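
To illustrate that guess: if the repr skips keys whose first character is a digit, those keys disappear from the listing while keys() still returns everything. The filter and the last two keys below are assumptions for illustration; the hidden size categories are not listed in this thread.

```python
def visible_in_repr(keys):
    """Mimic a repr that skips keys starting with a digit (assumed filter)."""
    return [key for key in keys if not key[0].isdigit()]


# The first three keys appear in the repr above; the last two are made up.
keys = ["n<1K", "n>1M", "unknown", "1K<n<10K", "10K<n<100K"]

print(visible_in_repr(keys))  # ['n<1K', 'n>1M', 'unknown']
print(len(keys))              # 5
```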

  • Should we sort the output alphabetically? For example, when doing tags.library, having this sorted might be more readable for our users.
Available Attributes:
 * AdapterTransformers
 * Asteroid
 * ESPnet
 * Flair
 * JAX
 * Joblib
 * Keras
 * ONNX
 * PyTorch
 * Pyannote
 * Rust
 * Scikit_learn
 * SentenceTransformers
 * Stanza
 * TFLite
 * TensorBoard
 * TensorFlow
 * TensorFlowTTS
 * Timm
 * Transformers
 * allennlp
 * fastText
 * fastai
 * spaCy
 * speechbrain
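
The list above appears to be in Python's default string order, where uppercase letters sort before lowercase ones, which is why allennlp comes after Transformers. A case-insensitive sort key is one possible way to get the alphabetical order suggested here (a sketch on a subset of the names, not the PR's code):

```python
libraries = ["Transformers", "allennlp", "fastai", "Flair", "JAX", "spaCy"]

# Default ordering compares code points, so uppercase names come first.
print(sorted(libraries))
# ['Flair', 'JAX', 'Transformers', 'allennlp', 'fastai', 'spaCy']

# Case-insensitive ordering interleaves upper- and lowercase names.
print(sorted(libraries, key=str.casefold))
# ['allennlp', 'fastai', 'Flair', 'JAX', 'spaCy', 'Transformers']
```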

@muellerzr
Contributor Author

Today I need to make two more changes, and then it will be good to go. These are things I found through further toying last night:

  • If a key contains any special characters at all, we shouldn't show it as an attribute, not just when it starts with a number. We can't write a.2<b, since Python would read it as the comparison a.2 < b
  • Figure out what tests are failing
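
A sketch of the first change listed above, assuming the filter simply asks whether a key is a valid Python identifier (the actual check in the PR may be a regex or something else). The keys are taken from the reprs quoted earlier in this thread.

```python
import keyword


def attribute_safe(key):
    """True if `obj.<key>` would be valid Python syntax."""
    return key.isidentifier() and not keyword.iskeyword(key)


keys = ["unknown", "n<1K", "n>1M", "cc_by_4.0", "mit"]
shown = [key for key in keys if attribute_safe(key)]
hidden = [key for key in keys if not attribute_safe(key)]

print(shown)   # ['unknown', 'mit']
print(hidden)  # ['n<1K', 'n>1M', 'cc_by_4.0']
```

Keys that fail the check would still be stored in the dictionary, so they remain available through item access.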

@muellerzr
Contributor Author

muellerzr commented Dec 17, 2021

@osanseviero once tests pass, we should be good to go I believe :) (in CI/CD)

Contributor

@osanseviero osanseviero left a comment

This looks great! 🚀 🚀

Member

@LysandreJik LysandreJik left a comment

This looks good to me! How long does the test suite take to run?

@muellerzr
Contributor Author

Perfect! It only takes 1.7 seconds to run it

@muellerzr muellerzr merged commit 3d814e0 into huggingface:main Dec 20, 2021
@LysandreJik LysandreJik mentioned this pull request Dec 21, 2021