Introduce Tag Listing #537

muellerzr · 2021-12-14T19:42:48Z

~~Still a draft~~

This PR introduces the ability to fetch all available tags for models or datasets, and returns them as a nested namespace object.

Rather than taking the approach of #263, which introduces a few Enum classes, I'm opting to copy fastcore's AttrDict class. It lets us build nested objects from dictionaries quickly, so a user can write both x.y.z and x["y"]["z"]. A further benefit is that it comes with tab completion.

The repr for this class shows the available attributes a user can search by.
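To make the idea concrete, here is a minimal sketch of a dictionary with attribute access, dir()-based tab completion, and a listing repr. The class name AttrDictSketch is hypothetical; this is neither fastcore's AttrDict nor the class added in this PR, only an illustration of the behaviour described above.

```python
class AttrDictSketch(dict):
    """Hypothetical stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a regular dictionary item.
        self[name] = value

    def __dir__(self):
        # Expose string keys to dir(), which interactive tab completion relies on.
        return list(super().__dir__()) + [k for k in self if isinstance(k, str)]

    def __repr__(self):
        lines = ["Available Attributes:"] + [f" * {key}" for key in sorted(self)]
        return "\n".join(lines)


x = AttrDictSketch(y=AttrDictSketch(z=1))
assert x.y.z == x["y"]["z"] == 1
print(repr(x))
```

Subclassing dict keeps item access and methods such as keys() intact, while __getattr__ only adds the dotted form on top.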

For a quick example, here's what this could look like:
tags = api.get_model_tags()

If we just look at the repr of tags it looks like so:

Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

We can then look in tags.benchmark to see:

Available Attributes:
 * raft
 * superb
 * test

Finally, we look in tags.benchmark.raft to see:

benchmark:raft

This gives users a friendly interface, without them having to remember the raw tag string benchmark:raft.

Testing still needs to be done, but I'd like your opinion on this introduction @julien-c @nateraw @LysandreJik
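The mapping from a raw tag string to a nested lookup can be sketched as follows. The helper build_tag_namespace is hypothetical, and it assumes the tags arrive as plain "category:name" strings; the real response format of the Hub API is not shown in this thread.

```python
from types import SimpleNamespace


def build_tag_namespace(tag_ids):
    """Group 'category:name' strings into a namespace of namespaces (hypothetical helper)."""
    grouped = {}
    for tag_id in tag_ids:
        # Split on the first colon only: "benchmark:raft" -> ("benchmark", "raft").
        category, _, name = tag_id.partition(":")
        grouped.setdefault(category, {})[name] = tag_id
    return SimpleNamespace(
        **{category: SimpleNamespace(**names) for category, names in grouped.items()}
    )


tags = build_tag_namespace(["benchmark:raft", "benchmark:superb", "benchmark:test"])
print(tags.benchmark.raft)  # prints "benchmark:raft"
```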

muellerzr · 2021-12-14T19:44:36Z

We could then use this AttrDict anywhere we want from the api calls as well later on, so it wouldn't necessarily be just for this thing.

(We could also call this AttributeDictionary so that the name isn't abbreviated)

muellerzr · 2021-12-15T20:12:49Z

Decided to go ahead with AttributeDictionary, and added some tests in.

One small piece of feedback I'd like: since the tests for the API function and for ModelTags itself will look exactly the same, should we duplicate them, or just include them in one place?

julien-c · 2021-12-16T12:11:40Z

I wouldn't duplicate tests given that the test suite is already super slow. (we should aim to make it faster)

muellerzr · 2021-12-16T13:55:06Z

Makes sense. (Refactoring the tests and making them faster is also on my radar.) I'll look at why the tests don't seem to like being run through GitHub Actions; they pass locally for me, which is problematic.

LysandreJik

Great, I like the idea and implementation! As you know, I'm particularly fond of the autocompletion it offers which makes it easy to never leave the IDE/console/notebook and still know the exact wording of the different tags.

There's a small issue with it right now, however:

>>> tags.license
Available Attributes:
 * ???
 * academicfreelicensev3.0
 * agpl_3.0
 * apache2.0
 * apache_2
 * apache_2.0
 * bsd_3
 * bsd_3_clause
 * cc
 * cc0.0
 * cc0_1.0
 * cc_by_4.0
 * cc_by_nc_4.0
 * cc_by_nc_nd_4.0
 * cc_by_nc_sa
 * cc_by_nc_sa_3.0
 * cc_by_nc_sa_4.0
 * cc_by_sa_4.0
 * gpl
 * gpl_2.0
 * gpl_3.0
 * lgpl_lr
 * mit
 * mitlicense
 * pddl

>>> tags.license.cc_by_4.0
  File "<input>", line 1
    tags.license.cc_by_4.0
                         ^
SyntaxError: invalid syntax

Overall super excited about this!
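A short illustration of why the dotted form fails for cc_by_4.0: the key is not a valid Python identifier, so it can only be reached through getattr or item access. This is a sketch using types.SimpleNamespace as a stand-in for the tags object; the stored values are illustrative placeholders, not real tag ids.

```python
from types import SimpleNamespace

# Stand-in for tags.license; the values are placeholders for illustration only.
licenses = SimpleNamespace()
for key, value in {"mit": "license:mit", "cc_by_4.0": "license:cc_by_4.0"}.items():
    setattr(licenses, key, value)

print("mit".isidentifier())        # True
print("cc_by_4.0".isidentifier())  # False: the period is not allowed in a name

# licenses.cc_by_4.0 is a SyntaxError, but the string-based lookup still works.
via_getattr = getattr(licenses, "cc_by_4.0")
print(via_getattr)
```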

src/huggingface_hub/hf_api.py

src/huggingface_hub/utils/tags.py

tests/test_tags.py

muellerzr · 2021-12-16T16:41:18Z

@osanseviero would like your review on the endpoints.md + README please :)

osanseviero

Great work! I'm also very excited about this and I can already see cool things we could have apart from the search api. 🔥

Leaving some notes while playing around (not action items here, just thoughts as they come)

I'm not a fan of multilinguality tag since it's misleading (internal issue here). On the other hand this is consistent with the UI, so it seems fine as is!
I've always wondered which values exist for language_creators, so it's great to have this!
These APIs will really be useful for explorations as well (cc @severo)
Is tags.size_categories working as intended? The repr returns this:

Available Attributes:
 * n<1K
 * n>1M
 * unknown

but doing tags.size_categories.keys() gives me many other options, so something is off. My guess is that it's because of the `if not key[0].isdigit()` check here.
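The mismatch between the repr and keys() can be reproduced with a small sketch. The function listed_keys is hypothetical and only mirrors the quoted condition; the key "10K<n<100K" is an assumed example of a hidden entry, since the thread does not list the missing keys.

```python
size_categories = {
    "n<1K": "size_categories:n<1K",
    "10K<n<100K": "size_categories:10K<n<100K",  # assumed example of a hidden key
    "unknown": "size_categories:unknown",
}


def listed_keys(mapping):
    # Mirrors the quoted condition: skip keys whose first character is a digit.
    return [key for key in sorted(mapping) if not key[0].isdigit()]


shown = listed_keys(size_categories)
hidden = sorted(set(size_categories) - set(shown))
print(shown)   # keys a repr built this way would list
print(hidden)  # keys still present in .keys() but absent from that repr
```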

Should we sort the output alphabetically? For example, when doing tags.library, having this sorted might be more readable for our users.

Available Attributes:
 * AdapterTransformers
 * Asteroid
 * ESPnet
 * Flair
 * JAX
 * Joblib
 * Keras
 * ONNX
 * PyTorch
 * Pyannote
 * Rust
 * Scikit_learn
 * SentenceTransformers
 * Stanza
 * TFLite
 * TensorBoard
 * TensorFlow
 * TensorFlowTTS
 * Timm
 * Transformers
 * allennlp
 * fastText
 * fastai
 * spaCy
 * speechbrain
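The listing above appears to be ordered by code point, which puts every capitalised name before the lowercase ones. A case-insensitive sort is one possible reading of the suggestion; the sketch below compares the two on a subset of the names shown and is not taken from the PR.

```python
libraries = ["PyTorch", "Pyannote", "Transformers", "allennlp", "fastai", "Keras"]

# Default string sorting compares code points, so "T" sorts before "a".
default_order = sorted(libraries)

# str.casefold as the key ignores case when ordering.
case_insensitive = sorted(libraries, key=str.casefold)

print(default_order)
print(case_insensitive)
```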

src/huggingface_hub/utils/tags.py

docs/hub/endpoints.md

src/huggingface_hub/hf_api.py

src/huggingface_hub/utils/tags.py


muellerzr · 2021-12-17T12:28:05Z

Today I need to make two more changes and then it will be good to go. These are things I found through further toying last night:

If a key contains any special characters at all, we shouldn't show it as an attribute, not just when it starts with a number. We can't do a.2<b, since Python would read it as a.2 < b
Figure out which tests are failing
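One way to express the first item is to split keys into those that are safe as attribute names and those reachable only by key. The helper split_keys below is a hypothetical sketch built on str.isidentifier and keyword.iskeyword; the check actually used in the PR may differ.

```python
import keyword


def split_keys(keys):
    """Separate attribute-safe keys from keys that need item access (hypothetical helper)."""
    attrs, key_only = [], []
    for key in sorted(keys):
        # A key works after a dot only if it is a valid, non-reserved identifier.
        if key.isidentifier() and not keyword.iskeyword(key):
            attrs.append(key)
        else:
            key_only.append(key)
    return attrs, key_only


attrs, key_only = split_keys(["unknown", "n<1K", "n>1M", "apache_2.0", "mit"])
print(attrs)     # safe to show as attributes
print(key_only)  # reachable only as mapping keys
```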

muellerzr · 2021-12-17T14:33:49Z

@osanseviero once tests pass, we should be good to go I believe :) (in CI/CD)

osanseviero

This looks great! 🚀 🚀

tests/test_tags.py

LysandreJik

This looks good to me! How long does the test suite take to run?

muellerzr · 2021-12-20T13:54:58Z

Perfect! It only takes 1.7 seconds to run it

muellerzr added 2 commits December 14, 2021 14:30

Initial attempt at a tag getter

3117b53

Clean + format

e3e4033

muellerzr added 8 commits December 15, 2021 11:28

Add delattr similar to getattr, start tests

e8be6dc

Add some more tests (more to come)

2073984

Increase verbosity

1e1fca0

Add quality

ab5083d

Improve docstring

c6f7598

Style

495987c

Finish testing

b0a3abe

Clean + docstring verbosity

7381654

muellerzr marked this pull request as ready for review December 15, 2021 20:12

Fix up tests

db8a799

LysandreJik reviewed Dec 16, 2021

src/huggingface_hub/hf_api.py (outdated)

src/huggingface_hub/utils/tags.py (outdated)

tests/test_tags.py

tests/test_tags.py (outdated)

muellerzr added 3 commits December 16, 2021 11:15

Adjust for periods and numbers, improve tests

436c339

Fixup documentation + final changes

ba066b1

Table error

6ce7d63

muellerzr requested a review from osanseviero December 16, 2021 16:40

osanseviero reviewed Dec 16, 2021

src/huggingface_hub/utils/tags.py (outdated)

osanseviero reviewed Dec 16, 2021

docs/hub/endpoints.md (outdated)

src/huggingface_hub/hf_api.py (outdated)

src/huggingface_hub/utils/tags.py (outdated)

src/huggingface_hub/utils/tags.py

muellerzr and others added 5 commits December 16, 2021 18:02

Update docs/hub/endpoints.md

5d7e2e3

Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>

Expand docstrings, fix edgecase

77017bd

Merge branch 'tag-getter' of https://github.com/muellerzr/huggingface…

861309d

…_hub into tag-getter

Style

996b7e6

Fix tests

811aeb7

Fix tests, finalize implementation

9b81249

muellerzr added 5 commits December 17, 2021 09:58

Bring back some tests

660ffbb

Update some_text.txt

a19332a

Update some_text.txt

3e7c11a

Update some_text.txt

00a73ab

Fix

e342973

muellerzr requested a review from osanseviero December 17, 2021 15:22

osanseviero approved these changes Dec 17, 2021

tests/test_tags.py (outdated)

Test readability

c901a1f

muellerzr requested a review from LysandreJik December 17, 2021 19:45

LysandreJik approved these changes Dec 20, 2021

muellerzr merged commit 3d814e0 into huggingface:main Dec 20, 2021

LysandreJik mentioned this pull request Dec 21, 2021

Implement a Model Filter class #553

Merged

6 tasks
