-
Notifications
You must be signed in to change notification settings - Fork 581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Tag Listing #537
Conversation
We could then use this (We could also call this |
Decided to go ahead with One small tidbit of feedback I'd like is since testing both the API function and the |
I wouldn't duplicate tests given that the test suite is already super slow. (we should aim to make it faster) |
Makes sense. (Refactoring/making them faster is also on my radar). Will look at why the tests don't seem to like being run through ghactions, they pass locally for me which is problematic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I like the idea and implementation! As you know, I'm particularly fond of the autocompletion it offers which makes it easy to never leave the IDE/console/notebook and still know the exact wording of the different tags.
There's a small issue with it right now, however:
>>> tags.license
Available Attributes:
* ???
* academicfreelicensev3.0
* agpl_3.0
* apache2.0
* apache_2
* apache_2.0
* bsd_3
* bsd_3_clause
* cc
* cc0.0
* cc0_1.0
* cc_by_4.0
* cc_by_nc_4.0
* cc_by_nc_nd_4.0
* cc_by_nc_sa
* cc_by_nc_sa_3.0
* cc_by_nc_sa_4.0
* cc_by_sa_4.0
* gpl
* gpl_2.0
* gpl_3.0
* lgpl_lr
* mit
* mitlicense
* pddl
>>> tags.license.cc_by_4.0
File "<input>", line 1
tags.license.cc_by_4.0
^
SyntaxError: invalid syntax
Overall super excited about this!
@osanseviero would like your review on the endpoints.md + README please :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I'm also very excited about this and I can already see cool things we could have apart from the search api. 🔥
Leaving some notes while playing around (not action items here, just thoughts as they come)
-
I'm not a fan of
multilinguality
tag since it's misleading (internal issue here). On the other hand this is consistent with the UI, so it seems fine as is! -
I've always wondered which values exist for language_creators, so it's great to have this!
-
These APIs will really be useful for explorations as well (cc @severo)
-
Is
tags.size_categories
working as intended? The_repr
returns this
Available Attributes:
* n<1K
* n>1M
* unknown
but doing tags.size_categories.keys()
gives me many other options, so something is off. My guess is because if not key[0].isdigit()
here
- Should we sort alphabetically the output? For example, when doing
tags.library
having this sorted might be more readable for our users.
Available Attributes:
* AdapterTransformers
* Asteroid
* ESPnet
* Flair
* JAX
* Joblib
* Keras
* ONNX
* PyTorch
* Pyannote
* Rust
* Scikit_learn
* SentenceTransformers
* Stanza
* TFLite
* TensorBoard
* TensorFlow
* TensorFlowTTS
* Timm
* Transformers
* allennlp
* fastText
* fastai
* spaCy
* speechbrain
Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
Today I need to make two more changes and then it will be good to go, these are things I found through further toying last night:
|
@osanseviero once tests pass, we should be good to go I believe :) (in CI/CD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! 🚀 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! How long does the test suite take to run?
Perfect! It only takes 1.7 seconds to run it |
Still a draftThis PR introduces the ability to fetch all available tags for models or datasets, and returns them as a nested namespace object.
Rather than taking the approach of #263, where we introduce a few Enum classes, I'm opting to copy fastcore's
AttrDict
class, which allows us to write nested objects from dictionaries quickly, allowing a user to perform bothx.y.z
andx[y][z]
. The benefit here is it also comes with tab completion.The
repr
for this class will show available attributes they can search by.For a quick example, here's what this could look like:
tags = api.get_model_tags()
If we just look at the repr of
tags
it looks like so:We can then look in
tags.benchmark
to see:Before finally we look in
tags.benchmark.raft
to see:Allowing a friendly user interface, without them having to remember
benchmark:raft
Testing still needs to be done, but want your opinion on this introduction @julien-c @nateraw @LysandreJik