Removed GeneralTags, ModelTags and DatasetTags (#1761)
* removed tags from endpoint tests

* removed tags from endpoints

* removed tags from hf_api

* removed tags from docstrings in endpoint_helpers

* removed tags from hf_api

* removed model search argument from test_hf_api

* removed ModelSearchArguments and DataSearchArguments

* removed DatasetSearchArguments and ModelSearchArguments

* removed DatasetSearchArguments and

* removed ModelSearchArguments and DatasetSearchArguments from the docs

* Revert "removed DatasetSearchArguments and"

This reverts commit ce6b91b.

* removed tags from __init__.py

* ran make style

* Removed ## How to explore filter options ? section

* Revert "removed tags from __init__.py"

This reverts commit ad1a31c.

Reverting the removal of get_dataset_tags and get_model_tags for comment 2

* Revert "removed DatasetSearchArguments and ModelSearchArguments"

This reverts commit fbf6dd0.

* Revert "removed tags from __init__.py"

This reverts commit ad1a31c.

* Revert "removed tags from hf_api"

This reverts commit 2cefee1.

* Revert "removed tags from hf_api"

This reverts commit dd3b8f1.

* Removed attribute dictionary from imports
and removed model search argument class

* Completely removed class AttributeDictionary(dict)

* Removed attribute dictionary tests

* Updating ModelTags and DatasetTags
so that they just return the raw dictionary

* Removed final DatasetTags import

* Removed 'ModelTags' import

* Ran make style

* fix: remove useless token (#1765)

* Retry on ConnectionError/ReadTimeout when streaming file from server (#1766)

* Retry on ConnectionError/ReadTimeout when streaming file from server

* add test

* fix testing utils

* Adding `InferenceClient.get_recommended_model` (#1770)

* Moved logger info to InferenceClient, so the get_recommended_model function can bypass it

* Added get_recommended_model to InferenceClient

* Ran make style to generate the async client

* Added tests of get_recommended_model

* Update src/huggingface_hub/inference/_client.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Fixed ordering of logger info and _get_recommended_model, so the model string is populated first

* Removed _get_recommended_model private function, in favor of get_recommended_model in InferenceClient

* Fixed wording of ValueError to use 'model' not 'task'

* Ran make style for AsyncInferenceClient

---------

Co-authored-by: Lucain <lucainp@gmail.com>

* Fix document link for manage-cache (#1774)

* Fix document link for manage-cache

* Use redirects in _redirects.yml

* Update docs/source/en/package_reference/file_download.md

---------

Co-authored-by: Lucain <lucainp@gmail.com>

* Minor doc fixes (#1775)

* Don't use `api` in `list_repo_refs` example.

* Minor typo fssepc -> fsspec

* Use `.item_object_id` instead of `._id`

* Ran make style

---------

Co-authored-by: Remy <remy@huggingface.co>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: James Braza <jamesbraza@gmail.com>
Co-authored-by: liuxueyang <liuxueyang457@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
6 people authored Oct 26, 2023
1 parent ef48c7f commit 0a2503d
Showing 8 changed files with 10 additions and 791 deletions.
159 changes: 1 addition & 158 deletions docs/source/de/guides/search.md
@@ -60,164 +60,7 @@ For example, the following example fetches the top 5 most downloaded datasets on the Hub:
```


## How to explore filter options?

Now you know how to filter your list of models/datasets/spaces.
The problem might be that you don't know exactly what you are looking for. No worries!
We also provide some helpers that let you discover which arguments can be passed in your query.

[`ModelSearchArguments`] and [`DatasetSearchArguments`] are nested namespace objects that
have **every single option** available on the Hub and that return what should be passed
to `filter`. Best of all: they have tab completion 🎊.

```python
>>> from huggingface_hub import ModelSearchArguments, DatasetSearchArguments

>>> model_args = ModelSearchArguments()
>>> dataset_args = DatasetSearchArguments()
```

<Tip warning={true}>

Before continuing, please be aware that [`ModelSearchArguments`] and [`DatasetSearchArguments`]
are legacy helpers meant for exploratory purposes only. Initializing them requires listing
all models and datasets on the Hub, which makes them increasingly slower as the number of
repos on the Hub grows. For production-ready code, consider passing raw strings when making
a filtered search on the Hub.

</Tip>
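
As a quick illustration of that recommendation, here is a minimal sketch of a filtered
search built from raw strings only (the tag values are illustrative; any tag the Hub
knows about works the same way):

```python
>>> from huggingface_hub import HfApi

>>> api = HfApi()
>>> # No helper objects needed: pass raw tag strings directly to `filter`.
>>> models = api.list_models(filter=["pytorch", "text-classification"])
```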

Now, let's see what is available in `model_args` by inspecting its output:

```python
>>> model_args
Available Attributes or Keys:
* author
* dataset
* language
* library
* license
* model_name
* pipeline_tag
```

A variety of attributes and keys are available to you.
This is because it is both an object and a dictionary,
so you can use either `model_args["author"]` or `model_args.author`.

The first criterion is getting all PyTorch models.
This would be found under the `library` attribute, so let's see if it is there:

```python
>>> model_args.library
Available Attributes or Keys:
* AdapterTransformers
* Asteroid
* ESPnet
* Fairseq
* Flair
* JAX
* Joblib
* Keras
* ONNX
* PyTorch
* Rust
* Scikit_learn
* SentenceTransformers
* Stable_Baselines3 (Key only)
* Stanza
* TFLite
* TensorBoard
* TensorFlow
* TensorFlowTTS
* Timm
* Transformers
* allenNLP
* fastText
* fastai
* pyannote_audio
* spaCy
* speechbrain
```

It is! The `PyTorch` name is there, so you'll need to use `model_args.library.PyTorch`:

```python
>>> model_args.library.PyTorch
'pytorch'
```

Below is an animation repeating the process for finding both the `Text Classification` and `glue` requirements:

![Animation exploring `model_args.pipeline_tag`](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/search_text_classification.gif)

![Animation exploring `model_args.dataset`](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/search_glue.gif)

Now that all the pieces are there, the last step is to combine them into something the
API can use through the [`ModelFilter`] and [`DatasetFilter`] classes (i.e. strings).


```python
>>> from huggingface_hub import ModelFilter, DatasetFilter

>>> filt = ModelFilter(
... task=model_args.pipeline_tag.TextClassification,
... trained_dataset=dataset_args.dataset_name.glue,
... library=model_args.library.PyTorch
... )
>>> api.list_models(filter=filt)[0]
ModelInfo: {
modelId: Jiva/xlm-roberta-large-it-mnli
sha: c6e64469ec4aa17fedbd1b2522256f90a90b5b86
lastModified: 2021-12-10T14:56:38.000Z
tags: ['pytorch', 'xlm-roberta', 'text-classification', 'it', 'dataset:multi_nli', 'dataset:glue', 'arxiv:1911.02116', 'transformers', 'tensorflow', 'license:mit', 'zero-shot-classification']
pipeline_tag: zero-shot-classification
siblings: [ModelFile(rfilename='.gitattributes'), ModelFile(rfilename='README.md'), ModelFile(rfilename='config.json'), ModelFile(rfilename='pytorch_model.bin'), ModelFile(rfilename='sentencepiece.bpe.model'), ModelFile(rfilename='special_tokens_map.json'), ModelFile(rfilename='tokenizer.json'), ModelFile(rfilename='tokenizer_config.json')]
config: None
id: Jiva/xlm-roberta-large-it-mnli
private: False
downloads: 11061
library_name: transformers
likes: 1
}
```

As you can see, it found the models that fit all the criteria. You can even take it further
by passing in an array for each of the previous parameters.
For example, let's look at the same configuration, but also include `TensorFlow` in the filter:

```python
>>> filt = ModelFilter(
... task=model_args.pipeline_tag.TextClassification,
... library=[model_args.library.PyTorch, model_args.library.TensorFlow]
... )
>>> api.list_models(filter=filt)[0]
ModelInfo: {
modelId: distilbert-base-uncased-finetuned-sst-2-english
sha: ada5cc01a40ea664f0a490d0b5f88c97ab460470
lastModified: 2022-03-22T19:47:08.000Z
tags: ['pytorch', 'tf', 'rust', 'distilbert', 'text-classification', 'en', 'dataset:sst-2', 'transformers', 'license:apache-2.0', 'infinity_compatible']
pipeline_tag: text-classification
siblings: [ModelFile(rfilename='.gitattributes'), ModelFile(rfilename='README.md'), ModelFile(rfilename='config.json'), ModelFile(rfilename='map.jpeg'), ModelFile(rfilename='pytorch_model.bin'), ModelFile(rfilename='rust_model.ot'), ModelFile(rfilename='tf_model.h5'), ModelFile(rfilename='tokenizer_config.json'), ModelFile(rfilename='vocab.txt')]
config: None
id: distilbert-base-uncased-finetuned-sst-2-english
private: False
downloads: 3917525
library_name: transformers
likes: 49
}
```

This query is strictly equivalent to:

```py
>>> filt = ModelFilter(
... task="text-classification",
... library=["pytorch", "tensorflow"],
... )
```

Here, [`ModelSearchArguments`] was a helper to explore the options available on the Hub.
However, it is not a requirement for a search. Another way to do this is to visit the
[models](https://huggingface.co/models) and [datasets](https://huggingface.co/datasets) pages
in your browser, search for some parameters, and look at the values in the URL.
159 changes: 1 addition & 158 deletions docs/source/en/guides/search.md
@@ -60,164 +60,7 @@ the following example fetches the top 5 most downloaded datasets on the Hub:
```


## How to explore filter options?

Now you know how to filter your list of models/datasets/spaces. The problem you might
have is that you don't know exactly what you are looking for. No worries! We also provide
some helpers that allow you to discover which arguments can be passed in your query.

[`ModelSearchArguments`] and [`DatasetSearchArguments`] are nested namespace objects that
have **every single option** available on the Hub and that return what should be passed
to `filter`. Best of all: they have tab completion 🎊.

```python
>>> from huggingface_hub import ModelSearchArguments, DatasetSearchArguments

>>> model_args = ModelSearchArguments()
>>> dataset_args = DatasetSearchArguments()
```

<Tip warning={true}>

Before continuing, please be aware that [`ModelSearchArguments`] and [`DatasetSearchArguments`]
are legacy helpers meant for exploratory purposes only. Initializing them requires listing
all models and datasets on the Hub, which makes them increasingly slower as the number of repos
on the Hub grows. For production-ready code, consider passing raw strings when making
a filtered search on the Hub.

</Tip>

Now, let's see what is available in `model_args` by inspecting its output:

```python
>>> model_args
Available Attributes or Keys:
* author
* dataset
* language
* library
* license
* model_name
* pipeline_tag
```

It has a variety of attributes and keys available to you. This is because it is both an object
and a dictionary, so you can use either `model_args["author"]` or `model_args.author`.
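
To make that dual access pattern concrete, here is a minimal sketch of a `dict` subclass
that also exposes its keys as attributes, written in the spirit of the `AttributeDictionary`
helper this commit removes (illustrative only, not the library's actual implementation):

```python
# Minimal sketch: a dict whose keys are also readable as attributes.
# Illustrative only; not the removed AttributeDictionary implementation.
class AttrDict(dict):
    def __getattr__(self, key):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real dict attributes and methods keep working.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

args = AttrDict(author="author", library="library")
assert args["author"] == args.author  # both access styles return the same value
```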

The first criterion is getting all PyTorch models. This would be found under the `library` attribute, so let's see if it is there:

```python
>>> model_args.library
Available Attributes or Keys:
* AdapterTransformers
* Asteroid
* ESPnet
* Fairseq
* Flair
* JAX
* Joblib
* Keras
* ONNX
* PyTorch
* Rust
* Scikit_learn
* SentenceTransformers
* Stable_Baselines3 (Key only)
* Stanza
* TFLite
* TensorBoard
* TensorFlow
* TensorFlowTTS
* Timm
* Transformers
* allenNLP
* fastText
* fastai
* pyannote_audio
* spaCy
* speechbrain
```

It is! The `PyTorch` name is there, so you'll need to use `model_args.library.PyTorch`:

```python
>>> model_args.library.PyTorch
'pytorch'
```

Below is an animation repeating the process for finding both the `Text Classification` and `glue` requirements:

![Animation exploring `model_args.pipeline_tag`](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/search_text_classification.gif)

![Animation exploring `model_args.dataset`](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/search_glue.gif)

Now that all the pieces are there, the last step is to combine them into something the
API can use through the [`ModelFilter`] and [`DatasetFilter`] classes (i.e. strings).


```python
>>> from huggingface_hub import ModelFilter, DatasetFilter

>>> filt = ModelFilter(
... task=model_args.pipeline_tag.TextClassification,
... trained_dataset=dataset_args.dataset_name.glue,
... library=model_args.library.PyTorch
... )
>>> api.list_models(filter=filt)[0]
ModelInfo: {
modelId: Jiva/xlm-roberta-large-it-mnli
sha: c6e64469ec4aa17fedbd1b2522256f90a90b5b86
lastModified: 2021-12-10T14:56:38.000Z
tags: ['pytorch', 'xlm-roberta', 'text-classification', 'it', 'dataset:multi_nli', 'dataset:glue', 'arxiv:1911.02116', 'transformers', 'tensorflow', 'license:mit', 'zero-shot-classification']
pipeline_tag: zero-shot-classification
siblings: [ModelFile(rfilename='.gitattributes'), ModelFile(rfilename='README.md'), ModelFile(rfilename='config.json'), ModelFile(rfilename='pytorch_model.bin'), ModelFile(rfilename='sentencepiece.bpe.model'), ModelFile(rfilename='special_tokens_map.json'), ModelFile(rfilename='tokenizer.json'), ModelFile(rfilename='tokenizer_config.json')]
config: None
id: Jiva/xlm-roberta-large-it-mnli
private: False
downloads: 11061
library_name: transformers
likes: 1
}
```

As you can see, it found the models that fit all the criteria. You can even take it further
by passing in an array for each of the parameters from before. For example, let's look at
the same configuration, but also include `TensorFlow` in the filter:


```python
>>> filt = ModelFilter(
... task=model_args.pipeline_tag.TextClassification,
... library=[model_args.library.PyTorch, model_args.library.TensorFlow]
... )
>>> api.list_models(filter=filt)[0]
ModelInfo: {
modelId: distilbert-base-uncased-finetuned-sst-2-english
sha: ada5cc01a40ea664f0a490d0b5f88c97ab460470
lastModified: 2022-03-22T19:47:08.000Z
tags: ['pytorch', 'tf', 'rust', 'distilbert', 'text-classification', 'en', 'dataset:sst-2', 'transformers', 'license:apache-2.0', 'infinity_compatible']
pipeline_tag: text-classification
siblings: [ModelFile(rfilename='.gitattributes'), ModelFile(rfilename='README.md'), ModelFile(rfilename='config.json'), ModelFile(rfilename='map.jpeg'), ModelFile(rfilename='pytorch_model.bin'), ModelFile(rfilename='rust_model.ot'), ModelFile(rfilename='tf_model.h5'), ModelFile(rfilename='tokenizer_config.json'), ModelFile(rfilename='vocab.txt')]
config: None
id: distilbert-base-uncased-finetuned-sst-2-english
private: False
downloads: 3917525
library_name: transformers
likes: 49
}
```

This query is strictly equivalent to:

```py
>>> filt = ModelFilter(
... task="text-classification",
... library=["pytorch", "tensorflow"],
... )
```
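
Putting it together, the same search can then be run without any helper objects; a minimal
sketch, assuming an [`HfApi`] instance as in the earlier examples:

```python
>>> from huggingface_hub import HfApi, ModelFilter

>>> api = HfApi()
>>> # Raw strings replace the ModelSearchArguments lookups entirely.
>>> filt = ModelFilter(task="text-classification", library=["pytorch", "tensorflow"])
>>> models = api.list_models(filter=filt)
```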

Here, [`ModelSearchArguments`] has been a helper to explore the options available on the Hub.
However, it is not a requirement to make a search. Another way to explore the available
filters on the Hub is to visit the [models](https://huggingface.co/models) and
[datasets](https://huggingface.co/datasets) pages in your browser, search for some parameters,
and look at the values in the URL.

7 changes: 0 additions & 7 deletions docs/source/en/package_reference/hf_api.md
@@ -114,10 +114,3 @@ Some helpers to filter repositories on the Hub are available in the `huggingface

[[autodoc]] ModelFilter

### DatasetSearchArguments

[[autodoc]] DatasetSearchArguments

### ModelSearchArguments

[[autodoc]] ModelSearchArguments
4 changes: 0 additions & 4 deletions src/huggingface_hub/__init__.py
@@ -136,12 +136,10 @@
"CommitOperationAdd",
"CommitOperationCopy",
"CommitOperationDelete",
"DatasetSearchArguments",
"GitCommitInfo",
"GitRefInfo",
"GitRefs",
"HfApi",
"ModelSearchArguments",
"RepoUrl",
"User",
"UserLikes",
@@ -457,12 +455,10 @@ def __dir__():
CommitOperationAdd, # noqa: F401
CommitOperationCopy, # noqa: F401
CommitOperationDelete, # noqa: F401
DatasetSearchArguments, # noqa: F401
GitCommitInfo, # noqa: F401
GitRefInfo, # noqa: F401
GitRefs, # noqa: F401
HfApi, # noqa: F401
ModelSearchArguments, # noqa: F401
RepoUrl, # noqa: F401
User, # noqa: F401
UserLikes, # noqa: F401
