List available branches for a given model/dataset #1260

Closed
antoche opened this issue Dec 12, 2022 · 9 comments · Fixed by #1276

Comments

@antoche

antoche commented Dec 12, 2022

Loading a given model sometimes requires passing a specific branch name (e.g., "fp16" to load a 16-bit-float model) to fetch. But there is no way I found in the documentation to list the available branches for a given model. The only way I've found to list branches is to browse the page for the model and look at the branch dropdown. It would be useful to know what branches exist on the repository in order to find one appropriate for the task at hand.

@Wauplin
Contributor

Wauplin commented Dec 12, 2022

Hi @antoche, thanks for opening the issue. This is indeed a good use case that we could support.

What would you think of an API like this?

>>> from huggingface_hub import HfApi
>>> HfApi().list_repo_refs("stanfordnlp/stanza-fr")
{"tags":["v1.3.0","v1.3.1","v1.4.0","v1.4.1","v1.5.0"],"branches":["main"],"converts":[]}

We still need to implement the feature server-side.

@antoche
Author

antoche commented Dec 12, 2022

That looks great, thanks! (I don't think I've come across "converts" refs before?)

How would this interact with the cache and/or offline mode?

@Wauplin
Contributor

Wauplin commented Dec 13, 2022

"converts" refs are a very specific type. They are only used for dataset repos and contain preprocessed data (data converted to Parquet). Most users won't need them, but it is still useful to have them exposed.

How would this interact with the cache and/or offline mode?

For now this is implemented as an API call (from within HfApi). Offline mode and the cache are more related to file downloads (e.g. hf_hub_download and snapshot_download). @antoche, what would be your use case for an offline mode on this API? If the idea is to list the refs that have been cached locally, it would have to be done separately. The use case seems quite different to me.

@antoche
Author

antoche commented Dec 13, 2022

My question about the offline mode was to check what would happen with that call in offline mode. Ideally I would prefer it to return the refs that have been cached in this case, rather than raise an exception, if that's achievable. In any case, this request is primarily concerned with the non-offline behavior.

@Wauplin
Contributor

Wauplin commented Dec 13, 2022

In that case I think we will not handle the offline mode for that feature. None of the HfApi endpoints are designed to work offline. It is more of a "design" decision rather than an actual constraint.

The problem with implementing it "offline" is that it can be misleading. Also, if we extend the feature in the future, we might not be able to return the same output in both online and offline modes (let's say we want to return "createdAt" or "updatedAt" for each ref).
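For readers who do want the locally cached refs mentioned in this exchange, here is a minimal sketch that reads them straight from disk. It is not part of HfApi. The helper name `cached_refs` is hypothetical, and the sketch assumes the default cache location (`~/.cache/huggingface/hub`) and the `<type>s--<org>--<name>/refs/<ref>` folder layout, where each ref file holds a commit hash. Environment overrides of the cache path are not handled.

```python
from pathlib import Path
from typing import Dict, Optional


def cached_refs(
    repo_id: str,
    repo_type: str = "model",
    cache_dir: Optional[Path] = None,
) -> Dict[str, str]:
    """Return {ref name: commit hash} for refs found in the local cache.

    Hypothetical helper, not a huggingface_hub function.
    """
    # Assumption: default cache location; environment overrides are ignored.
    if cache_dir is not None:
        root = Path(cache_dir)
    else:
        root = Path.home() / ".cache" / "huggingface" / "hub"

    # Assumption: repos are stored as "<type>s--<org>--<name>", e.g. "models--gpt2".
    repo_folder = f"{repo_type}s--{repo_id.replace('/', '--')}"
    refs_dir = root / repo_folder / "refs"
    if not refs_dir.is_dir():
        # Nothing cached for this repo: return an empty mapping instead of raising.
        return {}

    refs: Dict[str, str] = {}
    # Refs may be nested (e.g. "pr/1"), so walk the tree recursively.
    for path in sorted(refs_dir.rglob("*")):
        if path.is_file():
            name = path.relative_to(refs_dir).as_posix()
            refs[name] = path.read_text().strip()
    return refs
```

This only reports refs that a previous download happened to cache, so the result can be incomplete or stale compared with what the server would return.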

@Wauplin
Contributor

Wauplin commented Dec 13, 2022

And actually, the output type has been changed server-side. We will still return 3 lists for tags/branches/converts, but each item will be a dictionary instead of a string, with name ("v0.14") + reference ("refs/tags/v0.14") + target_commit (the commit id for this ref).

I will put it here once it's definitely settled.

@antoche
Author

antoche commented Dec 13, 2022

Sounds good, looking forward to using it!

@Wauplin
Contributor

Wauplin commented Dec 21, 2022

Hey @antoche 👋 PR is implemented and merged :)

Feel free to use it by installing from main, or wait for the next release.

>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.list_repo_refs("gpt2")
RepoRefs(branches=[RefInfo(name='main', ref='refs/heads/main', target_commit='e7da7f221d5bf496a48136c0cd264e630fe9fcc8')], converts=[], tags=[])

>>> api.list_repo_refs("bigcode/the-stack", repo_type='dataset')
RepoRefs(
    branches=[
        RefInfo(name='main', ref='refs/heads/main', target_commit='18edc1591d9ce72aa82f56c4431b3c969b210ae3'),
        RefInfo(name='v1.1.a1', ref='refs/heads/v1.1.a1', target_commit='f9826b862d1567f3822d3d25649b0d6d22ace714')
    ],
    converts=[],
    tags=[
        RefInfo(name='v1.0', ref='refs/tags/v1.0', target_commit='c37a8cd1e382064d8aced5e05543c5f7753834da')
    ]
)
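To connect this back to the original request (finding out whether a branch such as "fp16" exists before loading), here is a small sketch built on the output shape shown above. The helpers `branch_names` and `pick_revision` are hypothetical names, not part of huggingface_hub. They only assume an object with `branches` and `tags` lists whose items have a `name` attribute, as in the `RepoRefs` output above.

```python
from typing import Any, List


def branch_names(refs: Any) -> List[str]:
    """Return the names of all branches in a RepoRefs-like object."""
    return [branch.name for branch in refs.branches]


def pick_revision(refs: Any, preferred: str, fallback: str = "main") -> str:
    """Return `preferred` if a branch or tag with that name exists, else `fallback`."""
    names = {ref.name for ref in list(refs.branches) + list(refs.tags)}
    return preferred if preferred in names else fallback


# Possible usage against the Hub (needs network access and a huggingface_hub
# version that includes list_repo_refs):
#
#     from huggingface_hub import HfApi
#     refs = HfApi().list_repo_refs("bigcode/the-stack", repo_type="dataset")
#     print(branch_names(refs))
#     print(pick_revision(refs, "v1.1.a1"))
```

Falling back to "main" is a design choice for this sketch. Raising an error when the preferred revision is missing may be safer if loading the wrong weights would go unnoticed.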

@antoche
Author

antoche commented Dec 21, 2022

That's awesome, thanks!
