
Revamp of Hub documentation #62

Closed
osanseviero opened this issue Mar 9, 2022 · 14 comments

@osanseviero
Contributor

As we work on huggingface/huggingface_hub#744 and add library documentation based on docstrings, the Hub docs might split into huggingface_hub library docs and product usage docs.

Based on the existing content, what we can do is organize it into more self-contained use cases. An initial mental model, without creating additional content, would be:

Move to huggingface_hub as guides

Then on the Hub

Model card

  • What are model cards and why are they useful?
  • When sharing a model, what should I add to my model card?
  • Model card metadata (see the sketch after this list)
  • How are model tags determined?
  • Can I specify which framework supports my model?
  • How can I link a model to a dataset?
  • Can I access models programmatically?
  • Can I write LaTeX in my model card?
  • How is a model's type of inference API and widget determined?
  • What are all the possible task/widget types?
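For the metadata-related questions above, here is a minimal sketch of what the YAML front matter at the top of a model card README.md can look like (values are illustrative; only a few commonly used keys are shown):

```yaml
---
# Illustrative model card metadata (front matter of README.md)
license: apache-2.0
language: en
tags:
  - text-classification
datasets:
  - imdb
---
```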

Repositories

  • What's a repository?
  • How can I explore the Hugging Face Hub?
  • How can I load/push from/to the Hub? (see the sketch after this list)
  • How can I rename or transfer a repo?
  • How can I fork or rebase a repository with LFS pointers?
  • List of license identifiers
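For the load/push question above, a minimal sketch with the huggingface_hub Python library (the repo name under my-username is hypothetical, and pushing assumes you are logged in, e.g. via huggingface-cli login):

```python
from huggingface_hub import HfApi, hf_hub_download

# Download a single file from an existing public model repo
local_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(local_path)

# Create a repo under your namespace (hypothetical name) and push a local file to it
api = HfApi()
api.create_repo(repo_id="my-username/my-model", exist_ok=True)
api.upload_file(
    path_or_fileobj="./pytorch_model.bin",  # local file, assumed to exist
    path_in_repo="pytorch_model.bin",
    repo_id="my-username/my-model",
)
```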

CO2 Emissions

  • Why is it useful to calculate the carbon emissions of my model?
  • What information should I include about the carbon footprint of my model?
  • Carbon footprint metadata (see the sketch after this list)
  • How is the carbon footprint of my model calculated? 🌎
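For the carbon footprint metadata item, a sketch of how it can appear in the model card front matter (the value is made up; co2_eq_emissions is the metadata key, expressed in grams of CO2-equivalent):

```yaml
---
# Illustrative value, in grams of CO2-equivalent emitted during training
co2_eq_emissions: 1250
---
```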

Widgets

  • What's a widget?
  • How can I control my model's widget example input? (see the sketch after this list)
  • How to create a new widget?
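For controlling the widget's example input, a sketch of the widget section in the model card metadata for a text-based task (example sentences are made up):

```yaml
---
widget:
  - text: "I love this movie, it was absolutely"
  - text: "The plot was predictable, but the acting"
---
```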

Inference API

  • What's the Inference API?
  • How can I control my model's widget Inference API parameters? (see the sketch after this list)
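And a minimal sketch of calling the hosted Inference API from Python (the model id is just an example and the token is a placeholder; wait_for_model is one of the request options):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token from your account settings

payload = {
    "inputs": "The documentation revamp made everything much easier to find!",
    "options": {"wait_for_model": True},  # wait for the model to load instead of erroring
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```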

Security

Endpoints

Adding a new task

Integrating a new library (with parts of it linking to huggingface_hub)

This would also include Spaces, which can very likely be broken into more pieces.

WDYT @julien-c @LysandreJik @adrinjalali @muellerzr of this as a first step once we kick off the splitting?

@julien-c
Member

julien-c commented Mar 9, 2022

Yay!

Maybe Repositories should be before Model cards (more general)

Maybe we can also further divide Model card into Repo cards, Model cards, Dataset cards, etc. (to expose "repo cards" as a general concept)

@muellerzr
Contributor

cc @sashavor so she's aware of the CO2 bits 😄

@LysandreJik
Member

Great list, thanks for compiling @osanseviero! Opened a tracker for the library documentation based on docstrings here: huggingface/huggingface_hub#759.

@adrinjalali
Contributor

Thanks @osanseviero, this makes sense. We can then expand on the user guides of the huggingface_hub library after the split.

@osanseviero
Contributor Author

For Spaces, I think useful questions are

  • How to install OpenCV (this one is tricky; see the sketch after this list)
  • How to build a Space with TensorFlow.js
  • How to build a Space with Flask
  • How to build a Space with ...
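For the OpenCV question, a minimal sketch of the dependency file a Space can ship (package names are illustrative; the headless OpenCV build avoids needing extra system libraries):

```text
# requirements.txt — Python dependencies installed when the Space builds
opencv-python-headless
gradio
numpy
```

If the regular (non-headless) OpenCV build is needed, my understanding is that the required system packages can be listed in a packages.txt file that gets installed with apt.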

@adrinjalali
Contributor

Something @LysandreJik and I were just talking about is that at the moment we don't have documentation on the specification of each integration.

Trying to explain:

Many of our integrations have happened in close collaboration with third-party developers. We would sometimes change something in the backend to support certain aspects of the new integration. For instance, the fastai integration adds dependencies to pyproject.toml, and then those dependencies are installed somewhere, somehow, before the model is loaded.

What's missing is the kind of documentation explaining what exactly is required from a fastai repo on our side for it to be properly interpreted and for the Inference API to work on it. It would be nice for a PR such as huggingface/huggingface_hub#678 to add that kind of documentation before being merged. This would also allow others to implement their own integrations, even in a language other than Python if they need to. It would also give clarity to users on what their repos should look like.

@osanseviero
Contributor Author

Is it ok if I transfer this to the hub-docs repo? Most of this is related to Hub documentation, and the guides can be kept for the huggingface_hub library 😄

@LysandreJik
Member

Yes, feel free to!

osanseviero transferred this issue from huggingface/huggingface_hub Mar 17, 2022
@adrinjalali
Contributor

Opened huggingface/huggingface_hub#777 to track the part related to huggingface_hub

@osanseviero
Contributor Author

As discussed in https://huggingface.slack.com/archives/C01BWJU0YKW/p1647526780537909, we should also show how to retroactively migrate files from previous commits from plain Git to Git LFS (sketch below).

cc @lvwerra
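A sketch of that retroactive migration with git-lfs (the file patterns are illustrative; note that this rewrites history, so the remote has to be force-pushed afterwards):

```bash
# Rewrite all existing commits so matching files become LFS pointers
git lfs migrate import --include="*.bin,*.ckpt" --everything

# History has been rewritten, so the remote refs need a force push
git push --force --all
```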

@patrickvonplaten
Contributor

RE:

Model card

  • What are model cards and why are they useful?
  • When sharing a model, what should I add to my model card?
  • Model card metadata
  • How are model tags determined?
  • Can I specify which framework supports my model?
  • How can I link a model to a dataset?
  • Can I access models programmatically?
  • Can I write LaTeX in my model card?
  • How is a model's type of inference API and widget determined?
  • What are all the possible task/widget types?

Repositories

  • What's a repository?
  • How can I explore the Hugging Face Hub?
  • How can I load/push from/to the Hub?
  • How can I rename or transfer a repo?
  • How can I fork or rebase a repository with LFS pointers?
  • List of license identifiers

I'd advocate quite strongly for a higher-level split between "Model repository", "Dataset repository", and "Space repository". Structuring the documentation around "What's a repository?" is too bottom-up, from an engineering point of view. We structure hf.co quite strongly around "Models", "Datasets", and "Spaces", so I think the better approach here would also be to have repo-type-specific docs. IMO, privacy and storage matter more for datasets, the Inference API and TF traces matter more for model repos, and CPU/GPU RAM matters more for Spaces. We can also link more nicely to the respective libraries this way, i.e. Transformers and Datasets.

Similarly, I would make "Model card" a subsection of "Model repository" and also have a section on Dataset cards.

@NielsRogge
Contributor

NielsRogge commented Apr 20, 2022

As said before, instead of writing titles such as:

  • What are model cards and why are they useful?
  • When sharing a model, what should I add to my model card?
  • Model card metadata
  • How are model tags determined?
  • Can I specify which framework supports my model?
  • How can I link a model to a dataset?
  • Can I access models programmatically?
  • Can I write LaTeX in my model card?
  • How is a model's type of inference API and widget determined?
  • What are all the possible task/widget types?

It's much clearer to write:

  • Model cards
  • Sharing a model
  • Metadata
  • Model tags
  • Linking a model
  • Add LaTeX

etc.

@osanseviero
Contributor Author

Copy paste from internal Slack discussion with @hollance

Ah nice. Since I'm kind of new to the hub myself, I'm looking at it with "beginner's eyes" and a lot of these things aren't obvious. 
(For example, I tried to make a custom dataset and it was not clear at all how to do this and I had to do a lot of digging to figure it out.)
Even if the functionality to do certain things exists, like hosting custom models, if this info is hidden then it will be hard to convince people to use HF.

@NimaBoscarino
Contributor

As said before, instead of writing titles such as:...

I'm actually kind of curious, does anyone know if there's an SEO benefit to one of those styles over the other?
