Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split docs on modality #18205

Merged
merged 6 commits into from
Sep 1, 2022
Merged

Conversation

stevhliu
Copy link
Member

@stevhliu stevhliu commented Jul 19, 2022

Currently, audio and computer vision lack content or the existing content is mixed with NLP. This PR splits the toctree on modality to make it easier to discover content for audio/computer vision while also allowing us to also scale to any additional modalities we want to support. As we create additional content, these new sections make it easier to collect specific content in one place. For example, the upcoming generate docs can be placed in the NLP section.

This structure can also help us identify gaps in the docs between each modality to ensure documentation is complete. For example, NLP has a page about tokenizers, and we can create a similar page for the other modalities using feature extractors and processors.

Other sections include:

  • General usage for modality-neutral content.
  • Performance and scalability for content related to large models.
  • Contribute for how to test, open a PR, and add models/pipelines.

After we split the docs, the next step would be to start planning and creating additional content to make the audio/computer vision sections more complete.

Looking forward to hearing what you think :)

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jul 19, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Copy link
Collaborator

sgugger commented Jul 20, 2022

Not really convinced by this as it's now very unclear that all the task tutorials are task-specific tutorials. I liked it better when they were grouped altogether under task. It also backfires from the intent as this shows we don't really support vision (one entry) or speech (two entries) compared to NLP (9 entries).

This revamp also puts advanced guides on the top when they should be way lower in the table of contents (such as Benchmarks or Migrating from other packages).

@stevhliu
Copy link
Member Author

Thanks for the feedback! 🤗

Not really convinced by this as it's now very unclear that all the task tutorials are task-specific tutorials.

Hmm, do you mean it’s more unclear now because each task tutorial is separated by modality? This seems clearer to me since the section headers are more scannable.

It also backfires from the intent as this shows we don't really support vision (one entry) or speech (two entries) compared to NLP (9 entries).

Good perspective, and I totally see what you mean! I think another way to look at it is by creating these new sections, we’re signaling that we plan to create more content for audio/CV. This gives these sections more prominence, which shows we want to focus on audio/CV. So even though it looks pretty bare right now, I think that’s ok since these sections will grow.

This revamp also puts advanced guides on the top when they should be way lower in the table of contents (such as Benchmarks or Migrating from other packages).

I don’t think we should put guides lower because they are more advanced. Instead, it may be better to prioritize guides that users are more likely to find useful. For example, I think the Migration/Train with a script guides are pretty useful. This may be a symptom of how I grouped all these guides under General Usage, in which case, we can try breaking up the section and reordering these guides by their utility.

@sgugger
Copy link
Collaborator

sgugger commented Jul 22, 2022

Hmm, do you mean it’s more unclear now because each task tutorial is separated by modality?

They were all under a "Task" section, which is not the case anymore in your proposal. In NLP, you go from fast tokenizers and multilingual to a task-specific tutorial with no warning to the user.

I don’t think we should put guides lower because they are more advanced. Instead, it may be better to prioritize guides that users are more likely to find useful.

Benchmarks or migrating are both advanced and not useful (benchmarks have 0 issues and we are even questioning whether they should stay in the library and pytorch-pretrained-bert ceased to exist a while ago). You should check the analytics to be certain, but I'm pretty sure they are very far from the most-visited pages and they are definitely very low on the list of pages we want to nudge the users on.

@stevhliu
Copy link
Member Author

Ok I see now! You're worried users won't know the task-specific guides are guides about fine-tuning a model for a task if it is just thrown into the NLP section. I think there are some things we can do to help make this clearer to users (in order of preference):

  1. Include an overview page for each modality section explaining what users can expect to find.
  2. Update the task-specific guides to have clearer titles like, How to fine-tune for text classification.
  3. Create another nested section in each modality that focuses on the task-specific guides.

Benchmarks or migrating are both advanced and not useful (benchmarks have 0 issues and we are even questioning whether they should stay in the library and pytorch-pretrained-bert ceased to exist a while ago).

For sure! Benchmark, Migration, and Troubleshoot are bottom-3 in page views in the General Usage section. I can bump these out and move them closer to the bottom.

@sgugger
Copy link
Collaborator

sgugger commented Jul 26, 2022

Option 2 or 3 are good compromise (my preference goes to 3 if nested-ness is not an issue). I'd leave Troubleshoot in the General Usage section (hopefully we can make it better so it gets more views), but yeah, the other two are out of place there IMO.

Let's see what other people think as well, @LysandreJik @patrickvonplaten to name a few :-)

@LysandreJik
Copy link
Member

I also think that option 3) sounds like the best approach. I don't have a problem with adding a nesting level.

@stevhliu
Copy link
Member Author

I nested the NLP section but it looks a little off since the content inside isn't aligned on the same level (I pinged @mishig25 on this). I didn't add a nested level for the audio and image sections since there's no content in those sections yet, and it might look a little strange.

@stevhliu
Copy link
Member Author

Hi team, just wanted to circle back on this and see if there are any more comments or feedback about how the docs are split. Otherwise, I think we're ready to merge! 🙂

@patrickvonplaten
Copy link
Contributor

Option 3.) Also looks like the right one to me :-)

However I'm not a big fan of "Image" as a title. Could we maybe try to align those sections a bit with how we call the modalities on the Hub: https://huggingface.co/tasks -> so maybe replace "Image" with "Computer Vision"?

Wdyt @sgugger @LysandreJik @osanseviero

@stevhliu stevhliu force-pushed the split-docs-on-modality branch from 83205b1 to a6f45f5 Compare September 1, 2022 17:37
@stevhliu stevhliu marked this pull request as ready for review September 1, 2022 18:10
@stevhliu stevhliu changed the title [WIP] Split docs on modality Split docs on modality Sep 1, 2022
@stevhliu stevhliu requested a review from sgugger September 1, 2022 18:14
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM barring the quotes to remove everywhere. Thanks for iterating on this PR!

docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
docs/source/en/_toctree.yml Outdated Show resolved Hide resolved
@stevhliu stevhliu merged commit 142e12a into huggingface:main Sep 1, 2022
@stevhliu stevhliu deleted the split-docs-on-modality branch September 1, 2022 20:19
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
* update

* 🖍 add missing files

* 📝 add nested sections

* 🖍 align titles with tasks

* oops

* remove quotes from titles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants