Split docs on modality #18205

stevhliu · 2022-07-19T22:50:58Z

Currently, audio and computer vision lack content or the existing content is mixed with NLP. This PR splits the toctree on modality to make it easier to discover content for audio/computer vision while also allowing us to also scale to any additional modalities we want to support. As we create additional content, these new sections make it easier to collect specific content in one place. For example, the upcoming generate docs can be placed in the NLP section.

This structure can also help us identify gaps in the docs between each modality to ensure documentation is complete. For example, NLP has a page about tokenizers, and we can create a similar page for the other modalities using feature extractors and processors.

Other sections include:

General usage for modality-neutral content.
Performance and scalability for content related to large models.
Contribute for how to test, open a PR, and add models/pipelines.

After we split the docs, the next step would be to start planning and creating additional content to make the audio/computer vision sections more complete.

Looking forward to hearing what you think :)

HuggingFaceDocBuilderDev · 2022-07-19T23:13:56Z

The documentation is not available anymore as the PR was closed or merged.

sgugger · 2022-07-20T07:42:09Z

Not really convinced by this as it's now very unclear that all the task tutorials are task-specific tutorials. I liked it better when they were grouped altogether under task. It also backfires from the intent as this shows we don't really support vision (one entry) or speech (two entries) compared to NLP (9 entries).

This revamp also puts advanced guides on the top when they should be way lower in the table of contents (such as Benchmarks or Migrating from other packages).

stevhliu · 2022-07-21T22:39:25Z

Thanks for the feedback! 🤗

Not really convinced by this as it's now very unclear that all the task tutorials are task-specific tutorials.

Hmm, do you mean it’s more unclear now because each task tutorial is separated by modality? This seems clearer to me since the section headers are more scannable.

It also backfires from the intent as this shows we don't really support vision (one entry) or speech (two entries) compared to NLP (9 entries).

Good perspective, and I totally see what you mean! I think another way to look at it is by creating these new sections, we’re signaling that we plan to create more content for audio/CV. This gives these sections more prominence, which shows we want to focus on audio/CV. So even though it looks pretty bare right now, I think that’s ok since these sections will grow.

This revamp also puts advanced guides on the top when they should be way lower in the table of contents (such as Benchmarks or Migrating from other packages).

I don’t think we should put guides lower because they are more advanced. Instead, it may be better to prioritize guides that users are more likely to find useful. For example, I think the Migration/Train with a script guides are pretty useful. This may be a symptom of how I grouped all these guides under General Usage, in which case, we can try breaking up the section and reordering these guides by their utility.

sgugger · 2022-07-22T06:25:12Z

Hmm, do you mean it’s more unclear now because each task tutorial is separated by modality?

They were all under a "Task" section, which is not the case anymore in your proposal. In NLP, you go from fast tokenizers and multilingual to a task-specific tutorial with no warning to the user.

I don’t think we should put guides lower because they are more advanced. Instead, it may be better to prioritize guides that users are more likely to find useful.

Benchmarks or migrating are both advanced and not useful (benchmarks have 0 issues and we are even questioning whether they should stay in the library and pytorch-pretrained-bert ceased to exist a while ago). You should check the analytics to be certain, but I'm pretty sure they are very far from the most-visited pages and they are definitely very low on the list of pages we want to nudge the users on.

stevhliu · 2022-07-26T17:35:26Z

Ok I see now! You're worried users won't know the task-specific guides are guides about fine-tuning a model for a task if it is just thrown into the NLP section. I think there are some things we can do to help make this clearer to users (in order of preference):

Include an overview page for each modality section explaining what users can expect to find.
Update the task-specific guides to have clearer titles like, How to fine-tune for text classification.
Create another nested section in each modality that focuses on the task-specific guides.

Benchmarks or migrating are both advanced and not useful (benchmarks have 0 issues and we are even questioning whether they should stay in the library and pytorch-pretrained-bert ceased to exist a while ago).

For sure! Benchmark, Migration, and Troubleshoot are bottom-3 in page views in the General Usage section. I can bump these out and move them closer to the bottom.

sgugger · 2022-07-26T19:28:35Z

Option 2 or 3 are good compromise (my preference goes to 3 if nested-ness is not an issue). I'd leave Troubleshoot in the General Usage section (hopefully we can make it better so it gets more views), but yeah, the other two are out of place there IMO.

Let's see what other people think as well, @LysandreJik @patrickvonplaten to name a few :-)

LysandreJik · 2022-07-27T08:15:10Z

I also think that option 3) sounds like the best approach. I don't have a problem with adding a nesting level.

stevhliu · 2022-07-28T22:55:42Z

I nested the NLP section but it looks a little off since the content inside isn't aligned on the same level (I pinged @mishig25 on this). I didn't add a nested level for the audio and image sections since there's no content in those sections yet, and it might look a little strange.

stevhliu · 2022-08-17T22:10:41Z

Hi team, just wanted to circle back on this and see if there are any more comments or feedback about how the docs are split. Otherwise, I think we're ready to merge! 🙂

patrickvonplaten · 2022-08-31T12:24:52Z

Option 3.) Also looks like the right one to me :-)

However I'm not a big fan of "Image" as a title. Could we maybe try to align those sections a bit with how we call the modalities on the Hub: https://huggingface.co/tasks -> so maybe replace "Image" with "Computer Vision"?

Wdyt @sgugger @LysandreJik @osanseviero

docs/source/en/_toctree.yml

sgugger

LGTM barring the quotes to remove everywhere. Thanks for iterating on this PR!

docs/source/en/_toctree.yml

* update * 🖍 add missing files * 📝 add nested sections * 🖍 align titles with tasks * oops * remove quotes from titles

patrickvonplaten reviewed Aug 31, 2022

View reviewed changes

docs/source/en/_toctree.yml Outdated Show resolved Hide resolved

patrickvonplaten reviewed Aug 31, 2022

View reviewed changes

docs/source/en/_toctree.yml Outdated Show resolved Hide resolved

stevhliu added 5 commits September 1, 2022 10:19

update

37ba319

🖍 add missing files

49dc232

📝 add nested sections

db9c5c2

🖍 align titles with tasks

752a5ea

oops

a6f45f5

stevhliu force-pushed the split-docs-on-modality branch from 83205b1 to a6f45f5 Compare September 1, 2022 17:37

stevhliu marked this pull request as ready for review September 1, 2022 18:10

stevhliu changed the title ~~[WIP] Split docs on modality~~ Split docs on modality Sep 1, 2022

stevhliu requested a review from sgugger September 1, 2022 18:14

sgugger approved these changes Sep 1, 2022

View reviewed changes

remove quotes from titles

1f77bb8

stevhliu merged commit 142e12a into huggingface:main Sep 1, 2022

stevhliu deleted the split-docs-on-modality branch September 1, 2022 20:19

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022

Split docs on modality (huggingface#18205)

96f986b

* update * 🖍 add missing files * 📝 add nested sections * 🖍 align titles with tasks * oops * remove quotes from titles

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split docs on modality #18205

Split docs on modality #18205

stevhliu commented Jul 19, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented Jul 19, 2022 •

edited

Loading

sgugger commented Jul 20, 2022

stevhliu commented Jul 21, 2022

sgugger commented Jul 22, 2022

stevhliu commented Jul 26, 2022

sgugger commented Jul 26, 2022

LysandreJik commented Jul 27, 2022

stevhliu commented Jul 28, 2022

stevhliu commented Aug 17, 2022

patrickvonplaten commented Aug 31, 2022

sgugger left a comment

Split docs on modality #18205

Split docs on modality #18205

Conversation

stevhliu commented Jul 19, 2022 • edited Loading

HuggingFaceDocBuilderDev commented Jul 19, 2022 • edited Loading

sgugger commented Jul 20, 2022

stevhliu commented Jul 21, 2022

sgugger commented Jul 22, 2022

stevhliu commented Jul 26, 2022

sgugger commented Jul 26, 2022

LysandreJik commented Jul 27, 2022

stevhliu commented Jul 28, 2022

stevhliu commented Aug 17, 2022

patrickvonplaten commented Aug 31, 2022

sgugger left a comment

Choose a reason for hiding this comment

stevhliu commented Jul 19, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented Jul 19, 2022 •

edited

Loading