Add visual-question-answering docs
#382
Conversation
merveenoyan left a comment
      ],
      models: [
        {
          description: "Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2",
Can you simplify this explanation? We try to put ourselves in the shoes of a person that doesn't know machine learning.
Something along the lines of "Robust model trained on visual question answering task."
You are totally right!
          score: 0.009,
        },
        {
          label: "1",
The probabilities don't add up to one? 😅 also maybe for simplicity we could keep it to yes and no only, what do you think? 🙂
The probabilities don't have to add up to 1 since they are not mutually exclusive for more complex questions, such as "what is on top of ...".
Here's the output of the specific model:
>>> image = Image.open("assets/elephant.jpeg")
>>> question = "Is there an elephant?"
>>> vqa_pipeline(image, question, top_k=3)
[{'score': 0.9998154044151306, 'answer': 'yes'},
 {'score': 0.009802112355828285, 'answer': 'no'},
 {'score': 0.0020384755916893482, 'answer': '1'}]

Do you think it would be clearer if we increased the number of outputs (top_k) so the sum goes well over 1 and this is no longer confusing?
I thought about removing the "1" label too, but decided to keep it because it kind of showed that VQA's outputs are not just binary. Maybe we could show this better by asking a different question, such as "where is the elephant?".
@jlondonob sure, you could also include a better input that isn't a yes/no question, something more like "What's in this image?" Thanks a lot for the explanation!
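If it helps, here is a minimal sketch of what that kind of open-ended example could look like, assuming the `transformers` pipeline API and a ViLT checkpoint fine-tuned on VQAv2 (the exact model id below is an assumption, not something fixed in this PR):

```python
# Minimal sketch, not the final docs snippet.
# Assumption: "dandelin/vilt-b32-finetuned-vqa" stands in for the ViLT/VQAv2 checkpoint.
from PIL import Image
from transformers import pipeline

vqa_pipeline = pipeline(
    "visual-question-answering",
    model="dandelin/vilt-b32-finetuned-vqa",
)

image = Image.open("assets/elephant.jpeg")

# An open-ended question avoids the yes/no confusion: answers are scored
# independently, so the returned scores are not expected to sum to 1.
print(vqa_pipeline(image, "What's in this image?", top_k=5))
```

Whatever question ends up in the docs, showing more than the top answer makes it clear that the scores are per-answer rather than a probability distribution.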
      metrics: [
        {
          description: "For open-ended and multiple-choice VQA.",
          id: "Accuracy",
you can simply write accuracy and leave it like that, as they get retrieved from https://huggingface.co/metrics if the metric exists there.
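As an aside, one way to sanity-check that a lowercase metric id resolves on the Hub is to load it with the `evaluate` library. This is only an illustration and assumes `evaluate` is installed; it is not part of the docs change:

```python
# Illustrative check: "accuracy" is a canonical metric id on the Hugging Face Hub.
import evaluate

accuracy = evaluate.load("accuracy")
print(accuracy.compute(references=[0, 1, 1], predictions=[0, 1, 0]))
# expected: {'accuracy': 0.666...}
```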
Understood!
Improved wording of use cases Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
merveenoyan left a comment
I only left two comments, will be happy to merge once we discuss & address them 🤗 thanks a lot for the great work!
## Use Cases

### Automatic Medical Diagnostic

Visual Question Answering (VQA) models can be used to help clinical staff diagnose conditions. For example, a doctor could input an MRI scan to the model and ask "what does this MRI scan show?" and get the answer to their question.
We discussed this with @meg-huggingface (given she has studied the ethical side of the VQA task), and it's better if we don't include this use case: the model can pick up spurious cues and might answer questions wrongly in life-critical situations.
VQA models can be used to reduce visual barriers for visually impaired individuals by allowing them to get information about images from the web and the real world.

### Unattended Surveillance

Video Question Answering can be used to quickly extract valuable information from surveillance recordings.
For this one, I think a basic activity detection model would be enough and VQA is overkill.
@jlondonob do you need help on this? 🤗

@merveenoyan sorry for the delay, I'm on vacation and forgot to complete these changes. I'll get them done today.

@jlondonob ah, I didn't want to make you feel rushed or anything; sometimes contributors want help but are too shy to ask for it, so I thought I'd ask if you need any 😅

@merveenoyan Thanks for your constant help 🚀! I removed the use cases you flagged. Do you think the PR is ready to be merged?
merveenoyan left a comment
Let's get this merged!
@jlondonob what is your HF username? I will add your name at the end to give you credit before we deploy it 🤗 congratulations on the merge!!! ✨

@merveenoyan thanks a lot! My HF username is


Summary
Added `about.md` and `data.ts` to the `visual-question-answering` documentation. This PR partially fixes #362.

Questions
I wrote the Use Cases section as "this model can be used to..." instead of "this model is used to..." since use cases for Visual Question Answering are still somewhat experimental. Would you like to use the latter, as most of the other tasks do?