Add use cases #2

Merged

@tomoyukilabs (Contributor) commented Nov 8, 2018

This PR adds use cases to the WebNN draft. As a starting point, the use cases I've added are copied from webmachinelearning/meetings#1 (comment). We will capture suggestions and contributions from the F2F discussion during review of this PR, before merging it.


@anssiko (Member) commented Nov 8, 2018

@tomoyukilabs, thank you!

All: please review the use cases, and note that this PR is just a starting point and will change based on your review comments. Please suggest new use cases, clarifications of existing ones, rewording, de-scoping, etc.

As a guideline for reviewers, use cases are generally more impactful if implementation feasibility can be demonstrated via a proof-of-concept, mapping to platform APIs as a "reality check", or similar.

Our charter sets the following expectation, which we should check the use cases against:

The APIs in scope of this group will not be tied to any particular platform and will be implementable on top of existing major platform APIs, such as Android Neural Networks API, Windows DirectML, and macOS/iOS Metal Performance Shaders and Basic Neural Network Subroutines.

Reviews from folks closely familiar with one or more of these platform APIs are very welcome.

We also have @huningxin's API native mapping table at our disposal.

@anssiko (Member) left a comment

@tomoyukilabs, I submitted some review comments for the group to consider. Feedback welcome!

Overall this is a great start, and the division into application-level and low-level use cases seems reasonable. I'd like to hear more feedback from the group on, e.g., whether we are missing some major use cases, or whether some of these use cases would be impractical to implement across the major platforms we're committed to supporting in the charter.

My general suggestion is to consider abstracting out API specifics from the application-level use cases, and to clarify the terminology around "WebML API" vs. "WebNN API" in the low-level use cases.

I think @huningxin is working on an explainer document based on the material shared at F2F that will clarify the positioning of WebNN API (in scope of the CG) and the envisioned "WebML API" (currently out of scope of the CG) among other things, so we can have more concrete discussion around that topic. We might want to consider porting over some of the explainer content into the spec in the future, but I wouldn't block this PR on that.

index.bs Outdated

This section illustrates application-level use cases for the Web Machine
Learning API (WebML API). All applications in those use cases can be built on

Member:

We could abstract out the API specifics from the use cases section and here say, for example:

This section illustrates application-level use cases for neural network inference hardware acceleration.
All applications in those use cases can be built on top of pre-trained deep neural network (DNN) models.

Alternatively, we could replace all occurrences of "WebML API" with "WebNN API", but abstracting out the API entirely in use cases discussion seems preferable to me.

Cc @huningxin for comments.

Contributor Author:

The former example looks good to me. Thanks!

index.bs Outdated
### Person Detection ### {#usecase-person-detection}

A user is browsing a social media site and wishes to take a photo and upload it
to the site. Before the photo is uploaded, the site runs [[SSD]] or [[YOLO]] on

Member:

To abstract out the API specifics, I'd suggest something like:

Before the photo is uploaded, the site does object detection (for example, using object detection approaches such as [[SSD]] or [[YOLO]] that use a single deep neural network) to detect regions that include persons so that the user can filter and de-personalize irrelevant persons on it.

Contributor Author:

LGTM

index.bs Outdated

A user opens a web application that continuously captures her body with her
smartphone's camera. The web application extracts her skeleton by running
[[PoseNet]] on the WebML API to recognize her gesture or body language. When she

Member:

The web application extracts her skeleton by running a machine learning model which allows for real-time human pose estimation such as [[PoseNet]] to recognize her gesture and body language.

Contributor Author:

LGTM

index.bs Outdated
the WebML API to detect regions that include persons so that the user can filter
and de-personalize irrelevant persons on it.

### Skeleton Detecton ### {#usecase-skeleton-detection}

Member:

s/Detecton/Detection/

index.bs Outdated
A user wishes to make her new account and looks for a new icon image. When she
clicks a "Generate" button on the webpage for creating an account, the webpage
runs a generator model of generative adversarial network (GAN) for icon
synthesis [[LogoSynthesis]] on the WebML API. She can repeat random icon

Member:

I'd propose we remove " on the WebML API".

Contributor Author:

I agree.

index.bs Outdated

## Low-Level Use Cases ## {#usecases-lowlevel}

This section collects API-level use cases for the WebML API. It is supposed that

Member:

I think in this "Low-Level Use Cases" section it'd be appropriate to talk about the API, since one expected API consumer is a framework or library and the use cases may need to suggest a certain API shape and feature to be meaningful.

Here's a proposed rewording:

This section collects API-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.

Contributor Author:

I agree with your opinion, and the proposed explanation looks good to me. Thanks!

index.bs Outdated

### Custom Layer ### {#usecase-custom-layer}

A web application developer wants to run a DNN model on the WebML. However, she

Member:

on the WebNN API.

index.bs Outdated

A web application developer wants to run a DNN model on the WebML. However, she
has found that some of activation functions like [[LeakyReLU]], [[ELU]], etc. are
not included in the WebML API. So she constructs custom layers of the additional

Member:

WebNN API

index.bs Outdated
A web application developer wants to run a DNN model on the WebML. However, she
has found that some of activation functions like [[LeakyReLU]], [[ELU]], etc. are
not included in the WebML API. So she constructs custom layers of the additional
activation functions on top of the WebML API. Note that the scope of custom

Member:

WebNN API

index.bs Outdated

A web application developer has a concern about performance of her DNN model on
mobile devices. She has confirmed that the model runs too slow on mobile devices
which does not have GPU acceleration. So her web application refers to the WebML

Member:

WebNN API

@tomoyukilabs (Contributor Author)

@anssiko Many thanks for your review. I have followed your comments and updated this draft. PTAL.

@tomoyukilabs (Contributor Author)

Currently, all application-level use cases in this PR are about CNN-based image processing. I guess we could also find use cases for other kinds of data, e.g. audio, text, sensor data, etc.

@anssiko (Member) commented Nov 14, 2018

@tomoyukilabs, thanks for incorporating the suggestions, LGTM.

Before we consider merging, I'd like to get 2-3 additional reviews from the group and, ideally, contributions of 1-2 application-level use cases that do not involve image processing.

@gregwhitworth

@tomoyukilabs Thank you so much for taking the time to submit this PR. I agree that a text based use case would be valuable.

@huningxin (Contributor)

@tomoyukilabs, thanks so much for putting together this PR!

During the TPAC F2F meeting, folks were also interested in background removal/replacement for video conferencing. So it would be good to add a scene segmentation use case, for example [Deeplab V3+] or [Mask R-CNN].

Other vision-based use cases could include super resolution e.g. [SRGAN], style transfer e.g. [Fast Style Transfer], face analysis e.g. [DeepFace] and face recognition e.g. [FaceNet]. These are generally based on Convolutional Neural Networks (CNNs).

Some text based use cases could be machine translation e.g. [GNMT] or [OpenNMT], sentiment analysis e.g. [DeepMoji], speech recognition e.g. [Deep Speech], text to speech e.g. [Deep Voice], image captioning e.g. [im2txt] and video summarization e.g. [Video-Summarization-with-LSTM]. They are usually based on Recurrent Neural Networks (RNN).

custom layers may include convolution, normalization, etc. as well as
activation.

### Network Concatenation ### {#usecase-network-concat}

@huningxin (Contributor) commented Nov 21, 2018

As mentioned in the TPAC F2F meeting, this looks like a training use case. As training is out of the current charter's scope, would it be better to add this in the future?

@tomoyukilabs (Contributor Author) commented Nov 21, 2018

Correct, but not limited to training. Possible detailed examples are:

  • The web app downloads the convolutional layer weights of MobileNetV1/V2 from a CDN, and the weights of the fully-connected layers (produced by transfer learning) from her own web site
  • The web app downloads the complete weights of MobileNetV1/V2, and later partially updates the fully-connected layers by downloading fine-tuned weights

Anyway, the current description seems to suggest the use case of training, as you pointed out. I'll update those sentences so that they clearly indicate a use case of client-side partial update based on fine tuning or transfer learning.

Contributor:

Thanks for the clarification! It would be great if you can update the description accordingly.
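Both examples in the list above (a CDN-hosted backbone combined with a transfer-learned head, or a later partial update of the fully-connected layers) come down to merging one set of named weights into another, with no training on the client. A minimal TypeScript sketch, assuming weights are held as a plain record of `Float32Array`s keyed by layer name; the `Weights` type, the `mergeWeights` helper, and the layer names are hypothetical and not part of any WebNN draft:

```typescript
// Hypothetical weight container: layer name -> flat weight buffer.
type Weights = Record<string, Float32Array>;

// Overwrite only the layers present in `update` (e.g. fine-tuned
// fully-connected layers) and keep every other layer from `base`
// (e.g. MobileNet convolutional layers fetched from a CDN).
// No gradients are computed: this is a client-side partial update.
function mergeWeights(base: Weights, update: Weights): Weights {
  const merged: Weights = {};
  for (const name of Object.keys(base)) {
    merged[name] = base[name];
  }
  for (const name of Object.keys(update)) {
    const existing = base[name];
    if (existing === undefined) {
      throw new Error(`Unknown layer: ${name}`);
    }
    if (existing.length !== update[name].length) {
      throw new Error(
        `Size mismatch for ${name}: expected ${existing.length}, got ${update[name].length}`
      );
    }
    merged[name] = update[name];
  }
  return merged;
}

// Usage with toy sizes; real layers would be far larger.
const backbone: Weights = {
  conv1: new Float32Array([0.1, 0.2, 0.3]),
  fc: new Float32Array([0, 0]),
};
const fineTunedHead: Weights = { fc: new Float32Array([0.5, -0.5]) };
const model = mergeWeights(backbone, fineTunedHead);
console.log(Object.keys(model), model.fc);
```

The size check is there because a fine-tuned head only makes sense if it matches the layer shapes the rest of the network expects.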

@tomoyukilabs (Contributor Author)

@huningxin All use cases you have suggested including text-based ones look great to me. Many thanks!

@anssiko Is it okay to add those use cases to this PR?

@anssiko (Member) commented Nov 21, 2018

@tomoyukilabs, yes please add. Similarly to the initial list of use cases, the group is expected to review any proposed additions and doing that is easier using the PR review facilities.

@anssiko (Member) commented Nov 21, 2018

@huningxin, thanks for the great contribution!

Since accessibility is key to the W3C's mission, maybe it's worth noting the accessibility benefit in connection with the image captioning use case [im2txt]. Being able to add image descriptions automatically greatly improves web accessibility. As we know, only a small fraction of images on the Web have been properly annotated.

@tomoyukilabs (Contributor Author)

Since I'm on a business trip until the end of November, I'll start updating this PR as soon as possible after I'm back in Tokyo. Thanks for your patience.

@tomoyukilabs (Contributor Author)

I've updated this PR. PTAL.

  • The use cases proposed by @huningxin in Add use cases #2 (comment) are added
  • Regarding the GAN use case, image generation is replaced with super resolution, i.e. SRGAN
  • Person detection and skeleton detection are revised as video conferencing use cases
  • Following Add use cases #2 (review), model concatenation is revised as a fine-tuning use case.

@anssiko (Member) left a comment

Thanks @tomoyukilabs and @huningxin! I submitted some minor review comments.

index.bs Outdated
generation until she finds her favorite one.
A web-based video conferencing application records received video streams, and
it needs to reduce recorded video data to be stored. The application generates
the short version of the recoreded video by using a machine learning model for

Member:

s/recoreded/recorded/

index.bs Outdated

A web application developer has a concern about performance of her DNN model on
mobile devices. She has confirmed that the model runs too slow on mobile devices
which does not have GPU acceleration. So her web application refers to the WebNN

Member:

s/does/do/
s/So/To address this issue,/

index.bs Outdated

A web application developer wants to run a DNN model on the WebNN API. However,
she has found that some of activation functions like [[LeakyReLU]], [[ELU]],
etc. are not included in the WebNN API. So she constructs custom layers of the

Member:

s/So/To address this issue,/
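The custom layer idea in the snippet above can be made concrete with a small sketch. It assumes a hypothetical low-level API that offers only basic element-wise primitives (here `mulScalar`, `addScalar`, `max`, `exp`, and `select`, operating eagerly on plain `Float32Array`s); these names are illustrative and not taken from the WebNN draft. LeakyReLU and ELU are then built as custom layers on top of those primitives:

```typescript
// Hypothetical element-wise primitives standing in for ops a low-level
// API might expose. These names are illustrative, not from the WebNN draft.
type Tensor = Float32Array;

function mulScalar(x: Tensor, s: number): Tensor {
  return x.map((v) => v * s);
}

function addScalar(x: Tensor, s: number): Tensor {
  return x.map((v) => v + s);
}

function max(a: Tensor, b: Tensor): Tensor {
  return a.map((v, i) => Math.max(v, b[i]));
}

function exp(x: Tensor): Tensor {
  return x.map((v) => Math.exp(v));
}

// select(cond, a, b): picks a[i] where cond[i] > 0, otherwise b[i].
function select(cond: Tensor, a: Tensor, b: Tensor): Tensor {
  return cond.map((c, i) => (c > 0 ? a[i] : b[i]));
}

// Custom layer: LeakyReLU(x) = max(x, alpha * x), valid for 0 < alpha < 1.
function leakyRelu(x: Tensor, alpha = 0.01): Tensor {
  return max(x, mulScalar(x, alpha));
}

// Custom layer: ELU(x) = x if x > 0, else alpha * (exp(x) - 1).
function elu(x: Tensor, alpha = 1.0): Tensor {
  const negativeBranch = mulScalar(addScalar(exp(x), -1), alpha);
  return select(x, x, negativeBranch);
}

const input = new Float32Array([-2, -0.5, 0, 1.5]);
console.log(leakyRelu(input, 0.1));
console.log(elu(input));
```

Composing activations from a few primitives keeps the core op set small; a graph-based API would build the same expressions as graph nodes instead of evaluating them eagerly as this sketch does.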

index.bs Outdated
### Super Resolution ### {#usecase-super-resolution}

A web-based video conferencing is receiving a video stream from its peer, but
the resolution of the video becomes lower due to network congestion. So the

Member:

s/So/To prevent degradation of the perceived video quality,/

index.bs Outdated

A user joins a teleconference via a web-based video conferencing application
from her room. However, she does not wish that her room is visible on the
screen. So the application runs a machine learning model such as [[DeepLabv3+]]

Member:

s/So/To protect the privacy of the other people and the surroundings,/

index.bs Outdated
### Semantic Segmentation ### {#usecase-segmentation}

A user joins a teleconference via a web-based video conferencing application
from her room. However, she does not wish that her room is visible on the

Member:

s/room is/room and people in the background are/
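In the Semantic Segmentation use case, a model such as [[DeepLabv3+]] only produces a per-pixel mask; hiding the room and the people in the background is then a compositing step over each frame. A minimal sketch, assuming the frame and a replacement background use the RGBA byte layout of canvas `ImageData`, and that the mask holds one value in [0, 1] per pixel (1 = the user to keep). The function name and data layout are illustrative assumptions, not part of any WebNN draft:

```typescript
// Blend each pixel: out = mask * frame + (1 - mask) * background.
// `frame` and `background` use the RGBA byte layout of canvas ImageData;
// `mask` holds one value in [0, 1] per pixel (1 = person to keep).
function replaceBackground(
  frame: Uint8ClampedArray,
  background: Uint8ClampedArray,
  mask: Float32Array
): Uint8ClampedArray {
  const pixelCount = mask.length;
  if (frame.length !== pixelCount * 4 || background.length !== pixelCount * 4) {
    throw new Error("frame, background and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(frame.length);
  for (let p = 0; p < pixelCount; p++) {
    // Clamp the mask value in case the model output slightly exceeds [0, 1].
    const m = Math.min(1, Math.max(0, mask[p]));
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c;
      // Uint8ClampedArray rounds and clamps the blended value to 0..255.
      out[i] = m * frame[i] + (1 - m) * background[i];
    }
  }
  return out;
}

// Two-pixel example: keep the first pixel, replace the second.
const frame = new Uint8ClampedArray([200, 100, 50, 255, 10, 20, 30, 255]);
const background = new Uint8ClampedArray([0, 0, 0, 255, 255, 255, 255, 255]);
console.log(replaceBackground(frame, background, new Float32Array([1, 0])));
```

Using a soft mask rather than a hard 0/1 threshold lets edges around the user blend smoothly; a blurred copy of the frame could be passed as `background` for a blur effect instead of replacement.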

index.bs Outdated
mobile devices. She has confirmed that the model runs too slow on mobile devices
which does not have GPU acceleration. So her web application refers to the WebNN
API to confirm whether acceleration is available or not, so that the application
can display the warning for devices without acceleration.

Member:

s/the/a/

index.bs Outdated
can display the warning for devices without acceleration.

After several weeks, she has developed a tiny DNN model that can even run on
CPU. So she modifies the application so that the application loads the tiny

@anssiko (Member) commented Dec 10, 2018

s/So/In order to accommodate for that,/
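The flow described in the snippets above (check whether acceleration is available, warn on devices without it, and later load a tiny CPU-friendly model instead) is essentially a small decision function. A sketch, assuming the low-level API exposes some boolean acceleration query; `DeviceCapabilities`, `ModelChoice`, `chooseModel`, and the model URLs are hypothetical placeholders, not names from the WebNN draft:

```typescript
// Hypothetical result of querying the low-level API for acceleration.
interface DeviceCapabilities {
  accelerated: boolean; // true if inference can use a GPU or other accelerator
}

interface ModelChoice {
  url: string;
  warning?: string; // shown to the user when inference is expected to be slow
}

// Placeholder asset paths, not real files.
const FULL_MODEL_URL = "/models/full-model.bin";
const TINY_MODEL_URL = "/models/tiny-model.bin";

function chooseModel(caps: DeviceCapabilities, tinyModelAvailable: boolean): ModelChoice {
  if (caps.accelerated) {
    // Accelerated devices run the original, larger model.
    return { url: FULL_MODEL_URL };
  }
  if (tinyModelAvailable) {
    // Later revision of the app: a tiny model that runs acceptably on CPU.
    return { url: TINY_MODEL_URL };
  }
  // First revision of the app: no fallback model yet, so warn the user.
  return {
    url: FULL_MODEL_URL,
    warning: "This device lacks hardware acceleration; the model may run slowly.",
  };
}

console.log(chooseModel({ accelerated: false }, false));
console.log(chooseModel({ accelerated: false }, true));
```

Keeping the decision separate from model loading makes it easy to add the tiny model later without touching the accelerated path.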

@tomoyukilabs (Contributor Author)

@anssiko Thanks for your review. I've revised this PR. PTAL.

@anssiko (Member) commented Dec 11, 2018

@tomoyukilabs, thanks, LGTM.

All: we'll review this PR during our 13 December 2018 teleconference to assess whether there is consensus that this set of use cases represents a good starting point for the initial API design.

See also the HTML preview of these use cases. All feedback welcome.

@zhiqiangyu

These use cases look good. I'd like to propose one more use case, below, mainly for web shopping scenarios. Could you please take a look? Any comment is welcome. Thanks.

Facial Features Detection:
A web-based shopping application detects the user's facial features (e.g. detailed information about the eyes, nose, mouth, lips, etc.) and enables the user to perform beauty try-on simulations, such as wearing glasses or applying lipstick. An example can be found here: http://modiface.com/. Furthermore, this kind of capability can also be extended to more scenarios like human face modelling, emotion analysis, etc.

@anssiko (Member) commented Dec 17, 2018

@zhiqiangyu, thanks! Facial features detection is indeed an important step in various facial analysis tasks, out of which we currently list face recognition and emotion analysis (could also be used for drowsiness detection in the person detection use case! 😴)

@tomoyukilabs @huningxin, how would you suggest we integrate facial features detection given it is a key step in many facial analysis tasks? Also, would it help to mention some commonly used facial landmark detection approaches?

If we generalize, we could say that facial feature (or landmark) detection enables a number of use cases in HCI, entertainment (incl. shopping), medical, security surveillance, and more.

@tomoyukilabs (Contributor Author)

@zhiqiangyu Thanks. That use case looks good to me!

@anssiko @huningxin I have added a couple of use cases related to facial characteristics. PTAL.

@anssiko (Member) left a comment

Thanks @tomoyukilabs! LGTM.

@huningxin (Contributor)

Thanks @zhiqiangyu! Online shopping is definitely a key scenario for ML, e.g. CoverGirl's virtual makeup try-on.

@tomoyukilabs, thanks for all the good work! The PR LGTM.

@anssiko (Member) commented Dec 21, 2018

The CfC to adopt these use cases as a starting point for the API definition ended without concerns so we'll merge this PR. Huge thanks to all the contributors!
