Open source our Inference widgets #56

Closed
julien-c opened this issue May 27, 2021 · 13 comments · Fixed by #87
Labels: discussion, widgets (About our Inference widgets)

Comments

@julien-c
Member

cross-referencing issue on our internal repo: https://github.com/huggingface/moon-landing/issues/716

@wietsedv

wietsedv commented May 30, 2021

Is this discussion being held internally? Of course I would like it to be open source with the intent of creating custom demos.

I also have a tangential question: this repo contains the Dockerfiles for the Docker images that provide inference widget support for non-Transformers frameworks. Could the equivalent Transformers Docker image be open sourced? Otherwise I would end up rewriting what already exists (even if it's quite simple).

My guess as to why this is not already the case is that you might want to keep the automatic ONNX conversion/usage closed source for monetization reasons, which is perfectly reasonable. But maybe you could add a simple equivalent Transformers image without ONNX.

@julien-c
Member Author

Discussion is open to external feedback – in fact it's appreciated. Upvote or comment on this issue if you would like the widgets to be open sourced! FYI, they're written in Svelte (https://svelte.dev) with Tailwind (https://tailwindcss.com/).

I would love to gauge the interest of the community to write new widgets in Svelte/Tailwind.

Re. the Inference API, yes we're not planning to release the images for transformers at the moment (cc @Narsil @jeffboudier). Note however that we do have a simple version of serving.py in the transformers repo: https://github.com/huggingface/transformers/blob/master/src/transformers/commands/serving.py
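For reference, a minimal sketch of calling the pipelines API that such a simple serving layer builds on (the model name here is just an example, not a prescribed choice):

```python
# Illustration only (not HF's serving code): a transformers pipeline called directly,
# which is what a simple serving script would expose over HTTP.
from transformers import pipeline

# Example checkpoint; any compatible model from the Hub works here.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The inference widgets are great!"))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```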

@wietsedv

Thanks! My exact intention was to write my own open-source widget with Svelte and Tailwind (or probably WindiCSS, which is a lightweight Tailwind++). Let me know if there is any way I could contribute. I think that simple widgets are very useful for local testing (maybe in notebooks) and for simple self-hosted online demos. This is not currently possible with the HF widget (and Gradio is also not quite there).

I find it remarkable that you (plural; Hugging Face) consistently seem to use every language/framework that I personally love across the entire software stack.

@julien-c
Member Author

haha that's awesome! Great minds share the same stack =)

Re. WindiCSS, I feel like its main advantage is JIT compilation, which Tailwind now does too, no? cc @gary149

@wietsedv

There are some other minor differences. About JIT: Tailwind now does JIT too, but WindiCSS does it a bit better. See some details by Antfu here: windicss/windicss#176

But in any case: vite + svelte + windicss = ridiculously fast + stable + simple + independent of the whims of commercial companies

@Narsil
Contributor

Narsil commented May 31, 2021

It could be done to some extent; however, extracting the open parts from the private code might be a bit too much.
Overall the API mimics the pipelines part of transformers very closely (mostly with some defaults changed), so we could have an open-source version. However, since we wouldn't use that version ourselves, I fear keeping it up to date might be an issue.

@julien-c
Member Author

julien-c commented Jun 2, 2021

@wietsedv out of curiosity what kind of widget/task are you thinking of building? (cc @LysandreJik)

julien-c added the widgets (About our Inference widgets) label on Jun 2, 2021
@wietsedv

wietsedv commented Jun 2, 2021

Not really one thing. For completeness' sake, I will tell you everything I want to make/use regarding inference/widgets:

  • I want to make a general stand-alone, self-hosted demo GUI for using/testing models that is targeted at end users
    • The inference widgets on the model pages do not suffice, because everything surrounding the widget is targeted at researchers/developers instead of naive users who do not need to understand the underlying techniques
    • Also, I want to be able to, for instance, upload/download spreadsheets with examples
  • I want to be able to make custom widgets with as little code as possible that are compatible with Transformers pipelines, but with a use-case that is too specific for inclusion in Transformers pipelines or the HF inference widgets
    • The first specific example is that I want to make a visualization thingy for (a newer revision of) this paper: https://arxiv.org/abs/2011.12649 (acoustic distance measure with dynamic time-warping and feature-based use of Wav2Vec2)
  • I would like to use an ONNX back-end, because I love PyTorch for development, but I do not really like it for production. Transformers PyTorch-to-ONNX conversion does not seem to be difficult, but I have no clue yet what edge cases there are (a minimal export sketch is included at the end of this comment).
  • I want to experiment with doing everything above in the browser without any back-end. Most people have machines that are powerful enough for small-scale inference. This would include:
    • Tokenization: Rust Tokenizers → WebAssembly (works perfectly except for the onig dependency, which very inconveniently is a binding to the Oniguruma C library. Rust wasm-bindgen works perfectly for pure Rust. Have not yet attempted to solve this issue)
    • Inference: ONNX.js (a quick test with a BERT-based model converted to ONNX works perfectly)
    • Tie it together as a pipeline in TypeScript (must make sure that this is as minimal as possible to keep it low-maintenance)

I already have some (unusable) minimal POCs for the crucial steps in the items above. Today I started setting up a small (usable) POC for the first item.
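For reference, a minimal sketch of the PyTorch-to-ONNX conversion mentioned in the list above, using plain torch.onnx.export; the checkpoint, opset version, and axis names are arbitrary example choices, not anything mandated by transformers:

```python
# Sketch only: export a Transformers PyTorch model to ONNX with torch.onnx.export.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so the traced graph returns plain tensors instead of a ModelOutput
model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
# The resulting model.onnx can then be loaded with onnxruntime in Python,
# or with ONNX.js / onnxruntime-web in the browser.
```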

@Narsil
Contributor

Narsil commented Jun 3, 2021

@wietsedv

I want to experiment with doing everything above in the browser without any back-end. Most people have machines that are powerful enough for small-scale inference. This would include:

Tokenization: Rust Tokenizers > WebAssembly (works perfectly except for the onig dependency which very inconveniently is a binding to the Oniguruma C library. Rust wasm-bindgen works perfectly for pure Rust. Have not yet attempted to solve this issue)
Inference: ONNX.js (a quick test with a BERT-based model converted to ONNX works perfectly)
Tie it together as a pipeline in Typescript (must make sure that this is as minimal as possible to make it low-maintenance)

This is really doable, but requires removing onig from the requirements as you mentioned. It will break some tokenizers, but most of them will work fine. The main issue is the download time of the actual model in the browser (anything > 50 MB is quite slow on most connections; mileage may vary, but keep in mind that not everyone has fiber).

Regarding ONNX for production (on CPU), you are quite right; the edge cases are mostly linked to generative models and using past values as a cache for faster generation, and in general to tuning the various knobs correctly.
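On the "tuning the various knobs" point, a hedged sketch of the kind of onnxruntime session options typically adjusted for CPU inference; the thread count, model path, and checkpoint name are placeholder choices:

```python
# Sketch: CPU inference with onnxruntime, with a couple of commonly tuned options.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # example value; tune to your hardware
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession("model.onnx", opts, providers=["CPUExecutionProvider"])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
enc = tokenizer("The inference widgets are great!", return_tensors="np")

logits = session.run(
    None,
    {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    },
)[0]
print(logits)
```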

@wietsedv

wietsedv commented Jun 3, 2021

@Narsil Yes, the download speed is the biggest flaw in the wasm idea. A workaround would be to provide a hosted API as a backup, but in a real-world scenario downloading the model and running it in a browser does not add any user value. There are two actual reasons I would like to do it:

  • It is conceptually awesome that it is possible (and relatively easy)
  • Embedding small models in apps for running on phones (Capacitor/Nativescript/PWA). The ONNX models have to be downloaded only once or can be bundled in the app.

@Narsil
Contributor

Narsil commented Jun 3, 2021

Small piece of advice on device: run the native counterpart to ONNX, as it's more likely to have better performance (on iPhone, for instance, there's a separate chip for ML, and I don't think you can access it with ONNX).

@wietsedv

wietsedv commented Jun 3, 2021

Thanks, that's a good point. You would not have access to many native APIs if you make a web-based app with Capacitor, but that problem is already solved by the NativeScript Capacitor integration. At least on iOS devices you would want to use Core ML. Not sure about Android though.

In case anyone is interested: yesterday I started a small side project for a simple self-hosted API (just a tiny wrapper around transformers.pipelines for now) and a front-end (which is kind of a clone of your inference widget): https://github.com/wietsedv/pipelines. For now, this will just be something for personal use.
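To illustrate how small such a wrapper can be, here is a hypothetical sketch with FastAPI; it is not the code in wietsedv/pipelines, and the task, route, and model defaults are arbitrary:

```python
# Hypothetical sketch of a "tiny wrapper around transformers.pipelines";
# not the actual code in wietsedv/pipelines.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
nlp = pipeline("sentiment-analysis")  # example task; could be any pipeline

class InferenceRequest(BaseModel):
    inputs: str

@app.post("/predict")
def predict(req: InferenceRequest):
    # The pipeline returns a list of dicts, which FastAPI serializes to JSON.
    return nlp(req.inputs)

# Run with: uvicorn main:app --port 8000
```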

@julien-c
Member Author

julien-c commented Jun 9, 2021

Returning to the original subject of this issue, this is being worked on in #87
