Open source our Inference widgets #56

Closed
julien-c opened this issue May 27, 2021 · 13 comments · Fixed by #87
Labels: discussion, widgets (About our Inference widgets)

Comments

@julien-c
Member

cross-referencing issue on our internal repo: https://github.com/huggingface/moon-landing/issues/716

@wietsedv

wietsedv commented May 30, 2021

Is this discussion being held internally? Of course I would like it to be open source with the intent of creating custom demos.

I also have a tangential question: this repo contains the Dockerfiles for the Docker images that provide inference widget support for non-Transformers frameworks. Could the equivalent Transformers Docker image be open sourced? Otherwise I would end up rewriting what already exists (even if it's quite simple).

My guess as to why this is not already the case is that you might want to keep the automatic ONNX conversion/usage closed source for monetization reasons, which is perfectly reasonable. But maybe you could add a simple equivalent Transformers image without ONNX.

@julien-c
Member Author

Discussion is open to external feedback – in fact it's appreciated. Upvote or comment on this issue if you would like the widgets to be open sourced! FYI, they're written in Svelte (https://svelte.dev) with Tailwind (https://tailwindcss.com/).

I would love to gauge the interest of the community to write new widgets in Svelte/Tailwind.

Re. the Inference API, yes we're not planning to release the images for transformers at the moment (cc @Narsil @jeffboudier). Note however that we do have a simple version of serving.py in the transformers repo: https://github.com/huggingface/transformers/blob/master/src/transformers/commands/serving.py
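For reference, a minimal sketch of calling the pipelines API that such a simple serving layer builds on (the model name here is just an example, not a prescribed choice):

```python
# Illustration only (not HF's serving code): a transformers pipeline called directly,
# which is what a simple serving script would expose over HTTP.
from transformers import pipeline

# Example checkpoint; any compatible model from the Hub works here.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The inference widgets are great!"))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```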

@wietsedv

Thanks! My exact intention was to write my own open-source widget with Svelte and Tailwind (or probably WindiCSS, which is a lightweight Tailwind++). Let me know if there is any way I could contribute. I think that simple widgets are very useful for local testing (maybe in notebooks) and for simple self-hosted online demos. This is not currently possible with the HF widget (and Gradio is also not quite there).

I find it remarkable that you (plural; Hugging Face) consistently seem to use every language/framework that I personally love across the entire software stack.

@julien-c
Member Author

haha that's awesome! Great minds share the same stack =)

Re. WindiCSS, I feel like its main advantage is JIT compilation, which Tailwind now does too, no? cc @gary149

@wietsedv

There are some other minor differences. About JIT: Tailwind now does JIT too, but WindiCSS does it a bit better. See some details by Antfu here: windicss/windicss#176

But in any case: vite + svelte + windicss = ridiculously fast + stable + simple + independent of the whims of commercial companies

@Narsil
Contributor

Narsil commented May 31, 2021

It could be done to some extent; however, extracting the open parts from the private code might be a bit too much.
Overall the API mimics the pipelines part of transformers very closely (mostly with some defaults changed), so we could have an open-source version. However, since we wouldn't use that version ourselves, I fear keeping it up to date might be an issue.

@julien-c
Member Author

julien-c commented Jun 2, 2021

@wietsedv out of curiosity what kind of widget/task are you thinking of building? (cc @LysandreJik)

julien-c added the widgets (About our Inference widgets) label on Jun 2, 2021
@wietsedv

wietsedv commented Jun 2, 2021

Not really one thing. For completeness' sake, I will tell you everything I want to make/use regarding inference/widgets:

  • I want to make a general stand-alone, self-hosted demo GUI for using/testing models that is targeted at end users
    • The inference widgets on the model pages do not suffice, because everything surrounding the widget is targeted at researchers/developers instead of naive users who do not need to understand the underlying techniques
    • Also, I want to be able to, for instance, upload/download spreadsheets with examples
  • I want to be able to make custom widgets with as little code as possible that are compatible with Transformers pipelines, but with a use-case that is too specific for inclusion in Transformers pipelines or the HF inference widgets
    • The first specific example is that I want to make a visualization thingy for (a newer revision of) this paper: https://arxiv.org/abs/2011.12649 (acoustic distance measure with dynamic time-warping and feature-based use of Wav2Vec2)
  • I would like to use an ONNX back-end, because I love PyTorch for development, but I do not really like it for production. Transformers PyTorch-to-ONNX conversion does not seem to be difficult, but I have no clue yet what edge cases there are (a minimal export sketch is included at the end of this comment).
  • I want to experiment with doing everything above in the browser without any back-end. Most people have machines that are powerful enough for small-scale inference. This would include:
    • Tokenization: Rust Tokenizers → WebAssembly (works perfectly except for the onig dependency, which very inconveniently is a binding to the Oniguruma C library. Rust wasm-bindgen works perfectly for pure Rust. Have not yet attempted to solve this issue)
    • Inference: ONNX.js (a quick test with a BERT-based model converted to ONNX works perfectly)
    • Tie it together as a pipeline in TypeScript (must make sure that this is as minimal as possible to keep it low-maintenance)

I already have some (unusable) minimal POCs for the crucial steps in the items above. Today I started setting up a small (usable) POC for the first item.
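For reference, a minimal sketch of the PyTorch-to-ONNX conversion mentioned in the list above, using plain torch.onnx.export; the checkpoint, opset version, and axis names are arbitrary example choices, not anything mandated by transformers:

```python
# Sketch only: export a Transformers PyTorch model to ONNX with torch.onnx.export.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so the traced graph returns plain tensors instead of a ModelOutput
model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
# The resulting model.onnx can then be loaded with onnxruntime in Python,
# or with ONNX.js / onnxruntime-web in the browser.
```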

@Narsil
Contributor

Narsil commented Jun 3, 2021

@wietsedv

I want to experiment with doing everything above in the browser without any back-end. Most people have machines that are powerful enough for small-scale inference. This would include:

Tokenization: Rust Tokenizers > WebAssembly (works perfectly except for the onig dependency which very inconveniently is a binding to the Oniguruma C library. Rust wasm-bindgen works perfectly for pure Rust. Have not yet attempted to solve this issue)
Inference: ONNX.js (a quick test with a BERT-based model converted to ONNX works perfectly)
Tie it together as a pipeline in Typescript (must make sure that this is as minimal as possible to make it low-maintenance)

This is really doable, but requires removing onig from the requirements as you mentioned. It will break some tokenizers, but most of them will work fine. The main issue is the download time of the actual model in the browser (anything > 50 MB is quite slow on most connections; mileage may vary, but keep in mind that not everyone has fiber).

Regarding ONNX for production (on CPU), you are quite right; the edge cases are mostly linked to generative models and using past values as a cache for faster generation, and in general to tuning the various knobs correctly.
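On the "tuning the various knobs" point, a hedged sketch of the kind of onnxruntime session options typically adjusted for CPU inference; the thread count, model path, and checkpoint name are placeholder choices:

```python
# Sketch: CPU inference with onnxruntime, with a couple of commonly tuned options.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # example value; tune to your hardware
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession("model.onnx", opts, providers=["CPUExecutionProvider"])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
enc = tokenizer("The inference widgets are great!", return_tensors="np")

logits = session.run(
    None,
    {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    },
)[0]
print(logits)
```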

@wietsedv

wietsedv commented Jun 3, 2021

@Narsil Yes, the download speed is the biggest flaw in the wasm idea. A workaround would be to provide a hosted API as a backup, but in a real-world scenario downloading the model and running it in a browser does not add any user value. There are two actual reasons I would like to do it:

  • It is conceptually awesome that it is possible (and relatively easy)
  • Embedding small models in apps for running on phones (Capacitor/Nativescript/PWA). The ONNX models have to be downloaded only once or can be bundled in the app.

@Narsil
Contributor

Narsil commented Jun 3, 2021

Small piece of advice on device: run the native counterpart to ONNX, as it's more likely to have better performance (on iPhone, for instance, there's a separate chip for ML, and I don't think you can access it with ONNX).

@wietsedv

wietsedv commented Jun 3, 2021

Thanks, that's a good point. You would not have access to many native APIs if you make a web-based app with Capacitor, but that problem is already solved by the NativeScript Capacitor integration. At least on iOS devices you would want to use Core ML. Not sure about Android though.

In case anyone is interested: yesterday I started a small side project for a simple self-hosted API (just a tiny wrapper around transformers.pipelines for now) and a front-end (which is kind of a clone of your inference widget): https://github.com/wietsedv/pipelines. For now, this will just be something for personal use.
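To illustrate how small such a wrapper can be, here is a hypothetical sketch with FastAPI; it is not the code in wietsedv/pipelines, and the task, route, and model defaults are arbitrary:

```python
# Hypothetical sketch of a "tiny wrapper around transformers.pipelines";
# not the actual code in wietsedv/pipelines.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
nlp = pipeline("sentiment-analysis")  # example task; could be any pipeline

class InferenceRequest(BaseModel):
    inputs: str

@app.post("/predict")
def predict(req: InferenceRequest):
    # The pipeline returns a list of dicts, which FastAPI serializes to JSON.
    return nlp(req.inputs)

# Run with: uvicorn main:app --port 8000
```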

@julien-c
Member Author

julien-c commented Jun 9, 2021

Returning to the original subject of this issue, this is being worked on in #87
