Open source our Inference widgets #56
Is this discussion being held internally? Of course I would like it to be open source, with the intent of creating custom demos. I also have a tangential question: this repo contains Dockerfiles for the images that provide inference widget support for non-Transformers frameworks. Could the equivalent Transformers Docker image be open sourced? Otherwise I would end up rewriting what already exists (even if it's quite simple). My guess as to why this is not already the case is that you might want to keep the automatic ONNX conversion/usage closed source for monetization reasons, which is perfectly reasonable. But maybe you could add a simple equivalent Transformers image without ONNX.
Discussion is open to external feedback – in fact it's appreciated. Upvote or comment on this issue if you would like the widgets to be open sourced! FYI, they're written in Svelte (https://svelte.dev) with Tailwind (https://tailwindcss.com/). I would love to gauge the interest of the community in writing new widgets in Svelte/Tailwind. Re. the Inference API: yes, we're not planning to release the images for transformers at the moment (cc @Narsil @jeffboudier). Note however that we do have a simple version of serving.py in the transformers repo: https://github.com/huggingface/transformers/blob/master/src/transformers/commands/serving.py
Thanks! My exact intention was to write my own open-source widget with Svelte and Tailwind (or probably WindiCSS, which is a lightweight Tailwind++). Let me know if there is any way I could contribute. I think that simple widgets are very useful for local testing (maybe in notebooks) and for simple self-hosted online demos. This is not currently possible with the HF widget (and Gradio is also not quite there). I find it remarkable that you (plural; Hugging Face) consistently seem to use every language/framework that I personally love, across the complete software stack.
haha that's awesome! Great minds share the same stack =) Re. WindiCSS, I feel like its main advantage is JIT compilation, which Tailwind now does too, no? cc @gary149
There are some other minor differences. About JIT: Tailwind now does JIT too, but WindiCSS does it a bit better. See windicss/windicss#176 for some details by Antfu. But in any case: vite + svelte + windicss = ridiculously fast + stable + simple + independent of the whims of commercial companies.
It could be done to some extent; however, extracting the open code from the private code might be a bit too much.
@wietsedv, out of curiosity, what kind of widget/task are you thinking of building? (cc @LysandreJik)
Not really one thing. For completeness' sake, I will list everything I want to make/use regarding inference/widgets:
I already have a minimal (unusable) POC for the crucial steps in the items above. Today I started setting up a small (usable) POC for the first item.
This is really doable, but it requires removing ONNX for production (on CPU). You are quite right: the edge cases are mostly linked to generative models and using past values as a cache for faster generation, and in general to tuning the various knobs correctly.
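For anyone following along, here is a minimal sketch of the kind of ONNX-on-CPU path being discussed, assuming a plain encoder classifier exported with torch.onnx.export and run with onnxruntime. The model name, file name, and axis names are illustrative; this is not the Inference API's actual code.

```python
# Minimal sketch: export a plain encoder classifier to ONNX and run it on CPU.
# Assumes torch, transformers and onnxruntime are installed; model/file names are illustrative.
import torch
import onnxruntime as ort
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.config.return_dict = False  # export tuples instead of ModelOutput objects
model.eval()

inputs = tokenizer("I love this library", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)

# CPU inference with onnxruntime.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(
    ["logits"],
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)[0]
print(logits)
```

Generative models are the harder case, as mentioned above: the past key/value cache has to be wired through the exported graph explicitly to keep generation fast.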
@Narsil Yes, the download speed is the biggest flaw in the wasm idea. A workaround would be to provide a hosted API as a backup, but in a real-world scenario downloading the model and running it in a browser does not add any user value. There are two actual reasons I would like to do it:
Small piece of advice: on device, run the native counterpart to ONNX; it's more likely to have better performance (on iPhone, for instance, there's a separate chip for ML, and I don't think you can access it with ONNX, afaik).
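As an illustration of that native route on iOS, here is a hedged sketch of tracing a PyTorch model and converting it to Core ML with coremltools. The model name, fixed input shape, and output file name are placeholders, not the widget's or the Inference API's actual pipeline.

```python
# Hedged sketch: trace a PyTorch encoder and convert it to Core ML with coremltools.
# The model name and fixed input shape are placeholders; real text models also need
# tokenization on-device.
import torch
import coremltools as ct
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)
model.eval()

inputs = tokenizer("I love this library", return_tensors="pt")
traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))

# compute_units=ALL lets Core ML schedule work onto the Neural Engine where available.
mlmodel = ct.convert(
    traced,
    inputs=[
        ct.TensorType(name="input_ids", shape=inputs["input_ids"].shape),
        ct.TensorType(name="attention_mask", shape=inputs["attention_mask"].shape),
    ],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("TextClassifier.mlpackage")
```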
Thanks, that's a good point. You would not have access to many native APIs if you make a web-based app with Capacitor, but that problem is already solved by the NativeScript Capacitor integration. At least on iOS devices you would want to use Core ML; not sure about Android, though. In case anyone is interested: yesterday I started a small side project for a simple self-hosted API (just a tiny wrapper around transformers.pipelines for now) and a front-end (which is kind of a clone of your inference widget): https://github.com/wietsedv/pipelines For now, this will just be something for personal use.
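For readers who want the same kind of tiny self-hosted wrapper, here is a minimal sketch assuming FastAPI and uvicorn. The endpoint name and the hard-coded task are my own illustrative choices, not necessarily what the linked project does.

```python
# Minimal sketch of a self-hosted wrapper around transformers.pipeline.
# Endpoint shape and the hard-coded task are illustrative, not the linked project's API.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # downloads a default model on first run


class Request(BaseModel):
    inputs: str


@app.post("/predict")
def predict(request: Request):
    # The pipeline returns a list of {"label": ..., "score": ...} dicts for a single string.
    return classifier(request.inputs)
```

Run it with `uvicorn app:app --port 8000` and POST `{"inputs": "I love this"}` to `/predict`; a Svelte widget can then call this endpoint in place of the hosted Inference API.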
Returning to the original subject of this issue, this is being worked on in #87
Cross-referencing the issue on our internal repo: https://github.com/huggingface/moon-landing/issues/716