Multimodal Input TextBox #4668
Comments
I don't know if we'd want a full rich text input like Slack, but something like Discord or WhatsApp might be nice: the ability to upload text along with various media (images, audio, video).
@pngwn @taoari I've actually thought about this, and I think it's a good idea. Especially as more multimodal projects become popular, it would be good to have a component that supports them. I think making the chatbot a single component (input + output) also makes a lot of sense, and it would be easier for devs to use. We can explore this in 4.0.
@pngwn @dawoodkhan82 It's great to see this is on the wish list, looking forward to it.
The rich textbox should be a separate component, particularly if we want to support files, as that would involve changing the API (we could do a similar tuple format to support files).
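To make the tuple idea above concrete, here is a minimal sketch of what such a value could look like. The `RichValue` type and the `normalize` helper are hypothetical names for illustration only and are not part of Gradio; the assumption is that the component would return either plain text or a `(text, files)` tuple.

```python
from typing import List, Tuple, Union

# Hypothetical value type for a rich textbox (an assumption, not a Gradio API):
# either plain text, or a (text, list of file paths) tuple.
RichValue = Union[str, Tuple[str, List[str]]]


def normalize(value: RichValue) -> Tuple[str, List[str]]:
    """Return (text, files) whether the input is plain text or a tuple."""
    if isinstance(value, str):
        # Plain text keeps working: it is treated as a message with no files.
        return value, []
    text, files = value
    return text, list(files)
```

Accepting both shapes would let existing text-only callbacks keep working while file-aware callbacks read the second element.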
Aside: it would be cool if the rich textbox could support text color so that we could address this feature request: #2303
@abidlabs we can allow file upload and the text styling features to be turned off for the rich textbox, in case a dev wants only one feature and not the other.
Hey! We've now made it possible for Gradio users to create their own custom components -- meaning that you can write some Python and JavaScript (Svelte), and publish it as a Gradio component. You can use it in your own Gradio apps, or share it so that anyone can use it in their Gradio apps. Here are some examples of custom Gradio components:
You can see the source code for those components by clicking the "Files" icon and then clicking "src". The complete source code for the backend and frontend is visible. In particular, it's very fast if you want to build off an existing component. We've put together a Guide: https://www.gradio.app/guides/five-minute-guide, and we're happy to help. Hopefully this will help address this issue.
Closing this issue in favor of: #6976
Just FYI @taoari we now support a |
Is your feature request related to a problem? Please describe.
Multimodal LLMs have become popular. However, for multimodal input, a Gradio app currently has to use separate widgets for images, videos, audio, and files (attachments). This UI is unintuitive; it would be good to have a single multimodal input textbox.
Describe the solution you'd like
Any modern chat app has a multimodal input textbox, e.g. Slack, Teams, etc. The screenshot shows the Slack input box; it would be nice to have something similar.
Additional context
It would also be great if gr.Chatbot could be updated accordingly so that it can show text, images, videos, and attachments in a single message. The current version of gr.Chatbot only shows a single modality per message (text or image, but not both). PR467 #4667 is a bug that does show the file. It would also be great if the Chatbot could show 3D models, as there is a gr.Model3D component.
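As a rough illustration of the one-modality-per-message limitation, the sketch below splits one user message (text plus attachments) into several chat history rows. It assumes the pair-style history, where each row is `[user, bot]` and a cell is either a string or a `(filepath,)` tuple; `to_history_rows` is a hypothetical helper, not a Gradio function.

```python
from typing import List, Optional, Tuple, Union

# Assumed cell shape: text, a (filepath,) tuple for media, or None for "no message".
Cell = Optional[Union[str, Tuple[str]]]
Row = List[Cell]


def to_history_rows(text: str, files: List[str]) -> List[Row]:
    """Expand one text-plus-files message into single-modality rows."""
    # Each attachment becomes its own user row with an empty bot reply.
    rows: List[Row] = [[(path,), None] for path in files]
    # The text, if any, goes in a final separate row.
    if text:
        rows.append([text, None])
    return rows
```

For example, `to_history_rows("What is this?", ["cat.png"])` yields two rows, one for the image and one for the text, which is the fragmentation that a single multimodal message would avoid.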