Beckn-gemini bot enhancement - adding support for voice / video/ image input #115

emmayank · 2024-10-09T10:45:49Z

Description

Enhance the Beckn-Gemini Bot to support multimodal input types such as voice, video, and image. Since Gemini is designed to be multimodal, the bot should let users provide input in different formats during a conversation, making the experience more flexible and accessible. At any point in the conversation, the bot should detect the input mode and respond accordingly. For example, when asked for a 6-digit OTP, the user may prefer to send a voice message instead of typing it, or may ask for information via a voice message such as "मैं सौर ऊर्जा खरीदना चाहता हूं" ("I want to buy solar energy").

Goals

Add support for voice input, allowing users to send voice messages instead of text.
Enable the bot to process video input (if applicable for specific use cases) and respond appropriately.
Integrate image recognition capabilities so the bot can understand and respond to images shared by the user (e.g., an image of a bill or document).
Allow users to seamlessly switch between input modes (text, voice, video, image) at any point during the conversation.
Implement detection for different input types and ensure the bot responds appropriately, regardless of the mode used.
Test across various scenarios, ensuring that the bot can handle and respond to voice, video, and image inputs accurately.
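
As a rough illustration of the input-type detection goal, the sketch below classifies an incoming message by the MIME type of its attachment. It is only a sketch under assumptions: the issue does not specify how messages reach the bot, so the `InputMode` enum, the `detect_input_mode` function, and the idea that the messaging channel supplies a MIME type are all hypothetical.

```python
from enum import Enum
from typing import Optional


class InputMode(Enum):
    """Input modes named in this issue, plus a fallback (hypothetical)."""
    TEXT = "text"
    VOICE = "voice"
    VIDEO = "video"
    IMAGE = "image"
    UNSUPPORTED = "unsupported"


# Assumption: the top-level MIME type is enough to pick a mode.
_TOP_LEVEL_TO_MODE = {
    "text": InputMode.TEXT,
    "audio": InputMode.VOICE,
    "video": InputMode.VIDEO,
    "image": InputMode.IMAGE,
}


def detect_input_mode(mime_type: Optional[str]) -> InputMode:
    """Return the input mode for a message; no attachment means plain text."""
    if not mime_type:
        return InputMode.TEXT
    top_level = mime_type.split("/", 1)[0].strip().lower()
    return _TOP_LEVEL_TO_MODE.get(top_level, InputMode.UNSUPPORTED)
```

Because the mode is computed per message rather than per conversation, a user could send text, then a voice note, then an image, and each message would be routed on its own.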

Expected Outcome

The bot will support multimodal inputs such as voice, video, and images, providing a flexible and accessible user experience.
Users can switch input modes (e.g., voice to text, image to voice) without breaking the flow of conversation.
The bot detects the input mode and responds appropriately to voice messages, video, or images.

Acceptance Criteria

The bot successfully detects and processes voice, video, and image inputs during the conversation.
Users can switch between input modes (voice, text, video, image) seamlessly during the conversation without any disruptions.
Voice input is correctly recognized and converted to actionable information (e.g., detecting an OTP or voice-based requests like "मैं सौर ऊर्जा खरीदना चाहता हूं", "I want to buy solar energy").
The functionality is tested across multiple scenarios and input types to ensure reliability and accuracy.
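
One way the OTP criterion could be checked is sketched below, assuming the voice message has already been transcribed to text by some speech-to-text step that the issue does not specify. The `extract_otp` helper is hypothetical. It accepts digits spoken one at a time ("4 8 2 9 1 5") or grouped ("482-915"), and normalizes Devanagari digits to ASCII. It does not handle numbers spelled out as words.

```python
import re
from typing import Optional

# A run of digits, optionally separated by single spaces or hyphens.
_DIGIT_RUN = re.compile(r"\d(?:[ -]?\d)*")


def extract_otp(transcript: str, length: int = 6) -> Optional[str]:
    """Return the first digit run of exactly `length` digits, or None."""
    for match in _DIGIT_RUN.finditer(transcript):
        # int() accepts any Unicode decimal digit, so '४' becomes '4'.
        digits = "".join(str(int(ch)) for ch in match.group() if ch not in " -")
        if len(digits) == length:
            return digits
    return None
```

Returning None when no run has exactly six digits lets the bot ask the user to repeat the code instead of guessing.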

Mockups / Wireframes

NA

Product Name

Beckn-Gemini Bot

Domain

Multimodal AI / Conversational AI

Tech Skills Needed

Voice and Speech Recognition (NLP)
Image/Video Processing (Computer Vision)
Multimodal Input Integration
Chatbot Development

Complexity

High

Sub Category

Multimodal Input Support

Project View

Beckn-Gemini Bot

Project Name

Beckn-Gemini Bot Multimodal Enhancement

emmayank assigned shreyvishal Oct 9, 2024

emmayank added the enhancement (New feature or request) label Oct 9, 2024

emmayank changed the title ~~Beckn-gemini bot enhahcement - adding support for voice / video/ image input~~ Beckn-gemini bot enhancement - adding support for voice / video/ image input Oct 9, 2024

emmayank mentioned this issue Oct 9, 2024

Beckn-gemini-bot - Enhancements #111

Open

10 tasks
