Enhance the Beckn-Gemini Bot to support multimodal input types such as voice, video, and image. Since Gemini is designed to be multimodal, the bot should allow users to provide input in different formats during a conversation, enabling a more flexible and accessible user experience. At any point in the conversation, the bot should detect the input mode and respond accordingly. For example, when the bot asks for a 6-digit OTP, the user may prefer to send a voice message instead of typing it, or a user may ask for information via a voice message such as "मैं सौर ऊर्जा खरीदना चाहता हूं" ("I want to buy solar energy").
Goals
Add support for voice input, allowing users to send voice messages instead of text.
Enable the bot to process video input (if applicable for specific use cases) and respond appropriately.
Integrate image recognition capabilities so the bot can understand and respond to images shared by the user (e.g., an image of a bill or document).
Allow users to seamlessly switch between input modes (text, voice, video, image) at any point during the conversation.
Implement detection for different input types and ensure the bot responds appropriately, regardless of the mode used.
Test across various scenarios, ensuring that the bot can handle and respond to voice, video, and image inputs accurately.
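The input-type detection goal above could start from the MIME type of the incoming attachment. The following is a minimal sketch under that assumption; `InputMode` and `detect_input_mode` are hypothetical names, not part of the existing bot, and the real messaging channel may expose the media type differently.

```python
from enum import Enum
from typing import Optional


class InputMode(Enum):
    TEXT = "text"
    VOICE = "voice"
    VIDEO = "video"
    IMAGE = "image"


def detect_input_mode(mime_type: Optional[str]) -> InputMode:
    """Map an attachment's MIME type to an input mode (hypothetical helper).

    A message without an attachment (mime_type is None or empty) is
    treated as plain text.
    """
    if not mime_type:
        return InputMode.TEXT
    # Only the top-level type matters here, e.g. "audio" in "audio/ogg; codecs=opus".
    major = mime_type.split("/", 1)[0].strip().lower()
    if major == "audio":
        return InputMode.VOICE
    if major == "video":
        return InputMode.VIDEO
    if major == "image":
        return InputMode.IMAGE
    if major == "text":
        return InputMode.TEXT
    raise ValueError(f"Unsupported input type: {mime_type}")
```

Raising on unknown types, rather than silently falling back to text, lets the bot tell the user that the attachment cannot be processed.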
Expected Outcome
The bot will support multimodal inputs such as voice, video, and images, providing a flexible and accessible user experience.
Users can switch input modes (e.g., voice to text, image to voice) without breaking the flow of conversation.
The bot detects the input mode and responds appropriately to voice messages, video, or images.
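One way to keep the conversation flow intact across mode switches is to store every turn in the same history, regardless of how it arrived, together with whatever the bot is currently waiting for. This is only a sketch; `Turn`, `Conversation`, and `pending_prompt` are illustrative names and do not come from the bot's codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    mode: str     # "text", "voice", "video", or "image"
    content: str  # typed text, or text derived from the media (e.g. a transcript)


@dataclass
class Conversation:
    # What the bot is waiting for, e.g. "otp"; unaffected by the input mode.
    pending_prompt: Optional[str] = None
    turns: List[Turn] = field(default_factory=list)

    def add_turn(self, mode: str, content: str) -> None:
        """Append a user turn to the shared history."""
        self.turns.append(Turn(mode, content))

    def mode_switched(self) -> bool:
        """Return True if the latest turn used a different mode than the one before."""
        return len(self.turns) >= 2 and self.turns[-1].mode != self.turns[-2].mode
```

Because `pending_prompt` lives on the conversation rather than on a turn, a user who was asked for an OTP in text can answer by voice and the bot still knows what the answer refers to.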
Acceptance Criteria
The bot successfully detects and processes voice, video, and image inputs during the conversation.
Users can switch between input modes (voice, text, video, image) seamlessly during the conversation without any disruptions.
Voice input is correctly recognized and converted to actionable information (e.g., detecting OTP or voice-based requests like "मैं सौर ऊर्जा खरीदना चाहता हूं").
The functionality is tested across multiple scenarios and input types to ensure reliability and accuracy.
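The voice-OTP criterion above can be checked in isolation once a transcript is available. The sketch below assumes the speech-to-text step (for example, Gemini transcribing the audio) has already produced a text transcript; `extract_otp` is a hypothetical helper, and it only understands ASCII digits and English digit words, so Hindi number words would need an extra mapping.

```python
import re
from typing import Optional

# English digit words that a transcript may contain instead of numerals.
_DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def extract_otp(transcript: str, length: int = 6) -> Optional[str]:
    """Return the OTP if the transcript contains exactly `length` digits, else None."""
    # Tokenize into lowercase words and single ASCII digits.
    tokens = re.findall(r"[a-z]+|[0-9]", transcript.lower())
    digits = []
    for tok in tokens:
        if tok.isdigit():
            digits.append(tok)
        elif tok in _DIGIT_WORDS:
            digits.append(_DIGIT_WORDS[tok])
    otp = "".join(digits)
    return otp if len(otp) == length else None
```

Returning `None` when the digit count is wrong lets the bot ask the user to repeat the code instead of submitting a partial OTP.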
Mockups / Wireframes
NA
Product Name
Beckn-Gemini Bot
Domain
Multimodal AI / Conversational AI
Tech Skills Needed
Voice and Speech Recognition (NLP)
Image/Video Processing (Computer Vision)
Multimodal Input Integration
Chatbot Development
Complexity
High
Category
Bot Enhancement
Sub Category
Multimodal Input Support
Project View
Beckn-Gemini Bot
Project Name
Beckn-Gemini Bot Multimodal Enhancement
emmayank changed the title from "Beckn-gemini bot enhahcement - adding support for voice / video/ image input" to "Beckn-gemini bot enhancement - adding support for voice / video/ image input" on Oct 9, 2024