Chat with Multiple Images. Support Vision with Gemini #942

debanjum · 2024-10-19T21:55:24Z

Overview

Add vision support for Gemini models in Khoj
Allow sharing multiple images as part of user query from the web app
Handle multiple images shared in query to chat API

Screenshots

Desktop	Mobile

Closes #716
Closes #921

Previously Khoj could respond to a single shared image at a time. This changes updates the chat API to accept multiple images shared by the user and send it to the appropriate chat actors including the openai response generation chat actor for getting an image aware response

Previously the web app only expected a single image to be shared by the user as part of their query. This change allows sharing multiple images from the web app. Closes #921

- Put the attached images display div inside the same parent div as the text area - Keep the attachment, microphone/send message buttons aligned with the text area. So the attached images just show up at the top of the text area but everything else stays at the same horizontal height as before. - This improves the UX by - Ensuring that the attached images do not obscure the agents pane above the chat input area - The attached images visually look like they are inside the actual input area, rather than floating above it. So the visual aligns with the semantics

Currently experiencing difficulty instruction following when an image is shared. It's more likely to try and output an image. Update to make a clearer distinction.

…e on the homage page One limitation of this methodology is that localStorage has a limit in how much data it can take. Should add more graceful error handling here as well.

sabaimran

🚀 awesome being able to chat w/ multiple images & having gemini support.

src/interface/web/app/components/chatInputArea/chatInputArea.module.css

src/interface/web/app/components/chatInputArea/chatInputArea.tsx

src/interface/web/app/components/chatMessage/chatMessage.tsx

src/khoj/routers/api_chat.py

@@ -1102,9 +1089,9 @@

    ## Stream Text Response
    if stream:
-        return StreamingResponse(event_generator(q, image=image), media_type="text/plain")
+        return StreamingResponse(event_generator(q, images=raw_images), media_type="text/plain")


To fix the problem, we need to ensure that detailed error messages and stack traces are not exposed to the end user. Instead, we should log the detailed error information on the server and return a generic error message to the user. This can be achieved by modifying the exception handling in src/khoj/processor/image/generate.py to yield a generic error message and status code, and ensuring that the event_generator function in src/khoj/routers/api_chat.py handles these generic messages appropriately.

Set max combined images size to 20mb to allow multiple photos to be shared

debanjum added 5 commits October 19, 2024 14:53

Allow sharing multiple images as part of user query from the web app

0d6a54c

Previously the web app only expected a single image to be shared by the user as part of their query. This change allows sharing multiple images from the web app. Closes #921

Add vision support for Gemini models in Khoj

3e39fac

Style user attached images with fixed height, in a single row on web app

3cc1426

debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch from 7e5ed8a to 3cc1426 Compare October 19, 2024 23:48

sabaimran and others added 5 commits October 19, 2024 16:54

Remove unused icons in chatInputArea

545259e

Improve mode description given to LLM when determining how to respond.

cb6b3ec

Currently experiencing difficulty instruction following when an image is shared. It's more likely to try and output an image. Update to make a clearer distinction.

Move window redirect to after relevant data is dropped in localStorag…

1ad6e17

…e on the homage page One limitation of this methodology is that localStorage has a limit in how much data it can take. Should add more graceful error handling here as well.

Ensure images are reset after messages processed

5d5bea6

Style user attached images as carousel on chat input area of web app

7646ac6

debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch 2 times, most recently from 5510f9a to 7646ac6 Compare October 20, 2024 16:12

debanjum requested a review from sabaimran October 21, 2024 23:55

sabaimran approved these changes Oct 22, 2024

View reviewed changes

debanjum added 2 commits October 22, 2024 04:37

Rate limit the count and total size of images shared via API

e8fb79a

Merge branch 'master' into multi-image-chat-and-vision-for-gemini

6c39380

github-advanced-security bot found potential problems Oct 23, 2024

View reviewed changes

Sanitize user attached images. Constrain chat input width on home page

b3fff43

Set max combined images size to 20mb to allow multiple photos to be shared

debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch from 99d06fc to b3fff43 Compare October 23, 2024 02:43

debanjum merged commit c6f3253 into master Oct 23, 2024
9 checks passed

debanjum deleted the multi-image-chat-and-vision-for-gemini branch October 23, 2024 02:59

@@ -1085,5 +1085,5 @@
                                 yield result
-                        except Exception as e:
+                        except Exception:
                             continue_stream = False
-                            logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread: {e}")
+                            logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread.")

@@ -85,3 +85,3 @@
                             webp_image_bytes = generate_image_with_replicate(image_prompt, text_to_image_config, text2image_model)
-                    except openai.OpenAIError or openai.BadRequestError or openai.APIConnectionError as e:
+                    except (openai.OpenAIError, openai.BadRequestError, openai.APIConnectionError) as e:
                         if "content_policy_violation" in e.message:
@@ -89,4 +89,3 @@
                             status_code = e.status_code  # type: ignore
-                            message = f"Image generation blocked by OpenAI: {e.message}"  # type: ignore
-                            yield image_url or image, status_code, message, intent_type.value
+                            yield image_url or image, status_code, "Image generation blocked due to policy violation.", intent_type.value
                             return
@@ -94,5 +93,3 @@
                             logger.error(f"Image Generation failed with {e}", exc_info=True)
-                            message = f"Image generation failed with OpenAI error: {e.message}"  # type: ignore
-                            status_code = e.status_code  # type: ignore
-                            yield image_url or image, status_code, message, intent_type.value
+                            yield image_url or image, 500, "Image generation failed due to an internal error.", intent_type.value
                             return
@@ -100,5 +97,3 @@
                         logger.error(f"Image Generation failed with {e}", exc_info=True)
-                        message = f"Image generation using {text2image_model} via {text_to_image_config.model_type} failed with error: {e}"
-                        status_code = 502
-                        yield image_url or image, status_code, message, intent_type.value
+                        yield image_url or image, 502, "Image generation failed due to a network error.", intent_type.value
                         return

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chat with Multiple Images. Support Vision with Gemini #942

Chat with Multiple Images. Support Vision with Gemini #942

debanjum commented Oct 19, 2024 •

edited

Loading

sabaimran left a comment

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Chat with Multiple Images. Support Vision with Gemini #942

Chat with Multiple Images. Support Vision with Gemini #942

Conversation

debanjum commented Oct 19, 2024 • edited Loading

Overview

Screenshots

sabaimran left a comment

Choose a reason for hiding this comment

debanjum commented Oct 19, 2024 •

edited

Loading