Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chat with Multiple Images. Support Vision with Gemini #942

Merged
merged 13 commits into from
Oct 23, 2024

Conversation

debanjum
Copy link
Member

@debanjum debanjum commented Oct 19, 2024

Overview

  • Add vision support for Gemini models in Khoj
  • Allow sharing multiple images as part of user query from the web app
  • Handle multiple images shared in query to chat API

Screenshots

Desktop Mobile
Chat with Multiple Images - Desktop Chat with Multiple Images - Mobile

Closes #716
Closes #921

Previously Khoj could respond to a single shared image at a time.

This changes updates the chat API to accept multiple images shared by
the user and send it to the appropriate chat actors including the
openai response generation chat actor for getting an image aware
response
Previously the web app only expected a single image to be shared by
the user as part of their query.

This change allows sharing multiple images from the web app.

Closes #921
- Put the attached images display div inside the same parent div as
  the text area
- Keep the attachment, microphone/send message buttons aligned with
  the text area. So the attached images just show up at the top of the
  text area but everything else stays at the same horizontal height as
  before.

- This improves the UX by
  - Ensuring that the attached images do not obscure the agents pane
    above the chat input area
  - The attached images visually look like they are inside the actual
    input area, rather than floating above it. So the visual aligns
    with the semantics
@debanjum debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch from 7e5ed8a to 3cc1426 Compare October 19, 2024 23:48
sabaimran and others added 5 commits October 19, 2024 16:54
Currently experiencing difficulty instruction following when an image is shared. It's more likely to try and output an image. Update to make a clearer distinction.
…e on the homage page

One limitation of this methodology is that localStorage has a limit in how much data it can take. Should add more graceful error handling here as well.
@debanjum debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch 2 times, most recently from 5510f9a to 7646ac6 Compare October 20, 2024 16:12
@debanjum debanjum requested a review from sabaimran October 21, 2024 23:55
Copy link
Member

@sabaimran sabaimran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 awesome being able to chat w/ multiple images & having gemini support.

@@ -1102,9 +1089,9 @@

## Stream Text Response
if stream:
return StreamingResponse(event_generator(q, image=image), media_type="text/plain")
return StreamingResponse(event_generator(q, images=raw_images), media_type="text/plain")

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.
Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix AI 4 months ago

To fix the problem, we need to ensure that detailed error messages and stack traces are not exposed to the end user. Instead, we should log the detailed error information on the server and return a generic error message to the user. This can be achieved by modifying the exception handling in src/khoj/processor/image/generate.py to yield a generic error message and status code, and ensuring that the event_generator function in src/khoj/routers/api_chat.py handles these generic messages appropriately.

Suggested changeset 2
src/khoj/routers/api_chat.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/khoj/routers/api_chat.py b/src/khoj/routers/api_chat.py
--- a/src/khoj/routers/api_chat.py
+++ b/src/khoj/routers/api_chat.py
@@ -1085,5 +1085,5 @@
                     yield result
-            except Exception as e:
+            except Exception:
                 continue_stream = False
-                logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread: {e}")
+                logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread.")
 
EOF
@@ -1085,5 +1085,5 @@
yield result
except Exception as e:
except Exception:
continue_stream = False
logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread: {e}")
logger.info(f"User {user} disconnected. Emitting rest of responses to clear thread.")

src/khoj/processor/image/generate.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/khoj/processor/image/generate.py b/src/khoj/processor/image/generate.py
--- a/src/khoj/processor/image/generate.py
+++ b/src/khoj/processor/image/generate.py
@@ -85,3 +85,3 @@
                 webp_image_bytes = generate_image_with_replicate(image_prompt, text_to_image_config, text2image_model)
-        except openai.OpenAIError or openai.BadRequestError or openai.APIConnectionError as e:
+        except (openai.OpenAIError, openai.BadRequestError, openai.APIConnectionError) as e:
             if "content_policy_violation" in e.message:
@@ -89,4 +89,3 @@
                 status_code = e.status_code  # type: ignore
-                message = f"Image generation blocked by OpenAI: {e.message}"  # type: ignore
-                yield image_url or image, status_code, message, intent_type.value
+                yield image_url or image, status_code, "Image generation blocked due to policy violation.", intent_type.value
                 return
@@ -94,5 +93,3 @@
                 logger.error(f"Image Generation failed with {e}", exc_info=True)
-                message = f"Image generation failed with OpenAI error: {e.message}"  # type: ignore
-                status_code = e.status_code  # type: ignore
-                yield image_url or image, status_code, message, intent_type.value
+                yield image_url or image, 500, "Image generation failed due to an internal error.", intent_type.value
                 return
@@ -100,5 +97,3 @@
             logger.error(f"Image Generation failed with {e}", exc_info=True)
-            message = f"Image generation using {text2image_model} via {text_to_image_config.model_type} failed with error: {e}"
-            status_code = 502
-            yield image_url or image, status_code, message, intent_type.value
+            yield image_url or image, 502, "Image generation failed due to a network error.", intent_type.value
             return
EOF
@@ -85,3 +85,3 @@
webp_image_bytes = generate_image_with_replicate(image_prompt, text_to_image_config, text2image_model)
except openai.OpenAIError or openai.BadRequestError or openai.APIConnectionError as e:
except (openai.OpenAIError, openai.BadRequestError, openai.APIConnectionError) as e:
if "content_policy_violation" in e.message:
@@ -89,4 +89,3 @@
status_code = e.status_code # type: ignore
message = f"Image generation blocked by OpenAI: {e.message}" # type: ignore
yield image_url or image, status_code, message, intent_type.value
yield image_url or image, status_code, "Image generation blocked due to policy violation.", intent_type.value
return
@@ -94,5 +93,3 @@
logger.error(f"Image Generation failed with {e}", exc_info=True)
message = f"Image generation failed with OpenAI error: {e.message}" # type: ignore
status_code = e.status_code # type: ignore
yield image_url or image, status_code, message, intent_type.value
yield image_url or image, 500, "Image generation failed due to an internal error.", intent_type.value
return
@@ -100,5 +97,3 @@
logger.error(f"Image Generation failed with {e}", exc_info=True)
message = f"Image generation using {text2image_model} via {text_to_image_config.model_type} failed with error: {e}"
status_code = 502
yield image_url or image, status_code, message, intent_type.value
yield image_url or image, 502, "Image generation failed due to a network error.", intent_type.value
return
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
Set max combined images size to 20mb to allow multiple photos to be shared
@debanjum debanjum force-pushed the multi-image-chat-and-vision-for-gemini branch from 99d06fc to b3fff43 Compare October 23, 2024 02:43
@debanjum debanjum merged commit c6f3253 into master Oct 23, 2024
9 checks passed
@debanjum debanjum deleted the multi-image-chat-and-vision-for-gemini branch October 23, 2024 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FIX] khoj vision mode do not support multiple images Incorporate Gemini & Gemini Vision support
2 participants