New Feature: Vision Support for Khoj #889
Conversation
Awesome. Still need to test it out locally but this looks great.
src/interface/web/app/components/chatInputArea/chatInputArea.tsx
Left some comments. But excited to get vision support in Khoj soon!
Please upgrade and merge to master ASAP.
…hatml construction
…images for pending msgs
Tested out the new vision support. Feels neat to be able to talk about visuals with Khoj!
Some general feedback on UX and response generation:
- When requesting a vision-enabled response from the web app's home screen, the response generation UX doesn't make it obvious that Khoj understood and is responding to that image
- The image attached to the chat message doesn't seem to be passed to Khoj's train of thought? This would reduce response quality and make the train-of-thought UX confusing (as Khoj doesn't acknowledge it's using the image to generate its response)
Changes look great, seems ready to merge?
This should speed up image loading and reduce storage costs
The image itself isn't strictly required to infer the output mode and data sources to reference when generating a chat response. So only share placeholder text for the attached image, rather than the actual image, with those chat actors. This should reduce response latency and consume fewer tokens.
- Minor: Reorder passing the uploaded_image_url arg closer to query for better code readability
The /api/chat API endpoint has been updated from a GET endpoint to a POST endpoint to support passing attached images from the client.
- Align rendering of uploaded images with the previous HTML DOM structure of chat messages
- Support copying the image attached to a chat message (by the user or Khoj) to the clipboard
Good find, fixing the clients from GET -> POST.
✨ Summary of Changes
- Add vision_enabled option in server admin panel while configuring models

👁️ Demo Images

🛠️ Feedback