New Feature: Vision Support for Khoj #889

Merged
merged 37 commits into from
Sep 9, 2024
37 commits
8ea0d8d
vision ui and s3 image upload complete
MythicalCow Aug 13, 2024
9cddfd8
gpt-4o vision support complete. chatml upgrades
MythicalCow Aug 13, 2024
48ad012
updated messages to use image_url instead of base64
MythicalCow Aug 14, 2024
93a680a
fixed bug where image continues to be rendered after message is sent
MythicalCow Aug 14, 2024
83740ee
removed image
MythicalCow Aug 14, 2024
41535d2
changed s3 bucket to env variable
MythicalCow Aug 14, 2024
737e9bd
database change to have vision_enabled parameter for models
MythicalCow Aug 14, 2024
cc0b716
code clean up part 1
MythicalCow Aug 14, 2024
0c59f59
code clean up part 3
MythicalCow Aug 14, 2024
f563c0f
cleaned up user image upload post request
MythicalCow Aug 14, 2024
d73cf35
bug fix for vision issues during new conversation creation
MythicalCow Aug 14, 2024
cee39bb
share chat changes. testing required
MythicalCow Aug 15, 2024
6bf706c
PR changes
MythicalCow Aug 16, 2024
d1815a9
removing unnecessary dependencies
MythicalCow Aug 16, 2024
806ff60
removing unnecessary dependency
MythicalCow Aug 16, 2024
8a6f6d8
Merge branch 'khoj-ai:master' into vision-support
MythicalCow Aug 16, 2024
33abce6
db migration
MythicalCow Aug 16, 2024
4eeee0a
tightening CSP
MythicalCow Aug 16, 2024
bb6f78a
migration
MythicalCow Aug 16, 2024
f1dfdc6
Merge branch 'master' of github.com:khoj-ai/khoj into vision-support
sabaimran Sep 5, 2024
ced1f35
Update yarn.lock
sabaimran Sep 5, 2024
b483700
Handle case where image upload bucket is not configured
sabaimran Sep 5, 2024
76351eb
Include image url if included in previous user messages and improve c…
sabaimran Sep 5, 2024
bd1f5a8
Add a merge migration to resolve with master
sabaimran Sep 5, 2024
294c068
Render previously uploaded images in the chat history, show uploaded …
sabaimran Sep 5, 2024
d2f9c3e
add list typing in merge migrations file
sabaimran Sep 5, 2024
ed502e8
Revert yarn.lock file to use yarnpkg.com
sabaimran Sep 6, 2024
ff7adbb
Revert shadcn package version to 0.8.0
sabaimran Sep 6, 2024
d4ee1d3
Pass the uploaded_image_url through to subqueries
sabaimran Sep 7, 2024
a52e958
Allow image to render upon first message from the homepage
sabaimran Sep 7, 2024
06682e0
Add rendering support for images to shared chat as well
sabaimran Sep 7, 2024
ccebf8b
Fix some UI/functionality bugs in the share page
sabaimran Sep 7, 2024
3e02cbf
Remove unnecessary packages added in package.json
sabaimran Sep 7, 2024
95b2adc
Convert user attached images for chat to webp format before upload
debanjum Sep 9, 2024
d689ac8
Use placeholder to attached image for data source, response mode actors
debanjum Sep 9, 2024
b7d9da7
Update all clients to call /api/chat as a POST instead of GET request
debanjum Sep 9, 2024
454e914
Fix copying chat messages with images to clipboard
debanjum Sep 9, 2024
2 changes: 1 addition & 1 deletion src/interface/desktop/chat.html
@@ -151,7 +151,7 @@
? `&region=${region}&city=${city}&country=${countryName}&timezone=${timezone}`
: '';

- const response = await fetch(chatApi, { headers });
+ const response = await fetch(chatApi, { method: 'POST', headers });

try {
if (!response.ok) throw new Error(response.statusText);
2 changes: 1 addition & 1 deletion src/interface/desktop/shortcut.html
@@ -407,7 +407,7 @@
? `&region=${region}&city=${city}&country=${countryName}&timezone=${timezone}`
: '';

- const response = await fetch(chatApi, { headers });
+ const response = await fetch(chatApi, { method: 'POST', headers });

try {
if (!response.ok) throw new Error(response.statusText);
2 changes: 1 addition & 1 deletion src/interface/emacs/khoj.el
@@ -878,7 +878,7 @@ Call CALLBACK func with response and CBARGS."
(let ((params `(("q" ,query) ("n" ,khoj-results-count))))
(when session-id (push `("conversation_id" ,session-id) params))
(khoj--call-api-async "/api/chat"
- "GET"
+ "POST"
params
callback cbargs)))

4 changes: 2 additions & 2 deletions src/interface/obsidian/src/chat_view.ts
@@ -1074,9 +1074,9 @@ export class KhojChatView extends KhojPaneView {
};

let response = await fetch(chatUrl, {
- method: "GET",
+ method: "POST",
headers: {
- "Content-Type": "text/plain",
+ "Content-Type": "application/json",
"Authorization": `Bearer ${this.setting.khojApiKey}`,
},
})
4 changes: 2 additions & 2 deletions src/interface/web/app/chat/layout.tsx
@@ -40,9 +40,9 @@ export default function RootLayout({
content="default-src 'self' https://assets.khoj.dev;
media-src * blob:;
script-src 'self' https://assets.khoj.dev 'unsafe-inline' 'unsafe-eval';
- connect-src 'self' https://ipapi.co/json ws://localhost:42110;
+ connect-src 'self' blob: https://ipapi.co/json ws://localhost:42110;
style-src 'self' https://assets.khoj.dev 'unsafe-inline' https://fonts.googleapis.com;
- img-src 'self' data: https://*.khoj.dev https://*.googleusercontent.com https://*.google.com/ https://*.gstatic.com;
+ img-src 'self' data: blob: https://*.khoj.dev https://*.googleusercontent.com https://*.google.com/ https://*.gstatic.com;
font-src 'self' https://assets.khoj.dev https://fonts.gstatic.com;
child-src 'none';
object-src 'none';"
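
Both changed directives gain a `blob:` source, presumably because the attached image is previewed from an object URL and then re-read with `fetch`. The sketch below checks whether a policy string lists a scheme under a given directive. `parseCsp` and `allowsScheme` are hypothetical helpers, not part of Khoj, and the policy strings are shortened copies of the ones in the diff. Browsers apply more rules than this models (for example the `default-src` fallback), so treat it as an illustration only.

```typescript
// Hypothetical helpers for illustration; not part of the Khoj codebase.
// Parse a Content-Security-Policy string into a directive -> source-list map.
function parseCsp(policy: string): Map<string, string[]> {
    const directives = new Map<string, string[]>();
    for (const part of policy.split(";")) {
        const tokens = part.trim().split(/\s+/).filter((t) => t.length > 0);
        if (tokens.length === 0) continue; // skip empty segments, e.g. after a trailing ";"
        const [name, ...sources] = tokens;
        directives.set(name.toLowerCase(), sources);
    }
    return directives;
}

// True when the directive explicitly lists the scheme source (e.g. "blob:").
// Simplification: the default-src fallback and wildcards are not modelled.
function allowsScheme(policy: string, directive: string, scheme: string): boolean {
    const sources = parseCsp(policy).get(directive.toLowerCase()) ?? [];
    return sources.includes(scheme);
}

// Shortened versions of the old and new policies from the diff.
const oldPolicy =
    "connect-src 'self' https://ipapi.co/json ws://localhost:42110; img-src 'self' data: https://*.khoj.dev;";
const newPolicy =
    "connect-src 'self' blob: https://ipapi.co/json ws://localhost:42110; img-src 'self' data: blob: https://*.khoj.dev;";

console.log(allowsScheme(oldPolicy, "img-src", "blob:"));
console.log(allowsScheme(newPolicy, "img-src", "blob:"));
console.log(allowsScheme(newPolicy, "connect-src", "blob:"));
```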
29 changes: 28 additions & 1 deletion src/interface/web/app/chat/page.tsx
@@ -27,19 +27,34 @@ interface ChatBodyDataProps {
setUploadedFiles: (files: string[]) => void;
isMobileWidth?: boolean;
isLoggedIn: boolean;
setImage64: (image64: string) => void;
}

function ChatBodyData(props: ChatBodyDataProps) {
const searchParams = useSearchParams();
const conversationId = searchParams.get("conversationId");
const [message, setMessage] = useState("");
const [image, setImage] = useState<string | null>(null);
const [processingMessage, setProcessingMessage] = useState(false);
const [agentMetadata, setAgentMetadata] = useState<AgentData | null>(null);

const setQueryToProcess = props.setQueryToProcess;
const onConversationIdChange = props.onConversationIdChange;

useEffect(() => {
if (image) {
props.setImage64(encodeURIComponent(image));
}
}, [image, props.setImage64]);

useEffect(() => {
const storedImage = localStorage.getItem("image");
if (storedImage) {
setImage(storedImage);
props.setImage64(encodeURIComponent(storedImage));
localStorage.removeItem("image");
}

const storedMessage = localStorage.getItem("message");
if (storedMessage) {
setProcessingMessage(true);
@@ -95,6 +110,7 @@ function ChatBodyData(props: ChatBodyDataProps) {
agentColor={agentMetadata?.color}
isLoggedIn={props.isLoggedIn}
sendMessage={(message) => setMessage(message)}
sendImage={(image) => setImage(image)}
sendDisabled={processingMessage}
chatOptionsData={props.chatOptionsData}
conversationId={conversationId}
@@ -116,6 +132,7 @@ export default function Chat() {
const [queryToProcess, setQueryToProcess] = useState<string>("");
const [processQuerySignal, setProcessQuerySignal] = useState(false);
const [uploadedFiles, setUploadedFiles] = useState<string[]>([]);
const [image64, setImage64] = useState<string>("");
const locationData = useIPLocationData();
const authenticatedData = useAuthenticatedData();
const isMobileWidth = useIsMobileWidth();
@@ -148,6 +165,7 @@ export default function Chat() {
completed: false,
timestamp: new Date().toISOString(),
rawQuery: queryToProcess || "",
uploadedImageData: decodeURIComponent(image64),
};
setMessages((prevMessages) => [...prevMessages, newStreamMessage]);
setProcessQuerySignal(true);
@@ -178,6 +196,7 @@ export default function Chat() {
if (done) {
setQueryToProcess("");
setProcessQuerySignal(false);
setImage64("");
break;
}

@@ -218,7 +237,14 @@ export default function Chat() {
chatAPI += `&region=${locationData.region}&country=${locationData.country}&city=${locationData.city}&timezone=${locationData.timezone}`;
}

- const response = await fetch(chatAPI);
+ const response = await fetch(chatAPI, {
+ method: "POST",
+ headers: {
+ "Content-Type": "application/json",
+ },
+ body: image64 ? JSON.stringify({ image: image64 }) : undefined,
+ });

try {
await readChatStream(response);
} catch (err) {
@@ -282,6 +308,7 @@ export default function Chat() {
setUploadedFiles={setUploadedFiles}
isMobileWidth={isMobileWidth}
onConversationIdChange={handleConversationIdChange}
setImage64={setImage64}
/>
</Suspense>
</div>
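
The `page.tsx` change moves the web client's chat call from a bare `fetch(chatAPI)` to a POST whose JSON body carries the image only when one is attached. The sketch below builds that request as a pure function. `buildChatRequest` and the sample data URL are assumptions for illustration. The `{ image: ... }` body shape and the `encodeURIComponent` / `decodeURIComponent` pair are taken from the diff.

```typescript
// Hypothetical helper; mirrors the fetch options used in the page.tsx diff.
interface ChatRequestInit {
    method: "POST";
    headers: Record<string, string>;
    body?: string | undefined;
}

function buildChatRequest(image64: string): ChatRequestInit {
    return {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        // An empty string means no image is attached, so the body is omitted.
        body: image64 ? JSON.stringify({ image: image64 }) : undefined,
    };
}

// The page keeps the data URL percent-encoded in state and decodes it for display.
const dataUrl = "data:image/png;base64,AAAA+/8="; // made-up sample payload
const image64 = encodeURIComponent(dataUrl);

const withImage = buildChatRequest(image64);
const withoutImage = buildChatRequest("");

console.log(withImage.body);
console.log(withoutImage.body === undefined);
console.log(decodeURIComponent(image64) === dataUrl);
```

A caller would pass the result straight to `fetch`, as in `fetch(chatAPI, buildChatRequest(image64))`.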
2 changes: 2 additions & 0 deletions src/interface/web/app/components/chatHistory/chatHistory.tsx
@@ -267,6 +267,7 @@ export default function ChatHistory(props: ChatHistoryProps) {
created: message.timestamp,
by: "you",
automationId: "",
uploadedImageData: message.uploadedImageData,
}}
customClassName="fullHistory"
borderLeftColor={`${data?.agent.color}-500`}
@@ -309,6 +310,7 @@ export default function ChatHistory(props: ChatHistoryProps) {
created: new Date().getTime().toString(),
by: "you",
automationId: "",
uploadedImageData: props.pendingMessage,
}}
customClassName="fullHistory"
borderLeftColor={`${data?.agent.color}-500`}
61 changes: 60 additions & 1 deletion src/interface/web/app/components/chatInputArea/chatInputArea.tsx
@@ -16,6 +16,7 @@ import {
Microphone,
Notebook,
Paperclip,
X,
Question,
Robot,
Shapes,
@@ -55,6 +56,7 @@ export interface ChatOptions {

interface ChatInputProps {
sendMessage: (message: string) => void;
sendImage: (image: string) => void;
sendDisabled: boolean;
setUploadedFiles?: (files: string[]) => void;
conversationId?: string | null;
@@ -75,6 +77,9 @@ export default function ChatInputArea(props: ChatInputProps) {
const [showLoginPrompt, setShowLoginPrompt] = useState(false);

const [recording, setRecording] = useState(false);
const [imageUploaded, setImageUploaded] = useState(false);
const [imagePath, setImagePath] = useState<string | null>(null);
const [imageData, setImageData] = useState<string | null>(null);
const [mediaRecorder, setMediaRecorder] = useState<MediaRecorder | null>(null);

const [progressValue, setProgressValue] = useState(0);
@@ -97,7 +102,30 @@ export default function ChatInputArea(props: ChatInputProps) {
}
}, [uploading]);

useEffect(() => {
async function fetchImageData() {
if (imagePath) {
const response = await fetch(imagePath);
const blob = await response.blob();
const reader = new FileReader();
reader.onload = function () {
const base64data = reader.result;
setImageData(base64data as string);
};
reader.readAsDataURL(blob);
}
setUploading(false);
}
setUploading(true);
fetchImageData();
}, [imagePath]);

function onSendMessage() {
if (imageUploaded) {
setImageUploaded(false);
setImagePath(null);
props.sendImage(imageData || "");
}
if (!message.trim()) return;

if (!props.isLoggedIn) {
@@ -142,6 +170,17 @@ export default function ChatInputArea(props: ChatInputProps) {
setShowLoginPrompt(true);
return;
}
// check for image file
const image_endings = ["jpg", "jpeg", "png"];
for (let i = 0; i < files.length; i++) {
const file = files[i];
const file_extension = file.name.split(".").pop();
if (image_endings.includes(file_extension || "")) {
setImageUploaded(true);
setImagePath(URL.createObjectURL(file));
return;
}
}

uploadDataForIndexing(
files,
@@ -287,6 +326,11 @@ export default function ChatInputArea(props: ChatInputProps) {
setIsDragAndDropping(false);
}

function removeImageUpload() {
setImageUploaded(false);
setImagePath(null);
}

return (
<>
{showLoginPrompt && loginRedirectMessage && (
@@ -397,11 +441,24 @@ export default function ChatInputArea(props: ChatInputProps) {
</div>
)}
<div
- className={`${styles.actualInputArea} items-center justify-between dark:bg-neutral-700`}
+ className={`${styles.actualInputArea} items-center justify-between dark:bg-neutral-700 relative`}
onDragOver={handleDragOver}
onDragLeave={handleDragLeave}
onDrop={handleDragAndDropFiles}
>
{imageUploaded && (
<div className="absolute bottom-[80px] left-0 right-0 dark:bg-neutral-700 bg-white pt-5 pb-5 w-full rounded-lg border dark:border-none grid grid-cols-2">
<div className="pl-4 pr-4">
<img src={imagePath || ""} alt="img" className="w-auto max-h-[100px]" />
</div>
<div className="pl-4 pr-4">
<X
className="w-6 h-6 float-right dark:hover:bg-[hsl(var(--background))] hover:bg-neutral-100 rounded-sm"
onClick={removeImageUpload}
/>
</div>
</div>
)}
<input
type="file"
multiple={true}
@@ -427,6 +484,8 @@ export default function ChatInputArea(props: ChatInputProps) {
value={message}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
setImageUploaded(false);
setImagePath(null);
e.preventDefault();
onSendMessage();
}
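
The upload handler in `chatInputArea.tsx` decides whether a selected file is an image by comparing its extension against `jpg`, `jpeg`, and `png`, and attaches the first match instead of sending it for indexing. The sketch below shows that check as standalone functions. `isImageFileName` and `firstImage` are hypothetical names. Lowercasing the extension is an addition here, not something the diff does: the diff compares the extension as-is, so a file named `photo.PNG` would fall through to indexing.

```typescript
// Extensions treated as chat images in the chatInputArea.tsx diff.
const IMAGE_ENDINGS = ["jpg", "jpeg", "png"];

// Hypothetical helper: true when the file name ends in a known image extension.
// Assumption: the extension is lowercased first, which the original code does not do.
function isImageFileName(fileName: string): boolean {
    const extension = fileName.split(".").pop() ?? "";
    return IMAGE_ENDINGS.includes(extension.toLowerCase());
}

// Return the first image among the selected file names, or undefined if there is none.
function firstImage(fileNames: string[]): string | undefined {
    return fileNames.find((name) => isImageFileName(name));
}

console.log(firstImage(["notes.md", "diagram.png", "photo.jpg"]));
console.log(firstImage(["notes.md", "paper.pdf"]));
console.log(isImageFileName("SCAN.JPEG"));
```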
@@ -53,6 +53,11 @@ div.chatMessageContainer h3 img {
width: 24px;
}

div.you img {
height: 16rem;
width: auto;
}

div.you {
color: hsla(var(--secondary-foreground));
}
14 changes: 12 additions & 2 deletions src/interface/web/app/components/chatMessage/chatMessage.tsx
@@ -111,6 +111,7 @@ export interface SingleChatMessage {
rawQuery?: string;
intent?: Intent;
agent?: AgentData;
uploadedImageData?: string;
}

export interface StreamMessage {
@@ -122,6 +123,7 @@ export interface StreamMessage {
rawQuery: string;
timestamp: string;
agent?: AgentData;
uploadedImageData?: string;
}

export interface ChatHistoryData {
@@ -203,6 +205,7 @@ interface ChatMessageProps {
borderLeftColor?: string;
isLastMessage?: boolean;
agent?: AgentData;
uploadedImageData?: string;
}

interface TrainOfThoughtProps {
@@ -273,6 +276,7 @@ export function TrainOfThought(props: TrainOfThoughtProps) {
export default function ChatMessage(props: ChatMessageProps) {
const [copySuccess, setCopySuccess] = useState<boolean>(false);
const [isHovering, setIsHovering] = useState<boolean>(false);
const [textRendered, setTextRendered] = useState<string>("");
const [markdownRendered, setMarkdownRendered] = useState<string>("");
const [isPlaying, setIsPlaying] = useState<boolean>(false);
const [interrupted, setInterrupted] = useState<boolean>(false);
@@ -322,6 +326,10 @@ export default function ChatMessage(props: ChatMessageProps) {
.replace(/\\\[/g, "LEFTBRACKET")
.replace(/\\\]/g, "RIGHTBRACKET");

if (props.chatMessage.uploadedImageData) {
message = `![uploaded image](${props.chatMessage.uploadedImageData})\n\n${message}`;
}

if (props.chatMessage.intent && props.chatMessage.intent.type == "text-to-image") {
message = `![generated image](data:image/png;base64,${message})`;
} else if (props.chatMessage.intent && props.chatMessage.intent.type == "text-to-image2") {
@@ -340,6 +348,9 @@ export default function ChatMessage(props: ChatMessageProps) {
message += `\n\n**Inferred Query**\n\n${props.chatMessage.intent["inferred-queries"][0]}`;
}

setTextRendered(message);

// Render the markdown
let markdownRendered = md.render(message);

// Replace placeholders with LaTeX delimiters
@@ -542,7 +553,6 @@ export default function ChatMessage(props: ChatMessageProps) {
className={constructClasses(props.chatMessage)}
onMouseLeave={(event) => setIsHovering(false)}
onMouseEnter={(event) => setIsHovering(true)}
- onClick={props.chatMessage.by === "khoj" ? (event) => undefined : undefined}
>
<div className={chatMessageWrapperClasses(props.chatMessage)}>
<div
@@ -595,7 +605,7 @@ export default function ChatMessage(props: ChatMessageProps) {
title="Copy"
className={`${styles.copyButton}`}
onClick={() => {
- navigator.clipboard.writeText(props.chatMessage.message);
+ navigator.clipboard.writeText(textRendered);
setCopySuccess(true);
}}
>
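
In `chatMessage.tsx` an attached image is shown by prepending a markdown image to the user's message before it goes through `md.render`, and the copy button now copies that composed text (`textRendered`) instead of the raw message. The sketch below shows only the composition step. `composeMessageText` is a hypothetical name and the example URL is a placeholder. The markdown template is copied from the diff.

```typescript
// Hypothetical helper; the template string matches the one added in the diff.
function composeMessageText(message: string, uploadedImageData?: string): string {
    if (uploadedImageData) {
        // Image first, then a blank line, then the user's text.
        return `![uploaded image](${uploadedImageData})\n\n${message}`;
    }
    return message;
}

const plain = composeMessageText("What is in this picture?");
const withImage = composeMessageText(
    "What is in this picture?",
    "https://example.com/uploads/cat.webp", // placeholder URL, not a real Khoj asset
);

console.log(plain);
console.log(withImage);
```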
2 changes: 1 addition & 1 deletion src/interface/web/app/factchecker/page.tsx
@@ -79,7 +79,7 @@ async function verifyStatement(
let verificationMessage = `${verificationPrecursor} ${message}`;
const apiURL = `${chatURL}?q=${encodeURIComponent(verificationMessage)}&client=web&stream=true&conversation_id=${conversationId}`;
try {
- const response = await fetch(apiURL);
+ const response = await fetch(apiURL, { method: "POST" });
if (!response.body) throw new Error("No response body found");

const reader = response.body?.getReader();
4 changes: 2 additions & 2 deletions src/interface/web/app/layout.tsx
@@ -45,9 +45,9 @@ export default function RootLayout({
content="default-src 'self' https://assets.khoj.dev;
media-src * blob:;
script-src 'self' https://assets.khoj.dev 'unsafe-inline' 'unsafe-eval';
- connect-src 'self' https://ipapi.co/json ws://localhost:42110;
+ connect-src 'self' blob: https://ipapi.co/json ws://localhost:42110;
style-src 'self' https://assets.khoj.dev 'unsafe-inline' https://fonts.googleapis.com;
- img-src 'self' data: https://*.khoj.dev https://*.googleusercontent.com https://*.google.com/ https://*.gstatic.com;
+ img-src 'self' data: blob: https://*.khoj.dev https://*.googleusercontent.com https://*.google.com/ https://*.gstatic.com;
font-src 'self' https://assets.khoj.dev https://fonts.gstatic.com;
child-src 'none';
object-src 'none';"