Implement multimodal (FTS, vector, hybrid) search capability #463

Open: wants to merge 1 commit into base: main
20 changes: 19 additions & 1 deletion electron/main/vector-database/ipcHandlers.ts
@@ -10,7 +10,7 @@ import { StoreSchema } from '../electron-store/storeConfig'
import { startWatchingDirectory, updateFileListForRenderer } from '../filesystem/filesystem'

import { rerankSearchedEmbeddings } from './embeddings'
import { DBEntry, DatabaseFields } from './schema'
import { DBEntry, DatabaseFields, DBQueryResult } from './schema'
import { RepopulateTableWithMissingItems } from './tableHelperFunctions'

export interface PromptWithRagResults {
@@ -37,6 +37,24 @@ export const registerDBSessionHandlers = (store: Store<StoreSchema>, _windowMana
return searchResults
})

ipcMain.handle(

style: The multi-modal search handler is implemented correctly, but consider adding error handling for invalid searchType values.
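One way to act on this suggestion, sketched under assumptions: the TypeScript union on `searchType` is erased at runtime, and as the diff stands a value outside the union would skip both branches of `multiModalSearch` and return two empty arrays without any error. A small guard at the IPC boundary would surface the mistake instead. `SEARCH_TYPES`, `isSearchType`, and `assertSearchType` are hypothetical names, not part of this PR.

```typescript
// Hypothetical helpers; none of these names exist in the PR.
const SEARCH_TYPES = ['vector', 'text', 'hybrid'] as const
type SearchType = (typeof SEARCH_TYPES)[number]

// Type guard: narrows an untrusted IPC argument to SearchType.
function isSearchType(value: unknown): value is SearchType {
  return SEARCH_TYPES.some((t) => t === value)
}

// Throws a descriptive error rather than letting an unknown mode fall through.
function assertSearchType(value: unknown): SearchType {
  if (!isSearchType(value)) {
    throw new Error(`Invalid searchType: ${String(value)}. Expected one of: ${SEARCH_TYPES.join(', ')}.`)
  }
  return value
}
```

The handler could call `assertSearchType(searchType)` before delegating to `multiModalSearch`, so the rejection reaches the renderer as a failed `invoke` promise.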

'multi-modal-search',
async (
event,
query: string,
limit: number,
searchType: 'vector' | 'text' | 'hybrid',
filter?: string,
): Promise<{ vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }> => {
const windowInfo = windowManager.getWindowInfoForContents(event.sender)
if (!windowInfo) {
throw new Error('Window info not found.')
}
const searchResults = await windowInfo.dbTableClient.multiModalSearch(query, limit, searchType, filter)
return searchResults
},
)

ipcMain.handle('index-files-in-directory', async (event) => {
const windowInfo = windowManager.getWindowInfoForContents(event.sender)
if (!windowInfo) {
34 changes: 34 additions & 0 deletions electron/main/vector-database/lanceTableWrapper.ts
@@ -118,6 +118,40 @@ class LanceDBTableWrapper {
const mapped = rawResults.map(convertRecordToDBType<DBEntry>)
return mapped as DBEntry[]
}

async multiModalSearch(
query: string,
limit: number,
searchType: 'vector' | 'text' | 'hybrid' = 'vector',
filter?: string,
): Promise<{ vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }> {
let vectorResults: DBQueryResult[] = []
let textResults: DBQueryResult[] = []

if (searchType === 'vector' || searchType === 'hybrid') {
const vectorQuery = await this.lanceTable.search(query).metricType(MetricType.Cosine).limit(limit)
if (filter) {
vectorQuery.prefilter(true).filter(filter)
}
const rawVectorResults = await vectorQuery.execute()
vectorResults = rawVectorResults
.map(convertRecordToDBType<DBQueryResult>)
.filter((r): r is DBQueryResult => r !== null)
Comment on lines +137 to +139

logic: Filtering out null results here can return fewer results than the requested limit. Consider requesting more rows than the limit to account for rows that are dropped.
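A minimal sketch of the over-fetch idea, assuming the query is issued with a padded limit and the converted rows are trimmed afterwards. `overFetchLimit`, `takeNonNull`, and the default factor of 2 are illustrative assumptions, not values taken from this PR.

```typescript
// Hypothetical helpers; the padding factor is an arbitrary assumption.
function overFetchLimit(limit: number, factor = 2): number {
  // Ask the database for more rows than the caller wants.
  return Math.max(1, Math.ceil(limit * factor))
}

// Drops nulls and stops as soon as `limit` valid rows have been collected.
function takeNonNull<T>(rows: Array<T | null>, limit: number): T[] {
  const kept: T[] = []
  for (const row of rows) {
    if (kept.length >= limit) break
    if (row !== null) kept.push(row)
  }
  return kept
}
```

With these, the vector branch could pass `overFetchLimit(limit)` to `.limit(...)` and wrap the mapped rows in `takeNonNull(mapped, limit)`. This still cannot guarantee `limit` results if many rows fail conversion; it only makes a shortfall less likely.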

}

if (searchType === 'text' || searchType === 'hybrid') {
const sanitizedTextQuery = sanitizePathForDatabase(query)
const textFilter = filter
? `${filter} AND content LIKE '%${sanitizedTextQuery}%'`
: `content LIKE '%${sanitizedTextQuery}%'`
Comment on lines +145 to +146

style: LIKE with wildcards on both sides can be slow on large datasets. Consider using LanceDB's full-text search capabilities, if available.
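Independent of performance, the way the LIKE clause is assembled may deserve a look. The diff passes the query through `sanitizePathForDatabase`, which by its name appears intended for file paths, and joins the caller's `filter` with `AND` without parentheses. A hedged sketch of a dedicated builder follows; `escapeSqlStringLiteral` and `buildTextFilter` are hypothetical names, and the escaping shown is only the standard SQL convention of doubling single quotes.

```typescript
// Hypothetical helpers, not part of the PR.
// Doubles single quotes so the query cannot terminate the SQL string literal.
function escapeSqlStringLiteral(value: string): string {
  return value.replace(/'/g, "''")
}

// Builds the text-search predicate, optionally combined with a caller filter.
function buildTextFilter(query: string, filter?: string): string {
  const like = `content LIKE '%${escapeSqlStringLiteral(query)}%'`
  // Parenthesise the caller's filter so an OR inside it cannot change precedence.
  return filter ? `(${filter}) AND ${like}` : like
}
```

Note that `%` and `_` typed by the user still act as LIKE wildcards in this sketch; handling them would need an escape mechanism that the filter dialect is confirmed to support.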

const rawTextResults = await this.lanceTable.filter(textFilter).limit(limit).execute()
textResults = rawTextResults
.map(convertRecordToDBType<DBQueryResult>)
.filter((r): r is DBQueryResult => r !== null)
}

return { vectorResults, textResults }
}
}

export default LanceDBTableWrapper
9 changes: 9 additions & 0 deletions electron/preload/index.ts
@@ -22,6 +22,15 @@ function createIPCHandler<T extends (...args: any[]) => any>(channel: string): I

const database = {
search: createIPCHandler<(query: string, limit: number, filter?: string) => Promise<DBQueryResult[]>>('search'),
multiModalSearch:
createIPCHandler<
(
query: string,
limit: number,
searchType: 'vector' | 'text' | 'hybrid',
filter?: string,
) => Promise<{ vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }>

style: Consider adding a type alias for the return type of multiModalSearch for better readability and maintainability.
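A sketch of what such aliases might look like. The same inline object type is repeated in the IPC handler, the table wrapper, the preload bridge, the `useState` call in MainSidebar.tsx, and the props in SearchComponent.tsx, so a shared declaration would remove five copies. `MultiModalSearchResults` and `MultiModalSearchFn` are hypothetical names, and the `DBQueryResult` below is a reduced stand-in for the project's real type in `./schema`.

```typescript
// Stand-in for the project's DBQueryResult; only two fields are modelled (assumption).
interface DBQueryResult {
  content: string
  _distance?: number
}

type SearchType = 'vector' | 'text' | 'hybrid'

// Hypothetical shared alias for the handler's return shape.
interface MultiModalSearchResults {
  vectorResults: DBQueryResult[]
  textResults: DBQueryResult[]
}

// Hypothetical signature alias that could be passed to createIPCHandler<...>.
type MultiModalSearchFn = (
  query: string,
  limit: number,
  searchType: SearchType,
  filter?: string,
) => Promise<MultiModalSearchResults>

// Typed empty value, usable as the initial React state.
const EMPTY_SEARCH_RESULTS: MultiModalSearchResults = { vectorResults: [], textResults: [] }
```

Exporting these from the schema module alongside `DBQueryResult` would let the main process, preload script, and renderer import a single definition.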

>('multi-modal-search'),
deleteLanceDBEntriesByFilePath: createIPCHandler<(filePath: string) => Promise<void>>(
'delete-lance-db-entries-by-filepath',
),
11 changes: 8 additions & 3 deletions src/components/File/DBResultPreview.tsx
@@ -62,10 +62,15 @@ export const DBSearchPreview: React.FC<DBSearchPreviewProps> = ({ dbResult: entr
<MarkdownRenderer content={entry.content} />
</div>
<div className="mt-2 text-xs text-gray-400">
{fileName && <span className="text-xs text-gray-400">{fileName} </span>} | Similarity:{' '}
{fileName && <span className="text-xs text-gray-400">{fileName} </span>}
{/* eslint-disable-next-line no-underscore-dangle */}
{cosineDistanceToPercentage(entry._distance)}% |{' '}
{modified && <span className="text-xs text-gray-400">Modified {modified}</span>}
{entry._distance != null && (
<>
{/* eslint-disable-next-line no-underscore-dangle */}| Similarity:{' '}
{cosineDistanceToPercentage(entry._distance)}%{' '}
</>
)}{' '}
Comment on lines +67 to +72

style: Consider moving this logic to a separate function for better readability and reusability

| {modified && <span className="text-xs text-gray-400">Modified {modified}</span>}

style: Add a space before the pipe character for consistent formatting
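Both comments on this metadata line could be addressed by building the segments in a plain function and joining them with a single separator, so a missing file name, distance, or modified date never leaves a stray pipe. This is only a sketch: `formatResultMeta` is a hypothetical name, and the percentage formatter is injected as a parameter because the signature of the project's `cosineDistanceToPercentage` is not shown in this diff.

```typescript
// Hypothetical helper, not part of the PR.
interface ResultMetaParts {
  fileName?: string
  distance?: number | null
  modified?: string
}

function formatResultMeta(parts: ResultMetaParts, toPercentage: (distance: number) => string): string {
  const segments: string[] = []
  if (parts.fileName) segments.push(parts.fileName)
  // Text-only results have no distance, so the similarity segment is skipped.
  if (parts.distance != null) segments.push(`Similarity: ${toPercentage(parts.distance)}%`)
  if (parts.modified) segments.push(`Modified ${parts.modified}`)
  // A single join keeps the spacing around every pipe consistent.
  return segments.join(' | ')
}
```

In the component, the result could be rendered inside the existing `text-xs text-gray-400` container, passing the project's own formatter as the second argument if its signature matches.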

</div>
</div>
)
2 changes: 1 addition & 1 deletion src/components/MainPage.tsx
@@ -52,7 +52,7 @@ const MainPageContent: React.FC = () => {
/>
</div>

<ResizableComponent resizeSide="right">
<ResizableComponent resizeSide="right" initialWidth={300}>
<div className="size-full border-y-0 border-l-0 border-r-[0.001px] border-solid border-neutral-700">
<SidebarManager />
</div>
5 changes: 4 additions & 1 deletion src/components/Sidebars/MainSidebar.tsx
@@ -14,7 +14,10 @@ const SidebarManager: React.FC = () => {
const { sidebarShowing } = useChatContext()

const [searchQuery, setSearchQuery] = useState<string>('')
const [searchResults, setSearchResults] = useState<DBQueryResult[]>([])
const [searchResults, setSearchResults] = useState<{ vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }>({
vectorResults: [],
textResults: [],
})
Comment on lines +17 to +20

style: Consider using a more specific type for searchResults instead of an inline object type


return (
<div className="size-full overflow-y-hidden">
40 changes: 31 additions & 9 deletions src/components/Sidebars/SearchComponent.tsx
@@ -1,16 +1,18 @@
import React, { useEffect, useRef, useCallback } from 'react'
import React, { useEffect, useRef, useCallback, useState } from 'react'
import { DBQueryResult } from 'electron/main/vector-database/schema'
import posthog from 'posthog-js'
import { FaSearch } from 'react-icons/fa'
import { debounce } from 'lodash'
import { DBSearchPreview } from '../File/DBResultPreview'
import { useContentContext } from '@/contexts/ContentContext'

type SearchType = 'vector' | 'text' | 'hybrid'

interface SearchComponentProps {
searchQuery: string
setSearchQuery: (query: string) => void
searchResults: DBQueryResult[]
setSearchResults: (results: DBQueryResult[]) => void
searchResults: { vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }
setSearchResults: (results: { vectorResults: DBQueryResult[]; textResults: DBQueryResult[] }) => void
}

const SearchComponent: React.FC<SearchComponentProps> = ({
@@ -21,13 +23,14 @@ const SearchComponent: React.FC<SearchComponentProps> = ({
}) => {
const { openContent: openTabContent } = useContentContext()
const searchInputRef = useRef<HTMLInputElement>(null)
const [searchType, setSearchType] = useState<SearchType>('vector')

const handleSearch = useCallback(
async (query: string) => {
const results: DBQueryResult[] = await window.database.search(query, 50)
const results = await window.database.multiModalSearch(query, 50, searchType)
setSearchResults(results)
},
[setSearchResults],
[setSearchResults, searchType],
)

const debouncedSearch = useCallback(
@@ -46,7 +49,7 @@ const SearchComponent: React.FC<SearchComponentProps> = ({
if (searchQuery) {
debouncedSearch(searchQuery)
}
}, [searchQuery, debouncedSearch])
}, [searchQuery, debouncedSearch, searchType])

const openFileSelectSearch = useCallback(
(path: string) => {
@@ -70,13 +73,32 @@ const SearchComponent: React.FC<SearchComponentProps> = ({
onChange={(e) => setSearchQuery(e.target.value)}
placeholder="Semantic search..."
/>
<select
value={searchType}
onChange={(e) => setSearchType(e.target.value as SearchType)}
className="absolute right-3 top-1/2 -translate-y-1/2 rounded bg-neutral-700 text-white"
>
<option value="vector">Vector</option>
<option value="text">Text</option>
<option value="hybrid">Hybrid</option>
</select>
</div>
<div className="mt-2 w-full">
{searchResults.length > 0 && (
{searchResults?.textResults?.length > 0 && (
<div className="mt-4 w-full">
<h3 className="mb-2 text-white">Text Search Results</h3>
{searchResults.textResults.map((result, index) => (
// eslint-disable-next-line react/no-array-index-key
<DBSearchPreview key={`text-${index}`} dbResult={result} onSelect={openFileSelectSearch} />
))}
subin-chella marked this conversation as resolved.
</div>
)}
{searchResults?.vectorResults?.length > 0 && (
Comment on lines +87 to +95

style: Add error handling for cases where searchResults is undefined.
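One possible shape for this, assuming the response is normalised once before it reaches state so the JSX can drop the `?.` chains. `normalizeSearchResults` is a hypothetical name; the function is generic so it does not depend on the project's `DBQueryResult` definition.

```typescript
// Hypothetical helper, not part of the PR.
interface SplitResults<T> {
  vectorResults: T[]
  textResults: T[]
}

// Accepts a missing or partial response and always returns two arrays.
function normalizeSearchResults<T>(results?: Partial<SplitResults<T>> | null): SplitResults<T> {
  return {
    vectorResults: results?.vectorResults ?? [],
    textResults: results?.textResults ?? [],
  }
}
```

`handleSearch` could then call `setSearchResults(normalizeSearchResults(results))`, optionally inside a `try`/`catch` that falls back to `normalizeSearchResults(null)` when the IPC call rejects, and the render conditions reduce to `searchResults.textResults.length > 0`.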

<div className="w-full">
{searchResults.map((result, index) => (
<h3 className="mb-2 text-white">Vector Search Results</h3>
{searchResults.vectorResults.map((result, index) => (
// eslint-disable-next-line react/no-array-index-key
<DBSearchPreview key={index} dbResult={result} onSelect={openFileSelectSearch} />
<DBSearchPreview key={`vector-${index}`} dbResult={result} onSelect={openFileSelectSearch} />
))}
</div>
)}