
Similarity score calculation bug in src/search.ts - L2 distance treated as cosine distance #55

@gmax111

Description


The similarity score calculation in src/search.ts line 139 produces incorrect scores (negative or near-zero percentages) because it treats the Euclidean (L2) distance returned by sqlite-vec as if it were a cosine distance.

Current code (src/search.ts, line 139):

similarity: mode === 'text' ? undefined : 1 - row.distance,

Problem:
sqlite-vec returns L2 (Euclidean) distance, not cosine distance. For normalized embedding vectors, the relationship between L2 distance d and cosine similarity s is:

s = 1 - (d^2 / 2)
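The identity follows from expanding the squared distance: for unit vectors a and b, d^2 = |a|^2 + |b|^2 - 2(a·b) = 2 - 2s. A quick numerical check, as a minimal TypeScript sketch: it normalizes two arbitrary vectors, computes their L2 distance, and compares 1 - d^2/2 against the cosine similarity computed directly. The helper names and the sample vectors are illustrative only and do not come from src/search.ts.

```typescript
// Scale a vector to unit length (L2 norm of 1).
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

// Euclidean (L2) distance between two vectors of equal length.
function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) * (x - b[i]), 0));
}

// Cosine similarity computed directly from the dot product and the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (na * nb);
}

// For unit vectors: d^2 = 2 - 2s, so s = 1 - d^2 / 2.
const a = normalize([0.3, -1.2, 0.8, 2.0]);
const b = normalize([1.1, 0.4, -0.5, 0.7]);
const d = l2Distance(a, b);
console.log(cosineSimilarity(a, b).toFixed(6), (1 - (d * d) / 2).toFixed(6));
```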

The current formula, 1 - d, goes negative for any distance > 1. For unit vectors d lies between 0 and 2, and d > 1 corresponds to a cosine similarity below 0.5, which is common with 384-dimensional embeddings. In practice, this means almost every search result shows a negative or near-zero similarity score.
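To make the gap concrete, the sketch below evaluates both formulas over the range of L2 distances that unit vectors can produce. The function names are illustrative; the first mirrors the current expression on line 139 and the second the proposed fix.

```typescript
// Current formula: treats the L2 distance as if it were a cosine distance.
const buggySimilarity = (d: number): number => 1 - d;

// Corrected formula for unit-length vectors: s = 1 - d^2 / 2.
const fixedSimilarity = (d: number): number => 1 - (d * d) / 2;

// For unit vectors the L2 distance lies in [0, 2].
for (const d of [0, 0.5, 1, 1.2, 1.414, 2]) {
  console.log(
    `d=${d}  1-d=${buggySimilarity(d).toFixed(3)}  1-d^2/2=${fixedSimilarity(d).toFixed(3)}`
  );
}
```

The two formulas agree only at the endpoints (d = 0 gives 1, d = 2 gives -1). In between, the current formula underestimates the similarity; at d = 1 it reports 0 where the true cosine similarity is 0.5.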

Fix:

similarity: mode === 'text' ? undefined : 1 - (row.distance * row.distance / 2),
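If the conversion is needed in more than one place, it could be pulled into a small helper. The following is a sketch under stated assumptions: the helper names are hypothetical, `mode` and `row` stand in for the variables in src/search.ts, `'vector'` is used only as an example of a non-text mode, and embeddings are assumed to be unit-length. The clamp is a defensive choice: 1 - d^2/2 can never exceed 1, but float32 rounding could push d slightly above 2 for near-opposite vectors.

```typescript
// Hypothetical helper: convert an L2 distance between unit vectors to cosine similarity.
function l2DistanceToCosineSimilarity(distance: number): number {
  const similarity = 1 - (distance * distance) / 2;
  // Only the lower bound needs clamping; the upper bound of 1 is reached at d = 0.
  return Math.max(-1, similarity);
}

// Mirrors the shape of the fixed line: text-only results carry no similarity.
function similarityFor(mode: string, row: { distance: number }): number | undefined {
  return mode === 'text' ? undefined : l2DistanceToCosineSimilarity(row.distance);
}

console.log(similarityFor('text', { distance: 1 })); // undefined
console.log(similarityFor('vector', { distance: 1 })); // 0.5
```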

Before fix: Scores like 0%, -5%, -11%, -16%
After fix: Scores like 50%, 45%, 39%, 32%

The ranking order of results was unaffected (for normalized vectors, cosine similarity decreases monotonically as L2 distance increases), but the displayed scores were meaningless and confusing to users.
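A small sketch of why the ordering survives the fix: since distances are non-negative, 1 - d^2/2 is strictly decreasing in d, so sorting by ascending distance and by descending corrected similarity yield the same order. The distances below are made-up illustrative values, not results from the tested index.

```typescript
// Illustrative L2 distances for four results, in arbitrary order.
const distances = [1.2, 0.8, 1.05, 0.95];

const toCosine = (d: number): number => 1 - (d * d) / 2;

// Result indices ranked by ascending L2 distance (how the vector index orders them).
const byDistance = distances
  .map((d, i) => ({ i, d }))
  .sort((x, y) => x.d - y.d)
  .map((r) => r.i);

// Result indices ranked by descending corrected similarity.
const bySimilarity = distances
  .map((d, i) => ({ i, s: toCosine(d) }))
  .sort((x, y) => y.s - x.s)
  .map((r) => r.i);

console.log(byDistance, bySimilarity); // both [1, 3, 2, 0]
```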

Tested on v1.0.15 with all-MiniLM-L6-v2 embeddings against ~235 indexed conversations.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
