Similarity score calculation bug in src/search.ts - L2 distance treated as cosine distance #55

The similarity score calculation in src/search.ts line 139 produces incorrect results (negative percentages, meaningless rankings) because it treats the Euclidean (L2) distance returned by sqlite-vec as if it were cosine distance.

**Current code (src/search.ts, line 139):**
```typescript
similarity: mode === 'text' ? undefined : 1 - row.distance,
```

**Problem:**
sqlite-vec returns L2 (Euclidean) distance, not cosine distance. For normalized embedding vectors, the relationship between L2 distance `d` and cosine similarity `s` is:

```
s = 1 - (d^2 / 2)
```
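For reference, the identity follows from expanding the squared distance between two unit vectors. The symbols $a$ and $b$ are introduced here for illustration only, with $\lVert a \rVert = \lVert b \rVert = 1$ and $s = a \cdot b$:

```math
d^2 = \lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\,(a \cdot b) = 2 - 2s
\quad\Longrightarrow\quad
s = 1 - \frac{d^2}{2}
```

This holds only when both vectors are normalized to unit length.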

The current formula `1 - d` goes negative for any distance greater than 1. For unit vectors, `d > 1` corresponds to a cosine similarity below 0.5, which is common for these 384-dimensional embeddings. In practice, almost every search result shows a negative or near-zero similarity score.
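A small standalone check makes the discrepancy concrete. This is only a sketch: `normalize`, `l2Distance`, and `cosineSimilarity` are hypothetical helpers written for this example and are not part of src/search.ts or sqlite-vec. It uses two orthogonal unit vectors, whose true cosine similarity is 0.

```typescript
// Hypothetical helpers for illustration; not taken from src/search.ts.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return v.map((x) => x / norm);
}

function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((acc, x, i) => acc + (x - b[i]) ** 2, 0));
}

function cosineSimilarity(a: number[], b: number[]): number {
  // For unit vectors the dot product equals the cosine similarity.
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

const a = normalize([1, 0]);
const b = normalize([0, 1]); // orthogonal to a

const d = l2Distance(a, b);           // sqrt(2), about 1.414
const naive = 1 - d;                  // about -0.414, i.e. a "-41%" score
const corrected = 1 - (d * d) / 2;    // 0 up to floating-point rounding
const actual = cosineSimilarity(a, b);

console.log({ d, naive, corrected, actual });
```

The naive formula reports a negative score for vectors that are simply unrelated, while the corrected formula agrees with the directly computed cosine similarity.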

**Fix:**
```typescript
similarity: mode === 'text' ? undefined : 1 - (row.distance * row.distance / 2),
```
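If the conversion is factored out, a slightly more defensive variant could look like the sketch below. The function name `l2ToCosineSimilarity` is hypothetical, and the clamp to [-1, 1] is an extra precaution against floating-point drift or vectors that are not perfectly normalized; it is not part of the one-line fix above.

```typescript
// Hypothetical helper; assumes embeddings are normalized to unit length.
function l2ToCosineSimilarity(distance: number): number {
  // s = 1 - d^2 / 2 for unit vectors.
  const s = 1 - (distance * distance) / 2;
  // Clamp so rounding error or imperfect normalization cannot
  // push the score outside the valid cosine range.
  return Math.min(1, Math.max(-1, s));
}

console.log(l2ToCosineSimilarity(0)); // identical vectors
console.log(l2ToCosineSimilarity(1)); // distance 1 maps to similarity 0.5
console.log(l2ToCosineSimilarity(2)); // opposite vectors
```

With this helper, the assignment would become `similarity: mode === 'text' ? undefined : l2ToCosineSimilarity(row.distance)`.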

**Before fix:** Scores like 0%, -5%, -11%, -16%
**After fix:** Scores like 50%, 45%, 39%, 32%

The ranking order of results is unaffected, since for normalized vectors cosine similarity decreases monotonically as L2 distance increases. The displayed scores, however, were meaningless and confusing to users.
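The ordering claim can be checked with a short sketch. The distances below are illustrative values chosen for this example, not measurements from the index, and `isStrictlyDecreasing` is a hypothetical helper.

```typescript
// Illustrative distances, sorted ascending as a nearest-neighbor query would return them.
const distances: number[] = [0.4, 0.9, 1.0, 1.2, 1.6];

const naiveScores = distances.map((dist) => 1 - dist);
const correctedScores = distances.map((dist) => 1 - (dist * dist) / 2);

// True when every element is smaller than the one before it.
const isStrictlyDecreasing = (xs: number[]): boolean =>
  xs.every((x, i) => i === 0 || x < xs[i - 1]);

// Both score lists fall as distance rises, so both yield the same ranking.
console.log(isStrictlyDecreasing(naiveScores), isStrictlyDecreasing(correctedScores));
```

Only the magnitude of the scores changes between the two formulas; the relative order of results stays the same.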

Tested on v1.0.15 with all-MiniLM-L6-v2 embeddings against ~235 indexed conversations.
