
perf: optimize cosine similarity with SIMD #404

@bug-ops

Description


Problem

The hand-rolled cosine similarity loop runs 44,800 inner-loop iterations per skill-match query (50 skills × 896-dim embeddings), each performing three multiply-accumulates, with no SIMD optimization.

File: crates/zeph-skills/src/matcher.rs lines 145-167

Current code:

pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let mut dot = 0.0_f32;
    let mut norm_a = 0.0_f32;
    let mut norm_b = 0.0_f32;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

Impact

  • CPU: measurable per-query overhead once more than 20 skills are matched
  • LLVM auto-vectorization of the current scalar loop is not guaranteed (see the sketch after this list)
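
For context on why auto-vectorization often fails here: the three running scalar sums create loop-carried dependencies, and LLVM will not reorder float additions without fast-math flags. A dependency-free rewrite that splits each reduction into independent lanes vectorizes more reliably. This is only a hypothetical sketch for comparison (the function name and lane width are made up), not the proposed fix:

pub fn cosine_similarity_chunked(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    let len = a.len().min(b.len());
    let chunked = len - len % LANES;

    // Independent per-lane accumulators let LLVM keep the three reductions
    // in vector registers instead of serializing on a single scalar sum.
    let mut dot = [0.0_f32; LANES];
    let mut norm_a = [0.0_f32; LANES];
    let mut norm_b = [0.0_f32; LANES];

    for (xs, ys) in a[..chunked]
        .chunks_exact(LANES)
        .zip(b[..chunked].chunks_exact(LANES))
    {
        for i in 0..LANES {
            dot[i] += xs[i] * ys[i];
            norm_a[i] += xs[i] * xs[i];
            norm_b[i] += ys[i] * ys[i];
        }
    }

    let mut d: f32 = dot.iter().sum();
    let mut na: f32 = norm_a.iter().sum();
    let mut nb: f32 = norm_b.iter().sum();

    // Scalar tail for lengths that are not a multiple of LANES.
    for (x, y) in a[chunked..len].iter().zip(b[chunked..len].iter()) {
        d += x * y;
        na += x * x;
        nb += y * y;
    }

    d / (na.sqrt() * nb.sqrt())
}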

Solution

Use a SIMD-optimized library:

[dependencies]
simsimd = "6"  # or ndarray = "0.16"

Expected speedup: 5-10× on large vectors
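
A rough sketch of the simsimd route, assuming the crate's SpatialSimilarity trait where f32::cosine returns the cosine distance (1 - similarity) as an Option<f64>; the exact signature should be verified against the crate docs:

use simsimd::SpatialSimilarity;

pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Assumption: simsimd computes the cosine *distance* and returns None
    // when the slices differ in length.
    let distance = f32::cosine(a, b).expect("embedding dimensions must match");
    (1.0 - distance) as f32
}

If ndarray is preferred instead, ArrayView1::from(a).dot(&ArrayView1::from(b)) covers the dot product, and the two norms can be built the same way. Either way, a test comparing the new result against the existing scalar implementation should gate the switch.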

Priority: P1
Effort: Medium (3-4 hours, testing required)
Related to #391

Metadata

Labels: performance (Performance optimization)
