Closed
Labels
performance (Performance optimization)
Description
Problem
The hand-rolled cosine similarity loop walks 44,800 element pairs per skill match query (50 skills × 896-dimensional embeddings), doing three multiply-adds per pair, with no explicit SIMD optimization.
File: crates/zeph-skills/src/matcher.rs lines 145-167
Current code:
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let mut dot = 0.0_f32;
    let mut norm_a = 0.0_f32;
    let mut norm_b = 0.0_f32;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}
Impact
- CPU: Measurable impact with >20 skills
- LLVM auto-vectorization not guaranteed
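For context on the auto-vectorization point: a strict floating-point reduction into a single accumulator generally cannot be reordered by the compiler, which tends to block vectorization. The sketch below is one dependency-free way to give LLVM a better chance, using eight independent accumulators per sum. It is not the code in matcher.rs; the name `cosine_similarity_chunked`, the lane count, the length assertion, and the zero-norm guard are all assumptions made for illustration, and whether it actually vectorizes should be confirmed with a benchmark.

```rust
/// Cosine similarity using LANES independent accumulators per sum.
/// Hypothetical helper for illustration; not the function in matcher.rs.
pub fn cosine_similarity_chunked(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8; // assumed width; tune per target

    // Assumption: mismatched embedding lengths are a caller bug.
    assert_eq!(a.len(), b.len(), "embedding lengths differ");

    let mut dot = [0.0_f32; LANES];
    let mut norm_a = [0.0_f32; LANES];
    let mut norm_b = [0.0_f32; LANES];

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements left over when the length is not a multiple of LANES.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    for (ca, cb) in a_chunks.zip(b_chunks) {
        // Each lane has its own accumulator, so there is no
        // loop-carried dependency between lanes.
        for i in 0..LANES {
            dot[i] += ca[i] * cb[i];
            norm_a[i] += ca[i] * ca[i];
            norm_b[i] += cb[i] * cb[i];
        }
    }

    // Horizontal reduction of the per-lane partial sums.
    let mut dot_sum: f32 = dot.iter().sum();
    let mut na_sum: f32 = norm_a.iter().sum();
    let mut nb_sum: f32 = norm_b.iter().sum();

    // Scalar pass over the tail elements.
    for (x, y) in a_tail.iter().zip(b_tail.iter()) {
        dot_sum += x * y;
        na_sum += x * x;
        nb_sum += y * y;
    }

    let denom = na_sum.sqrt() * nb_sum.sqrt();
    // Assumed policy: return 0.0 for a zero-norm input instead of NaN.
    if denom == 0.0 { 0.0 } else { dot_sum / denom }
}
```

Because the summation order differs from the single-accumulator loop, results can differ in the last few bits, so comparisons against the current implementation should use a tolerance rather than exact equality.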
Solution
Use a SIMD-optimized library:
[dependencies]
simsimd = "6" # or ndarray = "0.16"
Expected speedup: 5-10× on large vectors
Priority: P1
Effort: Medium (3-4 hours, testing required)
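Since the estimate calls for testing, one possible shape for an equivalence check is sketched here: it compares any candidate implementation against the current scalar loop on deterministic pseudo-random 896-dimensional inputs and reports the worst absolute difference. The helper names (`cosine_reference`, `pseudo_random_vec`, `max_abs_diff`) and the LCG-based generator are assumptions for illustration, not part of the crate.

```rust
/// Reference: the scalar loop quoted in this issue.
fn cosine_reference(a: &[f32], b: &[f32]) -> f32 {
    let mut dot = 0.0_f32;
    let mut norm_a = 0.0_f32;
    let mut norm_b = 0.0_f32;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

/// Deterministic pseudo-random values in [-1, 1) from a small LCG,
/// so the check needs no external crates. Hypothetical helper.
fn pseudo_random_vec(seed: u32, len: usize) -> Vec<f32> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            // Use the top 24 bits, which convert to f32 exactly.
            ((state >> 8) as f32 / (1u32 << 24) as f32) * 2.0 - 1.0
        })
        .collect()
}

/// Largest absolute difference between `candidate` and the reference
/// over `pairs` generated vector pairs of dimension `dim`.
fn max_abs_diff(candidate: impl Fn(&[f32], &[f32]) -> f32, dim: usize, pairs: u32) -> f32 {
    (0..pairs)
        .map(|i| {
            let a = pseudo_random_vec(2 * i + 1, dim);
            let b = pseudo_random_vec(2 * i + 2, dim);
            (candidate(&a, &b) - cosine_reference(&a, &b)).abs()
        })
        .fold(0.0_f32, f32::max)
}
```

A replacement would then be accepted if, for example, `max_abs_diff(new_impl, 896, 50)` stays below a chosen tolerance; the tolerance itself is a project decision and is not specified in this issue.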
Related to #391