Launch embedding server earlier #176

andylizf · 2025-11-24T08:05:15Z

Supersedes #165.

This PR moves the recompute parameter from Searcher.search to Searcher.__init__, which lets the embedding server launch earlier.
We also refactored the embedding server's ZMQ logic to make it cleaner.
An experimental manual_tokenize option can be used to speed up embedding generation, and thus the search process.
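
To illustrate why the parameter placement matters, here is a minimal sketch, not the code in this PR. `FakeEmbeddingServer` and `SketchSearcher` are hypothetical stand-ins, and the real server is a separate ZMQ process rather than an in-process object. The point is only that once recompute is known at construction time, the server can be started before the first query.

```python
from __future__ import annotations


class FakeEmbeddingServer:
    """Hypothetical stand-in for the real embedding server process."""

    def __init__(self) -> None:
        self.start_count = 0
        self.running = False

    def start(self) -> None:
        # Starting is idempotent: a second call is a no-op.
        if not self.running:
            self.start_count += 1
            self.running = True


class SketchSearcher:
    """Hypothetical searcher that fixes recompute at construction time."""

    def __init__(self, server: FakeEmbeddingServer, recompute: bool = True) -> None:
        self.server = server
        self.recompute = recompute
        # recompute is already known here, so the server can be launched
        # now instead of during the first search() call.
        if self.recompute:
            self.server.start()

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # search() takes no recompute argument; it uses the stored setting.
        mode = "recompute" if self.recompute else "stored"
        return [f"{mode}:{query}:{i}" for i in range(top_k)]


server = FakeEmbeddingServer()
searcher = SketchSearcher(server, recompute=True)
first = searcher.search("hello")
second = searcher.search("world")
```

In this sketch the server is started exactly once, in the constructor, and later searches reuse it.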

- Reproduced the slow search performance issue (15-30s vs expected ~2s)
- Identified root cause: default complexity=64 is too high for fast search
- Created test script demonstrating performance with different complexity values
- Test results show complexity=16-32 achieves ~2s search time (matching paper)
- Added comprehensive analysis document with solutions and recommendations

Key findings:
- Default complexity=64 results in ~36s search time
- Reducing complexity to 16-32 achieves ~2s search time
- beam_width parameter is mainly for DiskANN, not HNSW
- Paper likely used smaller embedding model (~100M) and lower complexity

Solutions provided:
1. Reduce complexity parameter to 16-32 for faster search
2. Consider DiskANN backend for better performance on large datasets
3. Use smaller embedding model if speed is critical

- Test script to reproduce slow search performance issue
- Generates ~90K chunks (~180MB) similar to user's dataset
- Tests search performance with different complexity values (8, 16, 32, 64)
- Demonstrates that complexity=16-32 achieves ~2s search time
- Validates the performance analysis findings
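
The shape of such a test can be sketched as follows. This is not the test script added in this PR: `make_chunks`, `sweep_complexity`, and `toy_search` are hypothetical names, and `toy_search` is a placeholder whose work merely grows with complexity, not a real HNSW or DiskANN search. The chunk size of 2000 bytes is derived from the figures above (~180MB / ~90K chunks), and the chunk count is scaled down so the sketch runs quickly.

```python
import time
from typing import Callable, Dict, Iterable, List


def make_chunks(n_chunks: int, chunk_bytes: int) -> List[str]:
    """Build n_chunks ASCII strings of exactly chunk_bytes characters each."""
    chunks = []
    for i in range(n_chunks):
        prefix = f"chunk {i}: "
        # Pad with filler, then trim so every chunk has the same size.
        chunks.append((prefix + "x" * chunk_bytes)[:chunk_bytes])
    return chunks


def sweep_complexity(
    search_fn: Callable[[str, int], List[str]],
    query: str,
    complexities: Iterable[int],
) -> Dict[int, float]:
    """Time one call of search_fn per complexity value, in seconds."""
    timings: Dict[int, float] = {}
    for complexity in complexities:
        start = time.perf_counter()
        search_fn(query, complexity)
        timings[complexity] = time.perf_counter() - start
    return timings


# Scaled down from ~90K chunks; ~2000 bytes each (180MB / 90K).
CHUNKS = make_chunks(n_chunks=1000, chunk_bytes=2000)


def toy_search(query: str, complexity: int) -> List[str]:
    """Placeholder: scans more candidates as complexity grows."""
    candidates = [c for c in CHUNKS[: complexity * 10] if query in c]
    return candidates[:3]


timings = sweep_complexity(toy_search, "chunk 7", [8, 16, 32, 64])
for complexity, seconds in timings.items():
    print(f"complexity={complexity:>2}  {seconds * 1000:.3f} ms")
```

To measure a real index, `toy_search` would be replaced by a call into the actual searcher with the complexity value passed through; the timings printed by this placeholder say nothing about real search latency.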

andylizf · 2025-11-24T08:07:11Z

@Ai-yang-dev Can you take a look here?

yichuan-w · 2025-11-25T03:45:32Z

@andylizf Please also add logic to keep the embedding server alive, and add a command to kill it.
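
One possible shape for this, as a rough sketch only: record the server's PID in a file when it starts, and have a kill command read that file and terminate the process. Everything here is an assumption for illustration. `PID_FILE`, `start_server`, and `kill_server` are hypothetical names, and the "server" is a placeholder child process that just sleeps, not the real ZMQ embedding server.

```python
import os
import signal
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical location for the PID record.
PID_FILE = Path(tempfile.gettempdir()) / "sketch_embedding_server.pid"

# Placeholder "server": a child process that only sleeps.
SERVER_CMD = [sys.executable, "-c", "import time; time.sleep(600)"]


def start_server() -> subprocess.Popen:
    """Start the placeholder server and record its PID for later lookup."""
    # A server meant to outlive its parent would also need to be detached,
    # for example with start_new_session=True on POSIX.
    proc = subprocess.Popen(SERVER_CMD)
    PID_FILE.write_text(str(proc.pid))
    return proc


def kill_server() -> bool:
    """Terminate the recorded server; return False if none was recorded."""
    if not PID_FILE.exists():
        return False
    pid = int(PID_FILE.read_text())
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # The process has already exited.
    PID_FILE.unlink()
    return True


proc = start_server()
alive_before = proc.poll() is None
killed = kill_server()
proc.wait(timeout=10)  # Reap the child so its exit is observed.
alive_after = proc.poll() is None
```

A PID file is only one option; a control message over the existing ZMQ socket would avoid stale PID records but needs the server to handle a shutdown request.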

Ai-yang-dev · 2025-11-25T03:49:18Z

> @Ai-yang-dev Can you take a look here?

Fine. Thanks for sharing.

CalebZ9909 and others added 10 commits November 12, 2025 08:03

add test

469dce0

refactor: embedding server

29ef3c9

refactor: embedding server

66c6aad

fix: faster embed

36c44b8

fix: recompute args in searcher

2536800

fix

cd1d853

fix

9ac9eab

chore

8d202b8

andylizf requested a review from yichuan-w November 24, 2025 08:05

push

ed15776
