@andylizf
Collaborator

Supersedes #165.

  1. This PR moves the recompute parameter from Searcher.search to Searcher.__init__.
  2. It also refactors the embedding server's ZMQ logic to make it cleaner.
  3. An experimental manual_tokenize option can speed up embedding generation, and with it the search process.
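
A minimal sketch of what item 1 means for callers, assuming the setting becomes fixed per searcher instance. The class below is a hypothetical stand-in, not the project's actual Searcher: `index_path`, `top_k`, the default value of `recompute`, and the returned dictionary are illustrative assumptions.

```python
class Searcher:
    """Hypothetical stand-in used only to illustrate the signature change."""

    def __init__(self, index_path: str, recompute: bool = True):
        # Assumption: recompute is chosen once, when the searcher is built.
        self.index_path = index_path
        self.recompute = recompute

    def search(self, query: str, top_k: int = 5) -> dict:
        # search() no longer takes recompute; it reads the instance setting.
        return {"query": query, "top_k": top_k, "recompute": self.recompute}


# Before this kind of change, a caller would pass recompute on every search() call.
# After it, the choice is made once at construction time.
searcher = Searcher("demo.index", recompute=False)
result = searcher.search("what is HNSW?", top_k=3)
print(result)
```

One consequence of this design is that a caller who wants both behaviours needs two searcher instances, since passing `recompute` to `search()` raises a `TypeError`.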

CalebZ9909 and others added 10 commits November 12, 2025 08:03
- Reproduced the slow search performance issue (15-30s vs expected ~2s)
- Identified root cause: default complexity=64 is too high for fast search
- Created test script demonstrating performance with different complexity values
- Test results show complexity=16-32 achieves ~2s search time (matching paper)
- Added comprehensive analysis document with solutions and recommendations

Key findings:
- Default complexity=64 results in ~36s search time
- Reducing complexity to 16-32 achieves ~2s search time
- beam_width parameter is mainly for DiskANN, not HNSW
- Paper likely used smaller embedding model (~100M) and lower complexity

Solutions provided:
1. Reduce complexity parameter to 16-32 for faster search
2. Consider DiskANN backend for better performance on large datasets
3. Use smaller embedding model if speed is critical
- Test script to reproduce slow search performance issue
- Generates ~90K chunks (~180MB) similar to user's dataset
- Tests search performance with different complexity values (8, 16, 32, 64)
- Demonstrates that complexity=16-32 achieves ~2s search time
- Validates the performance analysis findings
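
The commit messages above describe a script that times search at several complexity values. The test script itself is not shown in this conversation, so the following is only a rough sketch of that kind of harness. `time_search` and `fake_search` are hypothetical names, and `fake_search` is a placeholder workload that should be swapped for a real call into the index being measured.

```python
import time
from typing import Callable, Dict, Iterable


def time_search(
    search_fn: Callable[..., object],
    query: str,
    complexities: Iterable[int] = (8, 16, 32, 64),
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time in seconds for each complexity value."""
    timings: Dict[int, float] = {}
    for complexity in complexities:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            search_fn(query, complexity=complexity)
            best = min(best, time.perf_counter() - start)
        timings[complexity] = best
    return timings


def fake_search(query: str, complexity: int) -> list:
    # Placeholder workload (assumption): cost grows linearly with complexity.
    time.sleep(0.0002 * complexity)
    return []


if __name__ == "__main__":
    for complexity, seconds in time_search(fake_search, "example query").items():
        print(f"complexity={complexity:>3}  best of 3: {seconds:.4f}s")
```

Taking the best of several runs reduces noise from warm-up effects such as model loading. Reporting the first run separately would be a reasonable variation if cold-start latency is the concern.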
@andylizf andylizf requested a review from yichuan-w November 24, 2025 08:05
@andylizf
Collaborator Author

@Ai-yang-dev Can you take a look here?

@yichuan-w
Owner

@andylizf Please also add logic to keep the embedding server alive, and add a command to kill it.

@Ai-yang-dev

> @Ai-yang-dev Can you take a look here?

Fine. Thanks for sharing.

