@yichuan-w
Owner

… retrieval

  • Add timing measurements for search operations (load and core time)
  • Increase embedding batch size from 1 to 32 for better performance
  • Add explicit memory cleanup with del all_embeddings
  • Support loading and merging multiple datasets with different splits
  • Add CLI arguments for search method selection (ann/exact/exact-all)
  • Auto-detect image field names across different dataset structures
  • Print candidate doc counts for performance monitoring
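
For reviewers, three of the changes listed above (search method selection, load/core timing, and image field auto-detection) could look roughly like the sketch below. It is not taken from the diff: the flag names `--search-method` and `--batch-size`, the helpers `build_parser`, `timed`, and `detect_image_field`, and the candidate field names are all hypothetical. Only the three method choices (ann/exact/exact-all) and the batch size of 32 come from this description.

```python
import argparse
import time

# Hypothetical candidate names; the PR does not say which fields it checks.
IMAGE_FIELD_CANDIDATES = ("image", "images", "page_image", "img")


def detect_image_field(example: dict) -> str:
    """Return the first candidate key that is present in a dataset record."""
    for name in IMAGE_FIELD_CANDIDATES:
        if name in example:
            return name
    raise KeyError(f"no image field found; keys were {sorted(example)}")


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser; flag names are assumptions, choices are from the PR."""
    parser = argparse.ArgumentParser(description="Retrieval benchmark (sketch)")
    parser.add_argument(
        "--search-method",
        choices=["ann", "exact", "exact-all"],
        default="ann",
        help="which search path to run",
    )
    # 32 matches the new embedding batch size mentioned in the PR.
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

With helpers like these, load time and core search time can be reported separately by wrapping the index-loading call and the search call in `timed` and printing the two durations next to the candidate doc count.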

🤖 Generated with Claude Code

What does this PR do?

Related Issues

Fixes #

Checklist

  • Tests pass (uv run pytest)
  • Code formatted (ruff format and ruff check)
  • Pre-commit hooks pass (pre-commit run --all-files)

yichuan-w and others added 3 commits November 10, 2025 21:13

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
