add ColQwen multimodal PDF retrieval integration #162

ASuresh0524 · 2025-11-11T00:10:44Z

- Add ColQwenRAG class with easy-to-use CLI for multimodal PDF retrieval
- Support for both ColQwen2 and ColPali models with automatic device selection
- MPS optimization for Apple Silicon with memory-efficient loading
- Complete pipeline: PDF→images→embeddings→HNSW index→search
- Multi-vector indexing for fine-grained document matching
- Comprehensive user guide and reproduction test script
- Resolves #119: ColQwen Doc and Support Management
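Models in the ColPali family (ColPali, ColQwen2) score a page against a query with ColBERT-style late interaction (MaxSim). Each query-token embedding is matched to its most similar page-patch embedding, and those maxima are summed. The pure-Python sketch below assumes embeddings are already L2-normalized lists of floats. The helpers `maxsim_score`, `flatten_pages`, and `search` are hypothetical and are not the PR's `ColQwenRAG` code. The brute-force nearest-neighbour step stands in for the HNSW index, and the two-stage design (per-token candidates, then MaxSim rerank) is one common approach, not necessarily the one the PR uses.

```python
def dot(a, b):
    """Inner product of two equal-length vectors (cosine if both are L2-normalized)."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: best-matching patch per query token, summed over tokens."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def flatten_pages(pages):
    """Multi-vector indexing: store every patch vector as its own row, remembering its page.

    `pages` maps page_id -> list of patch vectors. Returns (rows, row_to_page).
    """
    rows, row_to_page = [], []
    for page_id, vecs in pages.items():
        for v in vecs:
            rows.append(v)
            row_to_page.append(page_id)
    return rows, row_to_page


def search(query_vecs, pages, k_per_token=2, top_k=1):
    """Two-stage search: per-token nearest patches pick candidate pages, MaxSim reranks them."""
    rows, row_to_page = flatten_pages(pages)
    candidates = set()
    for q in query_vecs:
        # Brute-force nearest neighbours; an HNSW index would replace this step.
        best_rows = sorted(range(len(rows)), key=lambda i: dot(q, rows[i]), reverse=True)
        candidates.update(row_to_page[i] for i in best_rows[:k_per_token])
    scored = [(maxsim_score(query_vecs, pages[pid]), pid) for pid in candidates]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:top_k]]


if __name__ == "__main__":
    # Toy data: page1 has one patch aligned with each query token, page2 has a single blend.
    pages = {"page1": [[1.0, 0.0], [0.0, 1.0]], "page2": [[0.6, 0.8]]}
    query = [[1.0, 0.0], [0.0, 1.0]]
    print(search(query, pages))  # ['page1']
```

Scoring each token independently is what lets a short query match a small region of a page (a table cell, a figure label) instead of relying on one pooled page vector.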

Features:

- python -m apps.colqwen_rag build --pdfs ./pdfs/ --index my_index
- python -m apps.colqwen_rag search my_index "query text"
- python -m apps.colqwen_rag ask my_index --interactive
- Automatic CPU fallback for memory constraints
- Robust error handling and progress tracking
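Automatic device selection with CPU fallback can be sketched as a preference order (CUDA, then Apple Silicon MPS, then CPU) plus a single retry on CPU when the accelerator runs out of memory. This is a hedged sketch, not the PR's implementation. `pick_device` and `run_with_cpu_fallback` are hypothetical helpers, and the availability flags are passed in so the logic can be checked without PyTorch installed. With PyTorch, they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string, falling back to CPU when no accelerator exists."""
    # With PyTorch installed, the flags would come from
    # torch.cuda.is_available() and torch.backends.mps.is_available().
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def run_with_cpu_fallback(step, device: str):
    """Run step(device); if the accelerator reports out-of-memory, retry once on CPU.

    Returns (result, device_actually_used). Any other error is re-raised unchanged.
    """
    try:
        return step(device), device
    except RuntimeError as err:
        # PyTorch reports CUDA and MPS allocation failures as RuntimeError (or a
        # subclass) whose message contains "out of memory".
        if device == "cpu" or "out of memory" not in str(err).lower():
            raise
        print(f"{device} ran out of memory; retrying on CPU")
        return step("cpu"), "cpu"
```

Only out-of-memory errors trigger the fallback. Other failures, such as a bad file path or a shape mismatch, are re-raised so they are not silently masked by a slower CPU run.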

Checklist

- Tests pass (uv run pytest)
- Code formatted (ruff format and ruff check)
- Pre-commit hooks pass (pre-commit run --all-files)


- Add noqa comments for E402 errors (imports after sys.path modifications)
- Remove unused variable assignment in colqwen_rag.py
- Use importlib.util.find_spec for dependency checks instead of unused imports
- Fix import ordering in test_colqwen_reproduction.py
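`importlib.util.find_spec` reports whether a top-level module can be found without importing it. This lets an app check optional dependencies without leaving unused imports behind for ruff to flag. A minimal sketch follows. The dependency names (`colpali_engine`, `pdf2image`) are assumptions about what a ColQwen app might need, and `missing_dependencies` is a hypothetical helper.

```python
import importlib.util

# Hypothetical list of optional dependencies for a ColQwen-style app.
OPTIONAL_DEPENDENCIES = ["colpali_engine", "pdf2image"]


def missing_dependencies(modules):
    """Return the top-level module names that are not installed, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(OPTIONAL_DEPENDENCIES)
    if missing:
        print("Missing optional dependencies: " + ", ".join(missing))
```

For dotted names such as `a.b`, `find_spec` imports the parent package first and raises `ModuleNotFoundError` if it is absent, which is why the sketch sticks to top-level names.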

yichuan-w · 2025-11-14T23:26:34Z

The faiss submodule still seems to have a problem; we need to remember to run git submodule update.

yichuan-w · 2025-11-14T23:26:40Z

@ASuresh0524

ASuresh0524 · 2025-11-15T00:22:09Z

Hmm @yichuan-w, okay, sounds good. Will look into it.

- Add apps/image_rag.py for indexing and searching images using CLIP embeddings
- Supports text-based image search queries
- Uses CLIP ViT-L/14 model via sentence-transformers
- Follows the same pattern as other RAG apps in the apps directory
- Addresses feature request for CLIP support in apps (issue #94)
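Text-based image search with CLIP works by embedding images and text queries into the same vector space and ranking images by cosine similarity to the query. The sketch below assumes an `encode` callable that maps a list of images or strings to vectors. With sentence-transformers this could be `SentenceTransformer("clip-ViT-L-14").encode`, but that model name and the helper `rank_images` are assumptions for illustration, not the code in apps/image_rag.py. A toy encoder keeps the example runnable without downloading a model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(
    encode: Callable[[list], List[Vector]],
    images: list,
    names: List[str],
    query: str,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed images and a text query with the same CLIP-style encoder, then rank by cosine."""
    image_vecs = encode(images)      # one vector per image
    query_vec = encode([query])[0]   # the text query lives in the same embedding space
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, image_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    def toy_encode(items):
        # Stand-in for a real CLIP model: known inputs map to fixed 2-D vectors.
        table = {"cat.jpg": [1.0, 0.0], "car.jpg": [0.0, 1.0], "a photo of a cat": [0.9, 0.1]}
        return [table[item] for item in items]

    print(rank_images(toy_encode, ["cat.jpg", "car.jpg"], ["cat.jpg", "car.jpg"], "a photo of a cat", top_k=1))
```

Passing the encoder in makes it easy to smoke-test the ranking logic on a small dataset before swapping in the real CLIP checkpoint.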

yichuan-w · 2025-11-17T22:30:57Z

apps/image_rag.py

+
+        # Load CLIP model
+        print("🔍 Loading CLIP model...")
+        model = SentenceTransformer(self.embedding_model_default)


Are you sure this runs? You should test it yourself on a dataset; the default model should be a CLIP model.

yichuan-w · 2025-12-03T09:13:04Z

@ASuresh0524 Thanks for the PR. Make sure the faiss submodule is correct; I think we created an unnecessary faiss submodule update.

ASuresh0524 · 2025-12-03T09:16:08Z


Sounds good, will fix this tomorrow

Reset faiss submodule to match main branch to avoid unnecessary changes

ASuresh0524 · 2025-12-04T02:50:49Z

@yichuan-w I think I fixed it; let me know if it looks good.

yichuan-w · 2025-12-04T03:41:50Z

@ASuresh0524 Can you add a brief introduction to ColQwen and how to use it in the README?
Then feel free to merge it yourself. Thanks!!
Remember to squash merge, though.

ASuresh0524 · 2025-12-04T04:28:17Z

Sounds good! I'll work on this over the next day to make it in-depth, and I'll squash merge it as well.

Add brief introduction and usage guide for ColQwen integration, similar to other RAG application sections in the README.
- Quick start examples for building, searching, and interactive Q&A
- Setup instructions with prerequisites
- Model options (ColQwen2 vs ColPali)
- Link to detailed ColQwen guide
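The quick-start commands from the PR description (build with --pdfs and --index, search with an index and query, ask with an index and --interactive) map naturally onto argparse subcommands. This is a hypothetical sketch of how such a CLI could be parsed. Only the subcommand names and flags come from the PR text; `build_parser` and the destination names are assumptions, and the real `apps.colqwen_rag` module may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the three documented commands: build, search, ask."""
    parser = argparse.ArgumentParser(prog="python -m apps.colqwen_rag")
    sub = parser.add_subparsers(dest="command", required=True)

    # build: PDFs in, named index out
    build = sub.add_parser("build", help="index a directory of PDFs")
    build.add_argument("--pdfs", required=True, help="directory containing PDF files")
    build.add_argument("--index", required=True, help="name of the index to create")

    # search: one-shot retrieval against an existing index
    search = sub.add_parser("search", help="run a single retrieval query")
    search.add_argument("index", help="name of an existing index")
    search.add_argument("query", help="query text")

    # ask: question answering, optionally as a prompt loop
    ask = sub.add_parser("ask", help="question answering over an index")
    ask.add_argument("index", help="name of an existing index")
    ask.add_argument("--interactive", action="store_true", help="start a prompt loop")

    return parser
```

For example, `build_parser().parse_args(["search", "my_index", "query text"])` returns a namespace with `command="search"`, `index="my_index"`, and `query="query text"`. Making the subcommand required means a bare invocation prints usage instead of doing nothing.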

yichuan-w · 2025-12-06T23:56:05Z

Please fix CI, @ASuresh0524.

ASuresh0524 added 2 commits November 10, 2025 13:31

yichuan-w mentioned this pull request Nov 13, 2025

[Feature]CLIP support #94

Open

yichuan-w reviewed Nov 17, 2025


Revert unnecessary faiss submodule update

86287d8


ASuresh0524 added 2 commits December 6, 2025 03:28

fix: Update ColQwen guide link to docs/ directory

af47dfd


3 participants
