
Integrate late chunking for the (potentially) missing context problem #82

Open
muazhari opened this issue Oct 6, 2024 · 2 comments
Labels
enhancement (New feature or request)

Comments


muazhari commented Oct 6, 2024

I still don't know whether the ColPali model is contextual only within each document or across all documents. If it is contextual only within each document, we could integrate late chunking to maximize ColPali's performance. Late chunking is easier to implement, more efficient, and more robust to missing context than traditional chunking methods, and it fits well since ColPali still keeps 1-page chunking. This might further free ColPali from the need for pre-processing pipelines.
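
For illustration, here is a minimal late-chunking sketch using a generic long-context text encoder. The model name, chunk size, and mean pooling are assumptions for the example, not part of ColPali's actual pipeline:

```python
# Minimal late-chunking sketch (illustrative, not ColPali-specific):
# encode the whole document once, then pool token embeddings into chunk
# embeddings afterwards, so each chunk vector is conditioned on the full
# document context rather than on an isolated chunk.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"  # assumed long-context encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

def late_chunk(document: str, chunk_size: int = 128) -> torch.Tensor:
    """Return one embedding per chunk of `chunk_size` tokens, pooled from
    token embeddings computed over the whole document (late chunking)."""
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=8192)
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    # Chunking happens *after* the forward pass: split the token embeddings
    # into contiguous spans and mean-pool each span into a chunk vector.
    chunks = token_embeddings.split(chunk_size, dim=0)
    return torch.stack([chunk.mean(dim=0) for chunk in chunks])  # (n_chunks, dim)

chunk_vectors = late_chunk("...full document text...")
```

The key point is that every chunk vector is pooled from token embeddings that already saw the whole document, so chunks retain document-level context without an extra pre-processing pipeline.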

Details:

ManuelFay added the enhancement (New feature or request) label on Oct 9, 2024
@ManuelFay (Collaborator)

Very good idea, it's already on the roadmap along with exploring CDE! Thanks for the suggestion!


jvlinsta commented Oct 28, 2024

Would this mean you would do "patching" in the embedding space rather than on pixels?
Is there currently a hyperparameter that restricts the chunking to a single page?
