
Integrate late chunking for the (potentially) missing context problem #82

Open
muazhari opened this issue Oct 6, 2024 · 2 comments
Labels
enhancement (New feature or request)

Comments


muazhari commented Oct 6, 2024

I still don't know whether the ColPali model is contextual only within each document or across all documents. If it is contextual only within each document, we could integrate late chunking to maximize ColPali's performance. Late chunking is easier to implement, more efficient, and more robust to missing context than traditional chunking methods, and it fits well since ColPali still keeps 1-page chunking. This might further free ColPali from the need for pre-processing pipelines.
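
For illustration, here is a minimal late-chunking sketch using a generic long-context text encoder. The model name, chunk size, and mean pooling are assumptions for the example, not part of ColPali's actual pipeline:

```python
# Minimal late-chunking sketch (illustrative, not ColPali-specific):
# encode the whole document once, then pool token embeddings into chunk
# embeddings afterwards, so each chunk vector is conditioned on the full
# document context rather than on an isolated chunk.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"  # assumed long-context encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

def late_chunk(document: str, chunk_size: int = 128) -> torch.Tensor:
    """Return one embedding per chunk of `chunk_size` tokens, pooled from
    token embeddings computed over the whole document (late chunking)."""
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=8192)
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    # Chunking happens *after* the forward pass: split the token embeddings
    # into contiguous spans and mean-pool each span into a chunk vector.
    chunks = token_embeddings.split(chunk_size, dim=0)
    return torch.stack([chunk.mean(dim=0) for chunk in chunks])  # (n_chunks, dim)

chunk_vectors = late_chunk("...full document text...")
```

The key point is that every chunk vector is pooled from token embeddings that already saw the whole document, so chunks retain document-level context without an extra pre-processing pipeline.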

Details:

ManuelFay added the enhancement (New feature or request) label on Oct 9, 2024
@ManuelFay (Collaborator)

Very good idea, it's already on the roadmap along with exploring CDE! Thanks for the suggestion!


jvlinsta commented Oct 28, 2024

Would this mean you would do "patching" in the embedding space rather than on pixels?
Is there currently a hyperparameter that restricts the chunking to a single page?
