Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models — pyvespa documentation #892

Open
1 task
ShellLM opened this issue Aug 16, 2024 · 1 comment
Labels
embeddings (vector embeddings and related tools), Jupyter-Notebook (Jupyter Interactive Notebooks and related content), multimodal-llm (LLMs that combine modes such as text and image recognition), New-Label (Choose this option if the existing labels are insufficient to describe the content accurately), RAG (Retrieval Augmented Generation for LLMs)

Comments

ShellLM commented Aug 16, 2024

Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models — pyvespa documentation

Snippet

"Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models

This notebook demonstrates how to represent ColPali in Vespa. ColPali is a powerful vision language model that can generate embeddings for images and text. In this notebook, we will use ColPali to generate embeddings for images of PDF pages and store them in Vespa. We will also store the base64-encoded image of each PDF page and some metadata, such as title and URL. We will then demonstrate how to retrieve the PDF pages using the embeddings generated by ColPali.

ColPali: Efficient Document Retrieval with Vision Language Models Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

ColPali is a combination of ColBERT and PaliGemma:

ColPali is enabled by the latest advances in Vision Language Models, notably the PaliGemma model from the Google Zürich team, and leverages multi-vector retrieval through late interaction mechanisms as proposed in ColBERT by Omar Khattab.

Quote from ColPali: Efficient Document Retrieval with Vision Language Models 👀

The ColPali model achieves remarkable retrieval performance on the ViDoRe (Visual Document Retrieval) Benchmark, beating complex pipelines with a single model.

The TLDR of this notebook:

- Generate an image per PDF page using pdf2image, and extract the text using pypdf.
- For each page image, use ColPali to obtain the visual multi-vector embeddings.
- Store the ColBERT embeddings in Vespa using the long-context variant, where the embeddings for each document are represented by the tensor type tensor(page{}, patch{}, v[128]). This lets us use the PDF as the document (the retrievable unit), storing all page embeddings in the same document.
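The tensor type above has two mapped dimensions (page, patch) and one indexed dimension of size 128. As a rough sketch of what that layout means for a single PDF, the snippet below packs per-page patch vectors into the "blocks" form that Vespa's document JSON accepts for mixed tensors. The helper name `to_vespa_blocks` and the random toy vectors are assumptions for illustration only; the notebook's own feeding code may be structured differently.

```python
import random


def to_vespa_blocks(pages):
    """Pack {page_id: [patch_vector, ...]} into a mixed-tensor "blocks" dict.

    Each block addresses one (page, patch) cell of the mapped dimensions and
    carries the 128 values of the indexed dimension v.
    """
    blocks = []
    for page_id, patches in pages.items():
        for patch_id, vector in enumerate(patches):
            if len(vector) != 128:
                raise ValueError("each patch vector must have 128 values")
            blocks.append(
                {
                    "address": {"page": str(page_id), "patch": str(patch_id)},
                    "values": [float(x) for x in vector],
                }
            )
    return {"blocks": blocks}


# Hypothetical toy input: one PDF with 2 pages and 3 patch vectors per page.
# Real ColPali output has many more patches per page.
rng = random.Random(0)
pages = {
    page: [[rng.uniform(-1.0, 1.0) for _ in range(128)] for _ in range(3)]
    for page in range(2)
}

tensor_field = to_vespa_blocks(pages)
print(len(tensor_field["blocks"]))  # one block per (page, patch) pair
```

Because every page of the PDF lives in the same tensor, the whole PDF can be fed and retrieved as one document.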

We also store the base64-encoded image and page metadata, such as title and URL, so that we can display them on the result page and also use them for RAG with powerful LLMs with vision capabilities.
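A minimal sketch of the base64 step, using only the Python standard library. The field names (`title`, `url`, `image`), the example URL, and the stand-in image bytes are assumptions for illustration; they are not taken from the notebook's schema.

```python
import base64


def encode_page_image(image_bytes: bytes) -> str:
    """Return the image bytes as an ASCII base64 string suitable for a string field."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_page_image(encoded: str) -> bytes:
    """Recover the original image bytes, e.g. before display or before passing to an LLM."""
    return base64.b64decode(encoded)


# Stand-in for real page image bytes (here just the 8-byte PNG signature).
image_bytes = b"\x89PNG\r\n\x1a\n"

# Hypothetical document fields; names are assumptions, not the notebook's schema.
doc_fields = {
    "title": "Example PDF",
    "url": "https://example.com/example.pdf",
    "image": encode_page_image(image_bytes),
}
```

Storing the image as text keeps it alongside the embeddings in the same document, at the cost of roughly a one-third size increase from the base64 encoding.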

At query time, we retrieve using BM25 over all the text from all pages, then use the ColPali embeddings to rerank the results using the max page score."
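The reranking step can be sketched in plain Python, assuming the usual ColBERT-style late interaction (MaxSim): for each query token vector, take the best dot product over a page's patch vectors and sum those maxima; the document score is then the maximum over its pages. In Vespa this would be expressed in a rank profile rather than in client code, and the BM25 first phase is assumed to have already produced `candidates`. All names and the 2-dimensional toy vectors below are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vectors, patch_vectors):
    """Late-interaction score of one page: sum over query tokens of the best patch match."""
    return sum(max(dot(q, p) for p in patch_vectors) for q in query_vectors)


def max_page_score(query_vectors, pages):
    """Score a document by its best-scoring page."""
    return max(max_sim(query_vectors, patches) for patches in pages.values())


def rerank(query_vectors, candidates):
    """Order first-phase candidates by descending max page score."""
    scored = [
        (doc_id, max_page_score(query_vectors, pages))
        for doc_id, pages in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: 2-dimensional vectors instead of 128, purely for readability.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "doc_a": {"0": [[1.0, 0.0], [0.0, 1.0]], "1": [[0.5, 0.5]]},
    "doc_b": {"0": [[0.1, 0.1]]},
}

ranked = rerank(query_vectors, candidates)
```

Taking the maximum over pages means a long PDF is ranked by its single most relevant page, so unrelated pages do not dilute its score.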

Content

Title

Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models — pyvespa documentation

URL

https://pyvespa.readthedocs.io/en/latest/examples/colpali-document-retrieval-vision-language-models.html

Suggested labels

{'label-name': 'Vision-Language-Models', 'label-description': 'Models that combine visual and textual information for processing and retrieval tasks.', 'gh-repo': 'pyvespa', 'confidence': 85.19}

@ShellLM ShellLM added embeddings vector embeddings and related tools Jupyter-Notebook Jupyter Interactive Notebooks and related content multimodal-llm LLMs that combine modes such as text and image recognition. New-Label Choose this option if the existing labels are insufficient to describe the content accurately RAG Retrieval Augmented Generation for LLMs labels Aug 16, 2024

ShellLM commented Aug 16, 2024

Related content

#891 similarity score: 0.91
#678 similarity score: 0.85
#865 similarity score: 0.85
#838 similarity score: 0.85
#134 similarity score: 0.85
#762 similarity score: 0.85
