Code for ALBEF: a new vision-language pre-training method
Updated Sep 20, 2022 - Python
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
An Interactive Game-based Vision Planning benchmark
This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1–18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).
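The word-frequency step described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the `word_frequencies` helper is hypothetical, and the OCR step (typically done with libraries such as pdf2image and pytesseract) is only indicated in a comment.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Count words case-insensitively and return the most and least common."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    counts = Counter(words)
    most = counts.most_common(top_n)
    # Slice from the tail of the descending list to get the rarest words.
    least = counts.most_common()[:-top_n - 1:-1]
    return most, least

# For scanned PDFs, the text would first be recovered via OCR
# (e.g. rendering pages to images, then running an OCR engine on them);
# here we just use a plain string in its place.
sample = "The test is a test of the reading test"
most, least = word_frequencies(sample, top_n=2)
```

Here `most` ranks words by descending count, while `least` holds the rarest words found in the text.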
A caption generator using LAVIS and Argos Translate
The first public Vietnamese visual linguistic foundation model(s)
lmmtoolkit is a toolkit for Multi-Modal Learning
Some Python scripts to load Vietnamese visual linguistic data
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
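The bidirectional retrieval idea can be sketched as nearest-neighbor search over a shared embedding space: encode both modalities into vectors, then rank one side by cosine similarity to a query from the other. The `cosine` and `retrieve` helpers below and the toy 3-d vectors are illustrative stand-ins for real encoder outputs, not the repository's API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, gallery):
    """Rank (name, vector) gallery items by similarity to the query vector."""
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy embeddings standing in for real image-encoder outputs.
images = [("dog.jpg", [0.9, 0.1, 0.0]), ("cat.jpg", [0.1, 0.9, 0.0])]
# Toy embedding standing in for an encoded text query like "a photo of a dog".
text_query = [1.0, 0.0, 0.1]
best = retrieve(text_query, images)[0][0]  # → "dog.jpg"
```

Because the same similarity function works in both directions, retrieving text for an image query uses `retrieve` unchanged with the roles of the two modalities swapped.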
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral
Scan text from an image and convert it into speech/audio in a desired language.