| Model | Vision Backbone | Text Backbone |
|---|---|---|
| ViLReF_ViT | ViT-B/16 | RoBERTa-wwm-ext-base-chinese |
| ViLReF_RN50 | ResNet50 | RoBERTa-wwm-ext-base-chinese |
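Both variants pair a vision backbone with a text backbone, i.e. a dual-encoder design. As a rough illustration of how such a pair is commonly used for image-text retrieval, the sketch below L2-normalizes image and text embeddings, scores every pair by cosine similarity divided by a temperature, and picks the best-matching text per image. This is a generic CLIP-style scoring sketch, not ViLReF's own training objective; the embedding values are hypothetical, the embeddings are assumed to be already projected to a shared dimension, and the temperature of 0.07 is an assumed value.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def similarity_matrix(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not taken from this repository
) -> List[List[float]]:
    """Return logits[i][j] = cos(image_i, text_j) / temperature."""
    imgs = [l2_normalize(e) for e in image_embs]
    txts = [l2_normalize(e) for e in text_embs]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 3-dim embeddings standing in for backbone outputs.
    image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
    logits = similarity_matrix(image_embs, text_embs)
    for i, row in enumerate(logits):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        print(f"image {i} -> text {best}")
```

Dividing by a small temperature sharpens the softmax, so the highest-similarity text dominates the probability mass.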
Requirements:

```
asposestorage==1.0.2
lmdb==1.3.0
numpy==1.24.4
onnx==1.16.1
onnxmltools==1.12.0
onnxruntime==1.18.1
pandas==1.3.2
Pillow==10.4.0
scikit_learn==1.3.2
six==1.16.0
tensorrt==10.2.0.post1
timm==0.9.2
torch==1.13.1
torchvision==0.14.1
tqdm==4.64.0
```
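A minimal sketch for checking an existing environment against these exact pins, using only the standard library's `importlib.metadata`. The helper names (`parse_pins`, `check_pins`) are hypothetical and not part of this repository; stripping local version suffixes such as `+cu117` from installed builds before comparing is an assumption made so that CUDA-specific torch wheels are not reported as mismatches.

```python
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Pins copied from the requirements list above.
PINNED = """\
asposestorage==1.0.2
lmdb==1.3.0
numpy==1.24.4
onnx==1.16.1
onnxmltools==1.12.0
onnxruntime==1.18.1
pandas==1.3.2
Pillow==10.4.0
scikit_learn==1.3.2
six==1.16.0
tensorrt==10.2.0.post1
timm==0.9.2
torch==1.13.1
torchvision==0.14.1
tqdm==4.64.0
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, ignoring blank lines and '#' comments."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"expected an exact pin 'name==version', got: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, wanted, found) for every missing or mismatched package."""
    problems: List[Tuple[str, str, Optional[str]]] = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append((name, wanted, None))  # not installed
            continue
        # Assumption: ignore local build tags, e.g. "1.13.1+cu117" -> "1.13.1".
        if found.split("+", 1)[0] != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in check_pins(parse_pins(PINNED)):
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

The check only reports differences; it does not install or modify anything.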
Training:

```bash
bash train.sh
```

Loading the model:

```bash
bash load_model.sh
```
Citation:

```bibtex
@misc{yang2024vilrefchinesevisionlanguageretinal,
  title={ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model},
  author={Shengzhu Yang and Jiawei Du and Jia Guo and Weihang Zhang and Hanruo Liu and Huiqi Li and Ningli Wang},
  year={2024},
  eprint={2408.10894},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.10894},
}
```