- 2025-07-23 – Accepted to ICCV Workshops 2025 (CV4DC, Honolulu, Oct 19–23). Camera-ready coming soon.
Road safety assessments are costly and data-hungry, especially in LMICs. V-RoAst is a zero-shot Visual Question Answering framework that uses general-purpose VLMs (e.g., Gemini-1.5-Flash, GPT-4o-mini) to classify 52 iRAP attributes from street-level imagery.
We provide:
- ThaiRAP dataset: 2,037 images (519 segments) with expert-coded iRAP labels.
- Prompt templates & code for attribute classification with VLMs.
- CNN baselines (VGG/ResNet) for comparison.
- Open VLM benchmark for iRAP-style road safety attributes.
- Prompt engineering framework (system/user prompts + local context).
- Zero-shot evaluation, incl. unseen classes.
- Automatic star-rating demo using crowdsourced Mapillary imagery.
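The system/user prompt design above can be sketched as follows. This is a minimal illustration, not the repo's actual templates: the helper name `build_vqa_prompt`, the exact wording, and the message structure are assumptions.

```python
def build_vqa_prompt(attribute: str, options: list[str],
                     context: str = "Thailand") -> list[dict]:
    """Assemble a zero-shot VQA prompt for one iRAP attribute.

    Hypothetical helper: the repo's real templates may phrase the
    system/user roles and local context differently.
    """
    system = (
        "You are a road safety assessor following the iRAP coding standard. "
        f"The street-level images were taken in {context}."
    )
    user = (
        f"Classify the attribute '{attribute}' for the road in the image. "
        "Answer with exactly one of the following options: "
        + ", ".join(options) + "."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The same message list can then be passed to any chat-style VLM API (OpenAI or Gemini) without retraining, which is what makes the zero-shot setup flexible across the 52 attributes.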
Road safety assessments are critical yet costly, especially in Low- and Middle-Income Countries (LMICs), where most roads remain unrated. Traditional methods require expert annotation and training data, while supervised learning-based approaches struggle to generalise across regions. In this paper, we introduce V-RoAst, a zero-shot Visual Question Answering (VQA) framework using Vision-Language Models (VLMs) to classify road safety attributes defined by the iRAP standard. We introduce the first open-source dataset from ThaiRAP, consisting of over 2,000 curated street-level images from Thailand annotated for this task. We evaluate Gemini-1.5-Flash and GPT-4o-mini on this dataset and benchmark their performance against VGGNet and ResNet baselines. While VLMs underperform on spatial awareness, they generalise well to unseen classes and offer flexible prompt-based reasoning without retraining. Our results show that VLMs can serve as automatic road assessment tools when integrated with complementary data. This work is the first to explore VLMs for zero-shot infrastructure risk assessment and opens new directions for automatic, low-cost road safety mapping. Code and dataset: https://github.com/PongNJ/V-RoAst.
- OpenAI: we used `openai` version 1.40.3; see the official documentation.
- Google Gemini: we used `google-generativeai` version 0.7.2; see the official documentation.
- Mapillary API: see the official documentation.
```shell
git clone https://github.com/PongNJ/V-RoAst.git
```
Please download the ThaiRAP dataset from (google drive) or (ucl rdr) and place all images in the `./image/ThaiRAP/` directory.
The ThaiRAP dataset combines street images with road attributes, stored in a CSV file, as shown in the structure below:
```
├─V-RoAst
│ ├─image
│ │ ├─ThaiRAP
│ │ │ ├─1.jpg
│ │ │ ├─2.jpg
│ │ │ ├─...
│ │ │ └─2037.jpg
│ └─Validation.csv
```
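To iterate over the dataset programmatically, a minimal sketch is below. The helper name `thairap_image_path` and the assumption that filenames are simply `<id>.jpg` follow the directory tree above; the column names inside `Validation.csv` are not documented here, so inspect them before use.

```python
from pathlib import Path

def thairap_image_path(image_id: int, root: str = "./image/ThaiRAP") -> Path:
    """Map a ThaiRAP image id (1..2037) to its file path.

    Hypothetical helper based on the directory layout shown in the README:
    images are assumed to be named '<id>.jpg' under ./image/ThaiRAP/.
    """
    return Path(root) / f"{image_id}.jpg"

# The attribute labels live in Validation.csv at the repo root. The exact
# column schema is not shown in this README, so check it first, e.g.:
#   import pandas as pd
#   print(pd.read_csv("Validation.csv").columns)
```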
If you use this dataset or refer to our work, please cite:
```bibtex
@misc{jongwiriyanurak2025vroastvisualroadassessment,
  title={V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?},
  author={Natchapon Jongwiriyanurak and Zichao Zeng and June Moh Goo and Xinglei Wang and Ilya Ilyankou and Kerkritt Sriroongvikrai and Nicola Christie and Meihui Wang and Huanfa Chen and James Haworth},
  year={2025},
  eprint={2408.10872},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.10872}
}
```