This is the GitHub repository for the paper "Data Augmentation via Latent Diffusion for Saliency Prediction", presented at ECCV 2024 in Milano, Italy.
📝 TL;DR We use latent diffusion to augment data for saliency prediction. We show our method improves the performance of existing saliency models. Moreover, we learn multilevel features that improve saliency prediction.
- 📚 Paper
- 📦 Supplementary Material
- 🖼️ Poster
- 🎞️ Video
- 💻 Virtual Poster Session
- 🤗 Demo (coming soon!)
Saliency prediction models are constrained by the limited diversity and quantity of labeled data. Standard data augmentation techniques such as rotating and cropping alter scene composition, affecting saliency. We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes. Since saliency depends on high-level and low-level features, our approach involves learning both by incorporating photometric and semantic attributes such as color, contrast, brightness, and class. To that end, we introduce a saliency-guided cross-attention mechanism that enables targeted edits on the photometric properties, thereby enhancing saliency within specific image regions. Experimental results show that our data augmentation method consistently improves the performance of various saliency models. Moreover, leveraging the augmentation features for saliency prediction yields superior performance on publicly available saliency benchmarks. Our predictions align closely with human visual attention patterns in the edited images, as validated by a user study.
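To illustrate the idea of a saliency-guided cross-attention mechanism, here is a minimal, hypothetical sketch (not the paper's actual implementation): it biases standard scaled dot-product cross-attention logits with a per-query saliency map, so that text-conditioned edits act more strongly on salient image regions. The function name, the `alpha` scaling parameter, and the specific form of the modulation are assumptions for illustration only.

```python
import numpy as np

def saliency_guided_cross_attention(q, k, v, saliency, alpha=1.0):
    """Illustrative sketch of saliency-guided cross-attention.

    q: (n_q, d) image-token queries; k, v: (n_k, d) text-token keys/values;
    saliency: (n_q,) per-query saliency weights in [0, 1].
    `alpha` (hypothetical parameter) controls how strongly saliency
    modulates the attention logits.
    """
    d = q.shape[-1]
    # Standard scaled dot-product attention logits.
    logits = q @ k.T / np.sqrt(d)
    # Emphasize salient regions: scale each query's logits by its saliency.
    logits = logits * (1.0 + alpha * saliency)[:, None]
    # Row-wise softmax over the text tokens.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `saliency` set to all zeros this reduces to ordinary cross-attention; larger saliency values sharpen the attention distribution for those image tokens.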
Explore our visualizations and learn more about our project at https://augsal.github.io/.
If you use this work in your research, please cite our paper as follows:
@InProceedings{aydemir2024augsal,
title = {Data Augmentation via Latent Diffusion for Saliency Prediction},
author = {Aydemir, Bahar and Bhattacharjee, Deblina and Zhang, Tong and Salzmann, Mathieu and S{\"u}sstrunk, Sabine},
booktitle = {18th European Conference on Computer Vision (ECCV), Proceedings},
year = {2024},
}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.