
MM_Robustness

Journal of Data-centric Machine Learning Research (DMLR)

More details can be found on the project webpage.

This repository contains the code for generating multimodal robustness evaluation datasets for downstream image-text applications, including image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation.

Citation

If you find our code or models helpful in your research, please cite our paper:

@article{Qiu2022BenchmarkingRO,
  title={Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift},
  author={Jielin Qiu and Yi Zhu and Xingjian Shi and F. Wenzel and Zhiqiang Tang and Ding Zhao and Bo Li and Mu Li},
  journal={Journal of Data-centric Machine Learning Research (DMLR)},
  year={2024}
}

Installation

./install.sh

Datasets

Generate perturbation datasets
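
To illustrate the kind of image corruption such a perturbation dataset applies, here is a minimal sketch of a Gaussian-noise perturbation at several severity levels, assuming NumPy-array images. This is a generic example, not the repository's actual implementation; the function name and severity values are hypothetical.

```python
# Illustrative image perturbation (Gaussian noise) in the spirit of
# distribution-shift corruption benchmarks. Hypothetical helper, not the
# repository's actual code.
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Add zero-mean Gaussian noise; higher severity uses a larger std."""
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]  # example values
    noisy = image.astype(np.float64) / 255.0
    noisy = noisy + np.random.normal(scale=sigma, size=noisy.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).astype(np.uint8)

# Apply to a dummy 64x64 RGB image
img = np.zeros((64, 64, 3), dtype=np.uint8)
perturbed = gaussian_noise(img, severity=3)
print(perturbed.shape, perturbed.dtype)
```

A full pipeline would loop such a function over every image in the source dataset (e.g. COCO) at each severity level and save the perturbed copies alongside the original annotations.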

Evaluation data for text-to-image generation

For the text-to-image generation evaluation, we used captions from COCO as prompts to generate the corresponding images. We also share the generated images here.
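
On the text side, prompt-level robustness is typically probed by lightly corrupting the caption before it is fed to the generator. A minimal sketch of one common text perturbation (adjacent-character swaps) is below; the function name, rate, and seed are hypothetical, and this is not the repository's actual code.

```python
# Illustrative caption perturbation: swap adjacent alphabetic characters at a
# given rate, a common character-level text corruption. Hypothetical helper,
# not the repository's actual code.
import random

def char_swap(caption: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent alphabetic characters at roughly the given rate."""
    rng = random.Random(seed)  # fixed seed for reproducible perturbations
    chars = list(caption)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # do not re-swap a character we just moved
        else:
            i += 1
    return "".join(chars)

prompt = "A man riding a wave on top of a surfboard."
print(char_swap(prompt, rate=0.3))
```

The perturbed caption keeps exactly the same characters (only their order changes), so any drop in generation quality can be attributed to the distribution shift in the prompt rather than to missing content words.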

Baselines

For the evaluated baselines, please see evaluated_baselines.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

