Kaiwen Zhou*1, Chengzhi Liu*1, Xuandong Zhao2, Anderson Compalas1, Dawn Song2, Xin Eric Wang†1
1University of California, Santa Cruz, 2University of California, Berkeley
*Equal contribution
Multimodal Large Language Models (MLLMs) are rapidly evolving, demonstrating impressive capabilities as multimodal assistants that interact with both humans and their environments. However, this increased sophistication introduces significant safety concerns. In this paper, we present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety, which explores how safety considerations vary based on the specific situation in which the user or agent is engaged. We argue that for an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. To evaluate this capability, we develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs. The dataset comprises 1,820 language query-image pairs; in half of them the image context is safe, and in the other half it is unsafe. We also develop an evaluation framework that analyzes key safety aspects, including explicit safety reasoning, visual understanding, and, crucially, situational safety reasoning. Our findings reveal that current MLLMs struggle with this nuanced safety problem in the instruction-following setting and rarely tackle all of these situational safety challenges at once, highlighting a key area for future research. Furthermore, we develop multi-agent pipelines that coordinate to solve these safety challenges, which yields consistent improvements in safety over the original MLLM responses.
The dataset can be downloaded from Hugging Face.
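For example, the snippet below is a minimal loading sketch using the Hugging Face `datasets` library; the repository id is a placeholder, so substitute the actual id shown on the dataset page.

```python
# Minimal loading sketch using the Hugging Face `datasets` library.
# NOTE: "ORG/MSSBench" is a placeholder; substitute the actual repository id
# from the dataset page on Hugging Face.
from datasets import load_dataset

dataset = load_dataset("ORG/MSSBench")
print(dataset)  # inspect the available splits and fields
```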
Each entry in the Chat Task dataset contains the following fields:
- `safe_image_path`: the file path to the safe image.
- `intent`: the user's intent in the context of the image.
- `unsafe_image`: the description of the unsafe image.
- `unsafe_image_path`: the file path to the unsafe image.
- `Type`: the multimodal situational safety category of the entry.
- `queries`: the user's question in the Chat Task.
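As an illustration of this schema, the sketch below expands one Chat Task entry into (image, query, context) evaluation pairs. The sample entry is hypothetical, built only from the field names documented above, and `queries` is assumed to hold a list of question strings.

```python
# Sketch: expand one Chat Task entry into (image_path, query, context_label)
# evaluation pairs. The entry below is hypothetical and only mirrors the
# documented field names; `queries` is assumed to hold a list of strings.
entry = {
    "safe_image_path": "images/example_safe.jpg",
    "intent": "learn how to do this activity",
    "unsafe_image": "a description of the unsafe visual context",
    "unsafe_image_path": "images/example_unsafe.jpg",
    "Type": "chat",
    "queries": ["How do I do this?"],
}

eval_pairs = []
for query in entry["queries"]:
    # The same query is evaluated against both visual contexts.
    eval_pairs.append((entry["safe_image_path"], query, "safe"))
    eval_pairs.append((entry["unsafe_image_path"], query, "unsafe"))

for image_path, query, context in eval_pairs:
    print(f"[{context}] {image_path}: {query}")
```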
Each entry in the Embodied Task dataset contains the following fields:
- `task`: the specific embodied task.
- `category`: the multimodal situational safety category of the entry.
- `safe_instruction`/`safe_instructions`: the user's safe instruction and related variations.
- `unsafe_instruction`/`unsafe_instructions`: the user's unsafe instruction and related variations.
- `safe`: the file path to the safe image.
- `unsafe`: the file path to the unsafe image.
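Similarly, a hypothetical Embodied Task entry can be expanded into test cases. In the sketch below each instruction variant is paired with its matching image context; the exact pairing used for scoring is defined by the evaluation code.

```python
# Sketch: expand one Embodied Task entry into (image_path, instruction,
# context_label) test cases. The entry is hypothetical; instruction fields
# are assumed to be lists of phrasing variations, and the exact
# instruction-image pairing used for scoring is defined by inference.py.
entry = {
    "task": "an example embodied task",
    "category": "embodied",
    "safe_instructions": ["a safe instruction", "a safe variation"],
    "unsafe_instructions": ["an unsafe instruction", "an unsafe variation"],
    "safe": "images/scene_safe.jpg",
    "unsafe": "images/scene_unsafe.jpg",
}

test_cases = [
    (entry["safe"], instruction, "safe")
    for instruction in entry["safe_instructions"]
] + [
    (entry["unsafe"], instruction, "unsafe")
    for instruction in entry["unsafe_instructions"]
]

for image_path, instruction, context in test_cases:
    print(f"[{context}] {image_path}: {instruction}")
```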
You can run the evaluation with our evaluation script, `inference.py`.
For model deployment, refer to `models`.
We also provide a multi-agent framework for enhancing situational safety; see `infer_on_multiagent.py` for details.
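As a rough illustration of the idea (not the repository's actual implementation, which lives in `infer_on_multiagent.py`), a minimal pipeline might first ground the visual situation and then judge the query against it. Here `query_mllm` is a hypothetical stand-in for any MLLM API call that accepts an image path and a text prompt.

```python
# Hypothetical sketch of a multi-agent situational-safety pipeline; see
# infer_on_multiagent.py for the actual implementation. `query_mllm` stands
# in for any MLLM client call taking an image path and a text prompt.
def query_mllm(image_path: str, prompt: str) -> str:
    raise NotImplementedError("plug in your MLLM client here")

def situational_safe_response(image_path: str, user_query: str) -> str:
    # Agent 1: ground the visual situation.
    situation = query_mllm(image_path, "Describe the situation in this image.")
    # Agent 2: judge whether answering the query is safe in that situation.
    verdict = query_mllm(
        image_path,
        f"Situation: {situation}\nQuery: {user_query}\n"
        "Is it safe to help with this query in this situation? Answer yes or no.",
    )
    if verdict.strip().lower().startswith("no"):
        # Agent 3: refuse with a situation-aware explanation.
        return query_mllm(
            image_path,
            f"The query '{user_query}' is unsafe given: {situation}. "
            "Politely decline and explain the risk.",
        )
    return query_mllm(image_path, user_query)
```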
@misc{zhou2024multimodalsituationalsafety,
title={Multimodal Situational Safety},
author={Kaiwen Zhou and Chengzhi Liu and Xuandong Zhao and Anderson Compalas and Dawn Song and Xin Eric Wang},
year={2024},
eprint={2410.06172},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.06172},
}