YangHao97/speech_specific_risk

Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

This repository accompanies the EMNLP 2024 paper: Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

Taxonomy

Our speech-specific risk taxonomy comprises 8 risk sub-categories grouped under three categories: hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).

Statistics

Because of the safeguards and limitations of existing TTS systems, we generate synthetic speech for four risk sub-categories: malicious sarcasm, and age, gender, and ethnicity stereotypical biases.

Prompting Strategies

We use Yes/No questions and multiple-choice questions as prompts, as detailed in Table 9 of our paper.
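As a rough illustration of the two prompt formats, the sketch below builds a Yes/No question and a multiple-choice question as plain strings. The function names and all prompt wording here are hypothetical placeholders, not the prompts from Table 9; please take the actual wording from the paper.

```python
from string import ascii_uppercase


def build_yes_no_prompt(risk: str) -> str:
    """Return a Yes/No question about one risk type.

    The wording is a hypothetical placeholder, not the paper's prompt.
    """
    return f"Does the speaker's tone convey {risk}? Answer Yes or No."


def build_multi_choice_prompt(question: str, options: list[str]) -> str:
    """Return a multiple-choice question with lettered options (A, B, C, ...).

    The closing instruction is a hypothetical placeholder as well.
    """
    lines = [question]
    lines += [f"{letter}. {option}" for letter, option in zip(ascii_uppercase, options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_yes_no_prompt("malicious sarcasm"))
    print(build_multi_choice_prompt(
        "Which of the following best describes the speech?",
        ["Malicious sarcasm", "Neutral statement"],
    ))
```

Keeping prompt construction in small functions like these makes it easy to swap in the exact Table 9 wording once and reuse it across models.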

Evaluation

We evaluate the capability of five advanced speech LMMs to detect speech-specific risks: Qwen-Audio-Chat, SALMONN-7B/13B, WavLLM, and Gemini-1.5-Pro. Please deploy the models/APIs following their official instructions.

We provide an example evaluation in Qwen-Audio-Chat-Sarcasm.
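Once a model has been deployed and its responses collected, Yes/No answers need to be mapped to labels before scoring. The following is a minimal sketch of one way to do that, assuming free-form text responses and boolean gold labels. The helper names `parse_yes_no` and `accuracy` are hypothetical, and the first-match parsing rule and the choice of accuracy as the metric are assumptions rather than the paper's exact protocol.

```python
import re


def parse_yes_no(response: str):
    """Map a free-form response to True (Yes), False (No), or None.

    Assumption: the first standalone "yes" or "no" in the text is the answer.
    """
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


def accuracy(responses, labels):
    """Fraction of responses whose parsed answer equals the gold label.

    Responses that cannot be parsed count as incorrect.
    """
    if len(responses) != len(labels):
        raise ValueError("responses and labels must have the same length")
    if not responses:
        return 0.0
    correct = sum(parse_yes_no(r) == gold for r, gold in zip(responses, labels))
    return correct / len(responses)


if __name__ == "__main__":
    model_outputs = ["Yes, the speaker sounds sarcastic.", "No.", "I am not sure."]
    gold = [True, False, True]
    print(accuracy(model_outputs, gold))
```

Counting unparseable responses as incorrect is a conservative choice; depending on the analysis, it may be preferable to report them separately.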

Dataset

Access to the dataset is granted after submitting a form that states the researcher's affiliation and intended use. Access the dataset

Citation

@article{yang2024towards,
  title={Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights},
  author={Yang, Hao and Qu, Lizhen and Shareghi, Ehsan and Haffari, Gholamreza},
  journal={arXiv preprint arXiv:2406.17430},
  year={2024}
}
