- [2025.01.03]: 🔥 We release the code related to MLLM steering.
- [2025.01.02]: 📜 Our paper Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering is on arXiv.
- [2024.10.30]: 🔥 XL-VLMs repo is public.
- [2024.09.25]: 🎉 Our paper A Concept-based Explainability Framework for Large Multimodal Models is accepted at NeurIPS 2024.
With this repo, you can reproduce the results introduced in the following papers:
Overview: Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering
> Multimodal LLMs have reached remarkable levels of proficiency in understanding multimodal inputs. However, much less attention has been paid to understanding and explaining the underlying mechanisms of these models. Most existing explainability research examines these models only in their final states, overlooking the dynamic representational shifts that occur during training. In this work, we systematically analyze the evolution of hidden state representations to reveal how fine-tuning alters the internal structure of a model to specialize in new multimodal tasks. We also demonstrate the use of shift vectors to capture these changes.
> Finally, we explore the practical impact of our findings on model steering, showing that we can adjust the behavior of multimodal LLMs without any training, such as modifying answer types or caption styles, or biasing the model toward specific responses.
Overview: A Concept-based Explainability Framework for Large Multimodal Models
> Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advances in the interpretability of these models, understanding the internal representations of LMMs remains largely a mystery. In this paper, we present a novel framework for the interpretation of LMMs. We propose a dictionary-learning-based approach applied to token representations. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are semantically well grounded in both vision and text, so we refer to them as "multi-modal concepts".
> We qualitatively and quantitatively evaluate the learned concepts. We show that the extracted multimodal concepts are useful for interpreting the representations of test samples. Finally, we evaluate the disentanglement between different concepts and the quality of grounding the concepts visually and textually.
Please refer to docs/installation.md for installation instructions.
We support models from the transformers library. Currently, the following models are supported (a minimal loading sketch is given after the list):
- llava-v1.5-7b
- idefics2-8b
- Molmo-7B-D-0924
- Qwen2-VL-7B-Instruct
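As a rough illustration (not the repo's own loading code), the snippet below loads one of the supported models with the transformers library and runs a forward pass that exposes hidden states. The Hugging Face checkpoint name, image path, and prompt format are assumptions for llava-v1.5-7b.

```python
# Minimal sketch: load a supported model via Hugging Face transformers and
# expose hidden states. Checkpoint name, image path, and prompt format are
# illustrative assumptions, not the repo's configuration.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # HF checkpoint corresponding to llava-v1.5-7b
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # any local image
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple with one tensor per layer,
# each of shape (batch, sequence_length, hidden_dim).
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```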
A high-level workflow with this repo consists of three parts: extracting a concept dictionary, analyzing representation shifts, and steering the model.
- 🚀 Extracting hidden states from the multimodal LLM.
- 🧩 Aggregating the extracted hidden states across target samples; let's call this aggregation `Z`.
- 🔍 Decomposing `Z` into concept vectors and activations, using a decomposition strategy such as semi-NMF or k-means: `Z = U V` (a minimal sketch of this step is given below).
- 🖼️ Grounding the concepts (columns of `U`) in text and image.
👉 Check out src/examples/concept_dictionary for commands related to this part (described in our previous work CoX-LMM: A Concept-based Explainability Framework for Large Multimodal Models).
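As a loose illustration of the decomposition step (the repo's semi-NMF implementation is not reproduced here), the sketch below clusters an aggregation of hidden states with k-means, one of the strategies mentioned above. The matrix shapes and number of concepts are placeholder assumptions.

```python
# Hedged sketch of the decomposition step using k-means via scikit-learn.
# Z stands in for an aggregation of hidden states, shape (n_samples, hidden_dim);
# the random data and n_concepts below are placeholders, not the repo's defaults.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
Z = rng.normal(size=(1024, 4096)).astype(np.float32)  # replace with real hidden states

n_concepts = 20
kmeans = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(Z)

# Concept dictionary U: one concept vector per cluster centroid, shape (n_concepts, hidden_dim).
U = kmeans.cluster_centers_

# Activations V: similarity of each sample to each concept vector, shape (n_samples, n_concepts).
V = Z @ U.T

# Concept each sample activates the most (used later for grounding and shift analysis).
dominant_concept = V.argmax(axis=1)
print(U.shape, V.shape, dominant_concept[:10])
```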
- 📊 Computing concepts from the original and destination models.
- 🧠 Associating each sample with the concept it activates the most.
- ✨ Computing the shift in the representations of the samples associated with each concept to obtain a shift vector.
- 🔧 Applying the shift to the concepts of the original model, and comparing the result with the concepts of the destination model (a minimal sketch of this idea is given below).
👉 Check out src/examples/shift_analysis/concept_dictionary_evaluation.sh for commands related to this part (a visualization of this analysis can be found in playground/shift_analysis.ipynb).
🧪 You can test this feature by providing your own hidden state representations, which should be structured in a file as described in docs/saved_feature_structure.md.
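The sketch below illustrates the shift-vector computation described above (not the repo's exact code); the array names, shapes, and synthetic data are assumptions.

```python
# Hedged sketch: per-concept shift vectors between an original and a fine-tuned
# ("destination") model, computed from hidden states of the same samples.
import numpy as np

def concept_shift_vectors(H_orig, H_dest, dominant_concept, n_concepts):
    """Mean representation shift of the samples associated with each concept."""
    shifts = np.zeros((n_concepts, H_orig.shape[1]), dtype=H_orig.dtype)
    for k in range(n_concepts):
        mask = dominant_concept == k
        if mask.any():
            shifts[k] = (H_dest[mask] - H_orig[mask]).mean(axis=0)
    return shifts

# Synthetic usage: the shift of each concept's samples can then be added to the
# original concept vectors and compared (e.g., via cosine similarity) with the
# concepts of the destination model.
rng = np.random.default_rng(0)
H_orig = rng.normal(size=(100, 64)).astype(np.float32)
H_dest = H_orig + 0.1  # pretend fine-tuning shifts every representation slightly
assignments = rng.integers(0, 5, size=100)
print(concept_shift_vectors(H_orig, H_dest, assignments, n_concepts=5).shape)  # (5, 64)
```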
- ⚙️ Computing steering vectors from the hidden representations of two sets of samples: one set is associated with the source and the other with the target of the steering (e.g., a particular answer in VQA, or a caption style).
- 🎯 Applying this steering vector to validation samples and evaluating the steering (a minimal sketch is given below).
👉 Check out src/examples/steering for commands related to steering the model for different tasks.
🧪 You can visualize the results using the notebook playground/steering_analysis.ipynb.
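As a hedged sketch of one common way to build and apply such a steering vector (a mean-difference vector injected through a PyTorch forward hook; not necessarily the repo's exact method), with `layer`, `alpha`, and the hidden-state collections as assumptions:

```python
# Hedged sketch of mean-difference steering applied with a PyTorch forward hook.
# h_source / h_target: hidden states (n_samples, hidden_dim) collected at one layer
# for the source and target sample sets; `layer` and `alpha` are assumptions.
import torch

def compute_steering_vector(h_source: torch.Tensor, h_target: torch.Tensor) -> torch.Tensor:
    """Difference of the mean representations of the target and source sets."""
    return h_target.mean(dim=0) - h_source.mean(dim=0)

def add_steering_hook(layer, steering_vector, alpha=1.0):
    """Register a forward hook that adds alpha * steering_vector to the layer output."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage sketch (the module path depends on the model class):
# layer = model.language_model.model.layers[layer_idx]
# handle = add_steering_hook(layer, steering_vector, alpha=4.0)
# ...generate on validation samples and evaluate the steering...
# handle.remove()
```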
We welcome contributions to this repo. Contributions could take the form of support for other models or datasets, or of additional analysis/interpretation methods for multimodal models. However, contributions should only be made via pull requests. Please refer to the rules given in docs/contributing.md.
If you find this repo useful, you can cite our works as follows:
@article{khayatan2025analyzing,
title={Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment},
author={Khayatan, Pegah and Shukor, Mustafa and Parekh, Jayneel and Cord, Matthieu},
journal={arXiv preprint arXiv:2501.03012},
year={2025}
}
@article{parekh2024concept,
title={A Concept-Based Explainability Framework for Large Multimodal Models},
author={Parekh, Jayneel and Khayatan, Pegah and Shukor, Mustafa and Newson, Alasdair and Cord, Matthieu},
journal={arXiv preprint arXiv:2406.08074},
year={2024}
}