Code of On-the-fly Modulation for Balanced Multimodal Learning

The repo for "On-the-fly Modulation for Balanced Multimodal Learning", T-PAMI 2024

Here is the official PyTorch implementation of "On-the-fly Modulation for Balanced Multimodal Learning", which analyzes and alleviates the imbalanced multimodal learning problem from both the feed-forward and the back-propagation stages during optimization. Please refer to our T-PAMI 2024 paper for more details. This journal paper is an extension of our previous CVPR 2022 paper [Balanced Multimodal Learning via On-the-fly Gradient Modulation].

Paper Title: "On-the-fly Modulation for Balanced Multimodal Learning"

Authors: Yake Wei, Di Hu, Henghui Du and Ji-Rong Wen


Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely used joint training strategy, which applies a uniform objective to all modalities, leads to imbalanced and under-optimized uni-modal representations. Specifically, we point out that there often exists a modality carrying more discriminative information, e.g., the vision of playing football or the sound of blowing wind. Such a modality can dominate the joint training process, leaving the other modalities significantly under-optimized.

To alleviate this problem, we first analyze the under-optimized phenomenon from both the feed-forward and the back-propagation stages during optimization. Then, On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies are proposed to modulate the optimization of each modality by monitoring the discriminative discrepancy between modalities during training. Concretely, OPM weakens the influence of the dominant modality by dropping its features with a dynamic probability in the feed-forward stage, while OGM mitigates its gradients in the back-propagation stage. In experiments, our methods demonstrate considerable improvement across a variety of multimodal tasks. These simple yet effective strategies not only enhance vanilla and task-oriented multimodal models, but also benefit more complex multimodal tasks, showcasing their effectiveness and flexibility.
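The snippet below is a minimal PyTorch sketch of these two ideas, not the repository's implementation; the function names, the confidence-based discrepancy ratio, and the tanh-based modulation coefficients are simplifying assumptions for illustration (the exact rules are given in the paper and in the code under code/).

```python
# Conceptual sketch of OPM/OGM-style modulation for a two-modality model
# (audio + visual). Hypothetical helpers, not the official implementation.
import torch
import torch.nn.functional as F

def discrepancy_ratio(logits_a, logits_v, labels):
    """Ratio of per-modality confidence on the ground-truth class (detached)."""
    score_a = F.softmax(logits_a.detach(), dim=1).gather(1, labels.unsqueeze(1)).sum()
    score_v = F.softmax(logits_v.detach(), dim=1).gather(1, labels.unsqueeze(1)).sum()
    return score_a / (score_v + 1e-8)

def opm_drop(feat_a, feat_v, ratio, alpha=0.5):
    """OPM-style: in the feed-forward stage, drop (zero) the dominant
    modality's features with a probability that grows with the gap."""
    if ratio > 1:   # audio dominates -> drop audio features
        p = alpha * torch.tanh(ratio - 1)
        feat_a = feat_a * (torch.rand_like(feat_a[:, :1]) > p).float()
    elif ratio < 1: # visual dominates -> drop visual features
        p = alpha * torch.tanh(1.0 / ratio - 1)
        feat_v = feat_v * (torch.rand_like(feat_v[:, :1]) > p).float()
    return feat_a, feat_v

def ogm_scale(encoder_a, encoder_v, ratio, alpha=0.5):
    """OGM-style: in the back-propagation stage (after loss.backward()),
    shrink the gradients of the dominant modality's encoder."""
    if ratio > 1:
        k, enc = 1 - torch.tanh(alpha * (ratio - 1)), encoder_a
    else:
        k, enc = 1 - torch.tanh(alpha * (1.0 / ratio - 1)), encoder_v
    for param in enc.parameters():
        if param.grad is not None:
            param.grad.mul_(k)
```

In a training loop, one would compute the ratio from the uni-modal logits at each step, apply opm_drop before fusing the features, or call ogm_scale between loss.backward() and optimizer.step().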

Pipeline of OPM method.

Pipeline of OGM method.

Code instruction

Data Preparation

The original datasets can be found here: CREMA-D, Kinetics-Sounds, and UCF101.

The data preprocessing follows OGM-GE.

Install

pip install -r requirements.txt

Prepare dataset

  1. Put the preprocessed data of the KS dataset under "YOU_PATH". In our case, video data are HDF5 files and audio data are PKL files.
  2. Set HDF5_DIR = "YOU_PATH_VIDEO" and PKL_DIR = "YOU_PATH_AUDIO" in dataset/KS.py, as sketched below.
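For reference, the two settings in dataset/KS.py would look roughly like the following; the paths are hypothetical placeholders and should point to your own preprocessed data.

```python
# dataset/KS.py -- example only; replace with your own directories
HDF5_DIR = "/data/kinetics_sounds/video_hdf5"  # hypothetical path to the HDF5 video files
PKL_DIR = "/data/kinetics_sounds/audio_pkl"    # hypothetical path to the PKL audio files
```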

Training

cd code
# OGM
bash scripts/train_ogm.sh

# OPM
bash scripts/train_opm.sh

Inference

bash scripts/inference.sh

Citation

If you find this work useful, please consider citing it.


@article{wei2024on,
  title={On-the-fly modulation for balanced multimodal learning},
  author={Wei, Yake and Hu, Di and Du, Henghui and Wen, Ji-Rong},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}
