Reducing Multimodal Alignment to Text-Based, Unimodal Alignment

In this paper, we investigated whether the multimodal alignment problem can be effectively reduced to the unimodal alignment problem, wherein a language model makes a moral judgment purely from a textual description of an image. Focusing on GPT-4 and LLaVA as two prominent examples of multimodal systems, we demonstrated, rather surprisingly, that this reduction can be achieved with a relatively small loss in moral judgment performance in the case of LLaVA, and virtually no loss in the case of GPT-4.

First-Time Setup:

This setup is specific to the Mila cluster; revisions will be needed for other infrastructures.

Prerequisites:

  • Ensure you have access to the cluster.
  • Ensure you have the necessary permissions to run interactive jobs.

Setup Steps:

  1. Get an interactive job:

    • Follow the cluster's documentation to start an interactive job (an example request is sketched after this list).
  2. Clone the repository:

    cd ~
    git clone https://github.com/amimem/alignment.git
  3. Set permissions and run setup script:

    cd alignment
    chmod +x scripts/*.sh
    source scripts/setup.sh
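
Mila uses SLURM, so step 1 typically amounts to an salloc request. The resource values below are illustrative assumptions, not requirements of this project; adjust them per the cluster's documentation:

    # Illustrative SLURM request for an interactive job (all values are assumptions)
    salloc --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=2:00:00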

Future Use:

For future interactive jobs, simply run:

    source scripts/int.sh
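
The script shipped in the repository is authoritative; purely as a hedged sketch of what such a helper typically contains (the module and environment names below are assumptions), note that it is sourced rather than executed so that any activated environment persists in your current shell:

    # Hypothetical sketch of scripts/int.sh; the repository's version is authoritative.
    module load anaconda/3      # assumption: module name varies by cluster
    conda activate alignment    # assumption: environment name
    salloc --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=2:00:00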

Running the sbatch Script:

To submit a batch job, run:

    source scripts/slurm.sh
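
The slurm.sh wrapper presumably submits a batch script via sbatch. As a hedged sketch only, a minimal SLURM batch script looks like the following; the resource values, module name, and entry point (main.py) are assumptions, not taken from this repository:

    #!/bin/bash
    # Hypothetical minimal batch script; all values below are assumptions.
    #SBATCH --job-name=alignment
    #SBATCH --gres=gpu:1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --time=12:00:00

    module load anaconda/3      # assumption: module name varies by cluster
    conda activate alignment    # assumption: environment name
    python main.py              # assumption: actual entry point may differ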
