- An implementation of the paper "Large Language Models Often Say One Thing and Do Another".
- Please contact Ruoxi Xu (ruoxi2021@iscas.ac.cn) with questions and suggestions.
General
- Python (verified on 3.10)
- CUDA (verified on 12.1)
Python Packages
- see requirements.txt
conda create -n wdct python=3.10
conda activate wdct
pip install -r requirements.txt
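To quickly verify the environment, a check along the following lines can be run (this assumes PyTorch is among the packages installed from requirements.txt):

```python
# Quick environment sanity check (assumes PyTorch is installed via requirements.txt)
import sys
import torch

print("Python:", sys.version.split()[0])           # verified on 3.10
print("CUDA available:", torch.cuda.is_available())
print("CUDA (torch build):", torch.version.cuda)   # verified on 12.1
```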
The WDCT dataset is located in data/WDCT.csv. Each row represents a test case with the following columns:
- Type: Domain category of the test case. Possible values: Opinion, NonEthV (non-ethical value), EthV (ethical value), Theory.
- id: Unique identifier for the test case.
- speak: A word question that probes a model's opinions, values, or related attributes through direct queries.
- act: A deed question that evaluates a model's hypothetical actions in grounded real-world scenarios.
- correct_answer: The ground-truth answer for domains with definitive answers.
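For a quick look at the data, the CSV can be loaded with pandas (a minimal sketch; only the columns listed above are assumed):

```python
import pandas as pd

# Load the WDCT test cases
df = pd.read_csv("data/WDCT.csv")

# Inspect the domain distribution and one speak/act pair
print(df["Type"].value_counts())            # Opinion / NonEthV / EthV / Theory
print(df.loc[0, ["speak", "act", "correct_answer"]])
```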
Before proceeding with the evaluation, please ensure that the model path is correctly specified in the config.py file.
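The exact contents of config.py depend on the repository version; as a hypothetical sketch, model names are mapped to local checkpoint paths roughly like this (the variable name MODEL_PATHS and the paths are placeholders):

```python
# config.py (hypothetical sketch -- adapt names and paths to your local setup)
MODEL_PATHS = {
    "llama-2-7b": "/path/to/Llama-2-7b-hf",
    "llama-2-13b": "/path/to/Llama-2-13b-hf",
}
```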
To evaluate a single model, execute the following command:
python eval/test_local_models.py \
--model llama-2-7b \
--test_setting normal \
--batch_size 32 \
--temperature 0 \
--max_new_tokens 10 \
--run_time 1
- --model refers to the name of the model to be evaluated.
- --test_setting refers to the evaluation setting; you can choose between 'normal' and '3_shot_cot'.
- --run_time refers to the number of evaluation attempts; adjust this parameter if multiple evaluations are required.
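The evaluation compares the model's answer to each word (speak) question with its answer to the paired deed (act) question. As an illustration only, a consistency score over saved answers could be computed as follows (the columns answer_speak and answer_act are hypothetical; the repository's output format may differ):

```python
import pandas as pd

def consistency_score(results: pd.DataFrame) -> float:
    """Fraction of test cases where the speak answer and the act answer agree.

    Assumes `results` has hypothetical columns `answer_speak` and `answer_act`
    holding the model's parsed choices (e.g. 'A' / 'B') for each paired question.
    """
    return float((results["answer_speak"] == results["answer_act"]).mean())

# Example with toy data
toy = pd.DataFrame({"answer_speak": ["A", "B", "A"], "answer_act": ["A", "A", "A"]})
print(consistency_score(toy))  # 2 of 3 pairs agree -> 0.666...
```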
Alternatively, if you need to evaluate multiple models, you can run the eval.sh script:
./eval.sh
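eval.sh presumably loops over a list of model names; an equivalent Python sketch (the model names are placeholders) would be:

```python
import subprocess

# Placeholder list of models; use the names registered in config.py
models = ["llama-2-7b", "llama-2-13b"]

for model in models:
    subprocess.run(
        ["python", "eval/test_local_models.py",
         "--model", model,
         "--test_setting", "normal",
         "--batch_size", "32",
         "--temperature", "0",
         "--max_new_tokens", "10",
         "--run_time", "1"],
        check=True,
    )
```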
This implementation aligns an LLM's words or deeds on WDCT using the DPO (Direct Preference Optimization) algorithm.
The DPO process consists of two stages:
- Supervised Fine-Tuning (SFT): Perform supervised fine-tuning on the dataset(s) of interest. This ensures that the preference data is in-distribution for the policy, providing a good foundation for the learning from preferences phase.
- Preference Learning: Use preference data to train the model from step 1.
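For reference, the DPO objective rewards the policy for increasing the likelihood of the chosen (preferred) response relative to the rejected one, both measured against a frozen reference (SFT) model. A minimal PyTorch sketch of the loss, not the repository's train.py implementation, is:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of per-sequence log-probabilities under the
    policy or the frozen reference (SFT) model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```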
To align a single model, run the following commands:
# Infer
python eval/test_local_models.py \
--model ${model_name} \
--dataset ${dataset} \
--batch_size 32 \
--run_time 1
# SFT
python train.py \
model=${model_name} \
model.name_or_path=${name_or_path} \
datasets="[${dataset}]" \
change_mode=${change} \
loss=${train_mode} \
exp_name=${exp_name} \
n_epochs=4 \
gradient_accumulation_steps=1 \
batch_size=2 \
lr=${sft_lr} \
warmup_steps=50 \
eval_batch_size=32 \
eval_every=6240 \
minimum_log_interval_secs=1 \
trainer=BasicTrainer \
sample_during_eval=False 2>&1 | tee "$log_file"
# DPO
python train.py \
model=${model_name} \
model.name_or_path=${name_or_path} \
model.archive=${sft_model_path} \
datasets="[${dataset}]" \
change_mode=${change} \
loss=${train_mode} \
loss.beta=0.1 \
exp_name=${exp_name} \
n_epochs=4 \
gradient_accumulation_steps=1 \
batch_size=8 \
lr=${dpo_lr} \
warmup_steps=50 \
eval_batch_size=32 \
eval_every=624 \
minimum_log_interval_secs=1 \
trainer=BasicTrainer \
sample_during_eval=False 2>&1 | tee "$log_file"
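How the preference pairs are constructed depends on change_mode (i.e. whether words are aligned to deeds or the other way around). Purely as an illustration, chosen/rejected responses for the word question could be derived from the model's answer to the paired deed question roughly as follows (all column and option names are hypothetical):

```python
import pandas as pd

def build_preference_pairs(df: pd.DataFrame) -> list[dict]:
    """Hypothetical sketch: align words to deeds.

    For each test case, the word (speak) option that matches the model's deed
    (act) answer is treated as the chosen response; the other option is rejected.
    Assumes hypothetical columns `act_answer`, `speak_option_A`, `speak_option_B`.
    """
    pairs = []
    for _, row in df.iterrows():
        chosen_key = "speak_option_A" if row["act_answer"] == "A" else "speak_option_B"
        rejected_key = "speak_option_B" if chosen_key == "speak_option_A" else "speak_option_A"
        pairs.append({
            "prompt": row["speak"],
            "chosen": row[chosen_key],
            "rejected": row[rejected_key],
        })
    return pairs
```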
Alternatively, if you need to align multiple models, you can run the align.sh script:
./align.sh
The code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission in advance.