This repository contains the code and data for our paper:

Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
Paper: http://arxiv.org/abs/2303.17548

    @Article{santurkar2023whose,
        title={Whose Opinions Do Language Models Reflect?},
        author={Shibani Santurkar and Esin Durmus and Faisal Ladhak and Cinoo Lee and Percy Liang and Tatsunori Hashimoto},
        year={2023},
        journal={arXiv preprint arXiv:2303.17548},
    }

Getting started

You can start by cloning our repository and following the steps below.

Download the OpinionQA dataset into ./data. Included as part of the dataset are: (i) model_input: 1498 multiple-choice questions based on Pew American Trends Panel surveys that can be used to probe LMs, (ii) human_resp: individualized human responses to these questions from Pew, and (iii) runs: pre-computed responses for the OpenAI and AI21 Labs models studied in our paper.
Compute human and LM opinion distributions using this notebook.
You can explore human-LM alignment along various axes using the following notebooks: representativeness, steerability, consistency and refusals.
(Optional) If you would like to query models yourself, you will need to set up the crfm-helm Python package.
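Before opening the notebooks, it can help to confirm that the download from the first step landed where expected. The following is a minimal sketch, not part of the repository: the function name check_opinionqa_data is hypothetical, and it assumes that model_input, human_resp and runs are direct subdirectories of ./data.

```shell
#!/usr/bin/env bash
# Sketch: confirm the three dataset components named in step one are present.
# Assumption: each component is a direct subdirectory of the data folder.

check_opinionqa_data() {
    local data_dir="${1:-./data}"
    local missing=0
    local sub
    for sub in model_input human_resp runs; do
        if [ -d "$data_dir/$sub" ]; then
            echo "found:   $data_dir/$sub"
        else
            echo "missing: $data_dir/$sub"
            missing=1
        fi
    done
    # Exit status 0 only when all three components were found.
    return "$missing"
}

check_opinionqa_data ./data || echo "Dataset looks incomplete; re-check the download step."
```

If the actual archive nests these folders differently, adjust the path passed to the function accordingly.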

Then, to obtain model responses, run:

    helm-run -c src/helm/benchmark/presentation/run_specs_opinions_qa_openai_default.conf --max-eval-instances 500 --suite $SUITE
    helm-run -c src/helm/benchmark/presentation/run_specs_opinions_qa_ai21_default.conf --max-eval-instances 500 --suite $SUITE
    helm-run -c src/helm/benchmark/presentation/run_specs_opinions_qa_openai_steer.conf --max-eval-instances 50000 --suite $SUITE
    helm-run -c src/helm/benchmark/presentation/run_specs_opinions_qa_ai21_steer.conf --max-eval-instances 50000 --suite $SUITE
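The commands above leave $SUITE undefined and differ only in provider, mode and instance cap, so they can be wrapped in a small loop. The sketch below is one possible way to do that, under stated assumptions: the suite name opinions_qa_v1 is a hypothetical placeholder, the function name run_opinions_qa and the DRY_RUN switch are not part of crfm-helm, and the script is assumed to start from a directory where the src/helm/... paths resolve. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run the four OpinionQA configurations in the same order as above.
# Assumptions: SUITE is a label of your choosing ("opinions_qa_v1" is a
# hypothetical placeholder); DRY_RUN is a convenience switch of this script.

SUITE="${SUITE:-opinions_qa_v1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually call helm-run
CONF_DIR="src/helm/benchmark/presentation"

run_opinions_qa() {
    local mode provider max_instances conf
    for mode in default steer; do
        # Instance caps copied from the commands above.
        if [ "$mode" = "default" ]; then
            max_instances=500
        else
            max_instances=50000
        fi
        for provider in openai ai21; do
            conf="$CONF_DIR/run_specs_opinions_qa_${provider}_${mode}.conf"
            if [ "$DRY_RUN" = "1" ]; then
                echo helm-run -c "$conf" --max-eval-instances "$max_instances" --suite "$SUITE"
            else
                helm-run -c "$conf" --max-eval-instances "$max_instances" --suite "$SUITE" || return 1
            fi
        done
    done
}

run_opinions_qa
```

Review the printed commands first, then rerun with DRY_RUN=0 once the crfm-helm setup and any required API credentials are in place.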

Maintainers

Shibani Santurkar
