
The Importance of Prompt Tuning for Automated Neuron Explanations

This is the official repository containing the code for our paper The Importance of Prompt Tuning for Automated Neuron Explanations (NeurIPS 2023 ATTRIB). arXiv, Project website. We build heavily on OpenAI's automated-interpretability and analyze the importance of the specific prompts used to generate new explanations.

We find that simpler, more intuitive prompts, such as our Summary prompt, can increase both the computational efficiency and the quality of the generated explanations. In particular, the simple Summary prompt shown below can outperform the original prompt while requiring 2.4× fewer input tokens per neuron.

(Overview figure)

Quickstart

  1. Setup: first cd into neuron-explainer, then install the required packages by running pip install -e .
  2. Set your OpenAI API key on line 4 of neuron_explainer/utils.py.
  3. Run Experiments/save_descriptions.ipynb to generate explanations for 5 sample neurons using our different prompts.
  4. Compare the results with the neuron activations shown in NeuronViewer.
  5. To explain a different set of neurons, pass a different CSV to neurons_to_evaluate; like inputs/test_neurons.csv, it must contain layer and neuron columns (see the sketch below).
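
For reference, here is a minimal sketch of how such a CSV could be created. The layer and neuron column names come from inputs/test_neurons.csv; the indices and the output path below are illustrative placeholders, not neurons from the paper.

```python
# Minimal sketch: build a CSV of neurons to explain.
# The "layer" and "neuron" columns mirror inputs/test_neurons.csv;
# the indices and the output path are illustrative placeholders.
import pandas as pd

neurons = pd.DataFrame({
    "layer": [0, 5, 12],        # transformer layer index
    "neuron": [123, 456, 789],  # neuron index within that layer
})
neurons.to_csv("inputs/my_neurons.csv", index=False)
```

Point neurons_to_evaluate in the notebook at this file to explain those neurons instead.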

Reproducing Our Experiments

  1. All neuron descriptions generated for the paper are available in Experiments/results.
  2. The Simulate and Score experiments can be reproduced by running Experiments/simulate_score.ipynb. Note: the simulator model used originally, 'gpt-3.5-turbo-instruct', is no longer supported by the API in the required format and has been replaced with 'text-davinci-003' in the code. This change will likely decrease simulation quality and increase API costs.
  3. To calculate Similarity to Baseline Explanation (AdaCS), which evaluates explanation quality, run Experiments/ada_cs.ipynb (see the sketch after this list).
  4. To explain Neuron Puzzles and calculate the AdaCS similarity to their ground-truth explanations, run Experiments/puzzles.ipynb.
  5. Finally, our comparison of the number of API tokens per explained neuron can be reproduced in Experiments/simulate_score.ipynb.
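
Experiments/ada_cs.ipynb is the authoritative implementation of AdaCS; the sketch below only illustrates the general idea, assuming AdaCS is the cosine similarity between OpenAI ada embeddings of two explanations (and using the pre-1.0 openai Python client).

```python
# Illustrative AdaCS-style score: cosine similarity between ada embeddings
# of two explanations. The embedding model name and the metric are assumptions;
# Experiments/ada_cs.ipynb is authoritative.
import numpy as np
import openai  # pre-1.0 client interface

def ada_embedding(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def ada_cs(explanation_a: str, explanation_b: str) -> float:
    a, b = ada_embedding(explanation_a), ada_embedding(explanation_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a generated explanation against the baseline explanation.
# score = ada_cs("fires on years in dates", "activates on four-digit numbers")
```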

Experiments/save_neuron_info.ipynb and Experiments/get_interpretable_neurons.ipynb are used to collect NeuronViewer explanations and to select which neurons to explain.

Overview of Neuron Explanation Pipeline

(Figure: overview of the neuron explanation pipeline)
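
At a high level, the pipeline follows the explain / simulate / score loop from OpenAI's automated-interpretability. The sketch below is schematic only: the function names are placeholders, not the repository's actual API, and the LLM calls are stubbed out.

```python
# Schematic explain -> simulate -> score loop. `explain` and `simulate`
# stand in for the LLM calls made by neuron_explainer; only the scoring
# step (correlation between simulated and true activations) is spelled out.
from typing import List
import numpy as np

def explain(activation_records, prompt_style: str = "summary") -> str:
    """Placeholder: ask the explainer model for a short natural-language
    explanation of the neuron, using one of the prompt styles from the paper."""
    raise NotImplementedError("handled by neuron_explainer in this repo")

def simulate(explanation: str, tokens: List[str]) -> List[float]:
    """Placeholder: ask the simulator model to predict per-token activations
    given only the explanation."""
    raise NotImplementedError("handled by neuron_explainer in this repo")

def score(simulated: List[float], actual: List[float]) -> float:
    """Correlation between simulated and true activations; higher is better."""
    return float(np.corrcoef(simulated, actual)[0, 1])
```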

Cite us

If you find this code useful, please cite:

@misc{lee2023importance,
      title={The Importance of Prompt Tuning for Automated Neuron Explanations}, 
      author={Justin Lee and Tuomas Oikarinen and Arjun Chatha and Keng-Chi Chang and Yilan Chen and Tsui-Wei Weng},
      year={2023},
      eprint={2310.06200},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
