📄arXiv • 🌐Demo • YouTube • 𝕏 Blog
This work aims to identify the circuits in pretrained language models that are responsible for specific pieces of knowledge, and to analyze the behavior of these components. We provide a demo for exploring the discovered circuits.
- A new method, EAP-IG, is integrated in the `eap` folder. It takes less time than ACDC; see `knowledge_eap.ipynb` for usage. With the LLaMA2-7B-Chat model, running this notebook on a single GPU requires approximately 57,116 MB of GPU memory and takes 3-4 minutes.
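For intuition, EAP-IG estimates each edge's importance via attribution patching, with the gradient averaged over several points interpolated between the corrupted and clean activations (integrated gradients). Below is a minimal conceptual sketch of the edge score; all names are illustrative, not this repository's API:

```python
import torch

def eap_ig_edge_score(clean_act, corrupt_act, loss_fn, steps=5):
    """Illustrative EAP-IG edge score (not the repo's actual API).

    Approximates the effect of patching this edge as
    (corrupt - clean) . grad, where the gradient of the task loss is
    averaged over `steps` activations interpolated between the
    corrupted and clean runs.
    """
    grads = []
    for k in range(1, steps + 1):
        alpha = k / steps
        act = (corrupt_act + alpha * (clean_act - corrupt_act)).detach()
        act.requires_grad_(True)
        loss_fn(act).backward()
        grads.append(act.grad)
    avg_grad = torch.stack(grads).mean(dim=0)
    return ((corrupt_act - clean_act) * avg_grad).sum().item()
```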
The filtered data for each kind of model is available here. Please download it and put it in the `data` folder.
Build the environment:

```bash
conda create -n knowledgecircuit python=3.10
conda activate knowledgecircuit
pip install -r requirements.txt
```
❗️The code may fail under torch 2.x.x. We recommend torch 1.x.x.
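A quick way to check which torch version your environment resolved to:

```python
import torch

# Warn if the installed torch is 2.x, which this code is not tested against.
if int(torch.__version__.split(".")[0]) >= 2:
    print(f"Warning: torch {torch.__version__} installed; "
          "this repository is only tested with torch 1.x.")
```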
Just run the following command:
```bash
cd acdc
sh run.sh
```
Here is an example that computes the circuit for the `country_capital_city` knowledge in `GPT2-Medium`:
```bash
MODEL_PATH=/path/to/the/model
KT=factual
KNOWLEDGE=country_capital_city
NUM_EXAMPLES=20
MODEL_NAME=gpt2-medium

python main.py --task=knowledge \
    --zero-ablation \
    --threshold=0.01 \
    --device=cuda:0 \
    --metric=match_nll \
    --indices-mode=reverse \
    --first-cache-cpu=False \
    --second-cache-cpu=False \
    --max-num-epochs=10000 \
    --specific-knowledge=$KNOWLEDGE \
    --num-examples=$NUM_EXAMPLES \
    --relation-reverse=False \
    --knowledge-type=$KT \
    --model-name=$MODEL_NAME \
    --model-path=$MODEL_PATH
```
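For intuition about the key flags: `--zero-ablation` replaces an edge's activation with zeros (rather than with activations from a corrupted prompt) when testing its importance, and `--threshold` controls how large a metric change an edge must cause to be kept. A rough sketch of the greedy ACDC-style pruning loop, with hypothetical helper names:

```python
def acdc_prune(edges, metric_with_ablated, baseline, threshold=0.01):
    """Sketch of greedy ACDC-style pruning (helpers hypothetical).

    Walk candidate edges (typically from outputs back toward inputs)
    and permanently ablate any edge whose removal changes the task
    metric by less than `threshold`; the remaining edges form the circuit.
    """
    kept, removed = [], []
    for edge in edges:
        score = metric_with_ablated(removed + [edge])
        if abs(score - baseline) < threshold:
            removed.append(edge)  # negligible effect: prune it
            baseline = score      # compare later edges against the pruned model
        else:
            kept.append(edge)     # meaningful effect: part of the circuit
    return kept
```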
You will find the results in `acdc/factual_results/gpt2-medium`; `final_graph.pdf` shows the computed circuit.
Run `component.ipynb` in Jupyter Notebook to analyze the behavior of the components in the circuit.
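For a quick standalone look at a component outside the notebook, the underlying transformer_lens library (which this repo builds on) is convenient. A minimal sketch; the layer and head indices are placeholders, not findings from the paper:

```python
from transformer_lens import HookedTransformer

# Load GPT2-Medium through transformer_lens.
model = HookedTransformer.from_pretrained("gpt2-medium")

prompt = "The capital of France is"
logits, cache = model.run_with_cache(prompt)

# Greedy next-token prediction for the factual prompt.
next_id = logits[0, -1].argmax().item()
print(model.tokenizer.decode([next_id]))

# Attention pattern of a placeholder head (layer 10, head 7);
# shape: [query_pos, key_pos] after indexing batch and head.
pattern = cache["pattern", 10][0, 7]
print(pattern.shape)
```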
We thank the projects transformer_lens, ACDC, and LRE. Our code is built on top of these three projects.
Please cite our repository if you use Knowledge Circuits in your work. Thanks!
```bibtex
@article{DBLP:journals/corr/abs-2405-17969,
  author     = {Yunzhi Yao and
                Ningyu Zhang and
                Zekun Xi and
                Mengru Wang and
                Ziwen Xu and
                Shumin Deng and
                Huajun Chen},
  title      = {Knowledge Circuits in Pretrained Transformers},
  journal    = {CoRR},
  volume     = {abs/2405.17969},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2405.17969},
  doi        = {10.48550/ARXIV.2405.17969},
  eprinttype = {arXiv},
  eprint     = {2405.17969},
  timestamp  = {Fri, 21 Jun 2024 22:39:09 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2405-17969.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```