This repo contains code for the following papers:
- Emergent Representations of Program Semantics in Language Models Trained on Programs (ICML'24, arXiv)
- Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data (COLM'24, arXiv)
Creating a conda env
conda env create --prefix ./env --file environment.yml
conda activate ./env
Then generate the Karel dataset (see
) -
Training an LM
./scripts/ base karel
Training probes for one checkpoint
./scripts/ karel 76000
Training the LM
./scripts/ karel_noloops_nocond "--output_dir filtered --learning_rate 5e-6 --num_warmup_steps 6000 --max_train_steps 80000 --lengths_to_filter 1 2 3 4 5"
Training probes for one checkpoint
./scripts/ karel_noloops_nocond 76000 "--eval_mode intervention --output_dir filtered --max_eval_samples 50000"
We reuse the checkpoints from ICML'24
Training probes for one checkpoint
./scripts/ karel 76000 "--eval_mode causal --output_dir filtered --eval_dataset karel_15only_uniform_noloops_nocond_nomarks --max_eval_samples 50000"
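
Each probe command above handles a single checkpoint. To cover several checkpoints, the same call can be wrapped in a loop. The following is a minimal sketch, not part of the repo: `PROBE_SCRIPT`, `DATASET`, `CHECKPOINTS`, and `probe_all` are hypothetical names, and the step list is an assumption apart from 76000, which is the step used in the commands above. `PROBE_SCRIPT` defaults to `echo`, so the loop only prints the arguments it would pass until you point it at the probe script under `./scripts/`.

```shell
#!/bin/sh
# Sketch only: PROBE_SCRIPT, DATASET, CHECKPOINTS and probe_all are
# hypothetical names introduced for illustration.
set -eu

# Defaults to `echo`, which turns the loop into a dry run.
# Set PROBE_SCRIPT to the probe-training script under ./scripts/ to run it.
PROBE_SCRIPT="${PROBE_SCRIPT:-echo}"
DATASET="${DATASET:-karel}"
# Assumed checkpoint steps; only 76000 appears in the commands above.
CHECKPOINTS="${CHECKPOINTS:-72000 74000 76000}"

# Call the probe script once per checkpoint, forwarding any extra
# arguments (e.g. the quoted option string) unchanged.
probe_all() {
  for step in $CHECKPOINTS; do
    "$PROBE_SCRIPT" "$DATASET" "$step" "$@"
  done
}
```

For example, `probe_all "--eval_mode causal --output_dir filtered"` passes the quoted option string as a single third argument for every step, mirroring the form of the single-checkpoint commands.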