cmathw

cmathw/README.md

Hi there 👋

I'm currently conducting research in mechanistic interpretability.

OSS Contributions:

Core Contributor: Open-source replication of Anthropic's Sleeper Agents paper, repo here.

Research:

Papers and posts that I have contributed to include:

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition (repo here, completed as part of MATS 4.0)
Structured World Representations in Maze-Solving Transformers (Accepted to NeurIPS UniReps '23 Workshop)
Polysemantic Attention Head in a 4-Layer Transformer
A Configurable Library for Generating and Manipulating Maze Datasets

Pinned

Cadenza-Labs/sleeper-agents Public

Python 10 1

understanding-search/maze-transformer Public

This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.

Jupyter Notebook 24 6

gated-attn Public

Python 1