cooperleong00

CooperLeong · cooperleong00

53 followers · 290 following

cooperleong00.github.io

Pinned

Awesome-LLM-Interpretability (Public)

A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, and more.

176 stars · 8 forks

ToxificationReversal (Public)

Code for the paper "Self-Detoxifying Language Models via Toxification Reversal" (EMNLP 2023)

Python · 15 stars · 3 forks

MikaStars39/FeatureAlignment (Public)

FeatureAlignment = Alignment + Mechanistic Interpretability

Python · 15 stars · 1 fork