v0.0.1 (2024-12-13)
Fix
Unknown
- add ci (#3)
  - add ci, pre-commit
  - fix ci
  - update ci
  - update ruff config
  Co-authored-by: Daniel Tan <dtch1997@users.noreply.github.com> (33c8a2b)
- Update README.md (7ed9e3b)
- add arxiv badge (f302b40)
- update (15f3930)
- rename persona_generalization to steering_generalization (0ddd1ad)
- Update README.md (a17bfba)
- Update README.md (b566646)
- Update README.md (95295e6)
- Create README.md (fae2efb)
- Create README.md (9151b2a)
- Update README.md (0f25019)
- Refusal (#2)
  - add refusal dataset
  - wip refusal experiment
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (893d67a)
- remove redundant script (3d4870f)
- refactor (ffe4666)
- defer formatting to apply_chat_template (53021b6)
- add persona generalization experiment (3d70293)
- delete results (8dac290)
- steering experiments (#1)
  - wip steering experiment
  - minimal working version done
  - refactor
  - add working layer sweep script
  - working layer sweep
  Co-authored-by: Daniel Tan <dtch1997@users.noreply.github.com> (b571672)